Should WikiTree have a style guide for AI generated content?

Question

Should WikiTree have a style guide for AI generated content?

928 views

Lots of discussions are being had in the community about 1) identifying AI content and 2) citing the content as AI generated. Here is how the question was formulated by Brad Foley in Canada Project discussions.

We should develop styles and policies regarding the use of AI in generating content (especially biographies) before their abuse becomes an issue.

In particular, tools like Bing and ChatGPT can easily generate lovely bios with lots of false "facts". In one sense this is covered by existing WikiTree citation guidelines. But the scale and volume that these biographies can be generated poses a new challenge.

For instance, in a biography of 30 lines, with 45 different statements, what constitutes sufficient and adequate sourcing? A single footnote meets current standards, but is clearly not enough to support such a long biography, which might be loaded with hallucinated (made-up) "facts". Current guidelines are not much help.

In those cases, should we ask (or require):

* a "biography generated by ChatGPT/Bing" citation

* the profile manager to delete the text

or are current tools up for the job like

* tons of {{needs citations}} tags

* an {{insuffcient sources}} sticker

Do we also need to develop new documentation around AI generated content as standalone docs, or as additions to existing docs.

Some of these issues have been raised in a few other posts.

https://www.wikitree.com/g2g/1544689/artificial-intelligence-questions-answers-citation-sources

https://www.wikitree.com/g2g/1531185/anyone-tried-chatgpt3

Go!

Edited to add content and links.

asked Nov 22, 2023 in Policy and Style by Mags Gaulden G2G6 Pilot (644k points)
edited Nov 22, 2023 by Mags Gaulden

13 Answers

Related questions

+8 votes

2 answers

119 views

Discrepancies in records. Could it be AI?

asked Apr 21 in The Tree House by Julee Limbach G2G2 (2.6k points)

+5 votes

0 answers

87 views

AI and Genealogy Webinar 5-11-2023

asked Nov 22, 2023 in Appreciation by Phil Phillips G2G6 Mach 1 (15.4k points)

+7 votes

2 answers

289 views

The use of AI Technology in Genealogy

asked Jul 26, 2023 in WikiTree Tech by Bryan Simmonds G2G6 Mach 1 (16.2k points)

+7 votes

5 answers

384 views

Is AI the future of genealogy?

asked Jul 6, 2023 in The Tree House by Paul Schmehl G2G6 Pilot (149k points)

+6 votes

2 answers

167 views

Have you seen “Genealogy and AI”?

asked Jul 5, 2023 in The Tree House by Peter Roberts G2G6 Pilot (708k points)

+5 votes

3 answers

197 views

Anyone have any genealogical success with any AI browser plug-in?

asked Jun 21, 2023 in The Tree House by Ian Holland G2G1 (1.0k points)

+4 votes

1 answer

251 views

AI revolutionalizes Software Development

asked Nov 16, 2022 in The Tree House by Judy Bramlage G2G6 Pilot (214k points)

+5 votes

1 answer

309 views

May we please add WikiTree to DuckDuckGo’s “bangs”?

asked May 4, 2023 in WikiTree Tech by Peter Roberts G2G6 Pilot (708k points)

+43 votes

20 answers

2.3k views

Proposal: Update the Help:Sources Style Guide

asked Feb 19, 2020 in Policy and Style by Joe Cochoit G2G6 Pilot (260k points)

+24 votes

8 answers

1.2k views

Do we need some updates to the Biography Style guide?

asked Feb 3, 2018 in Policy and Style by Robin Lee G2G6 Pilot (866k points)

Answer 1 · 2023-11-22T22:47:43+0000

I've already seen a user create multiple space pages, purporting to provide timelines and facts, with a single "Microsoft AI results" citation. They honestly looked pretty good, but that in itself worries me.

In principle it's not that much different than citing Wikipedia, but in the case of Wikipedia we're relying on a previous layer of human mediated editing and fact checking. Here, we know that AI oftentimes (up to 20% or more) makes up facts, ie "hallucinates". I'm honestly worried that it's going to be increasingly difficult to wade through a large volume of AI generated text in bios and space pages.

I'm enthusiastic about the promise of AI as an aid, but I'm also worried that we're going to need to have legions of volunteers to go through new content, and check, line by line, what is real and what isn't, and to add sources.

It'll be like the early days of gedcom dumps all over again.

Answer 2 · 2023-11-23T00:19:20+0000

The use of AI whether ChatGPT or other similar software to create biographies or sources bothers me.

In many ways it seems like it could be useful, but as someone who isn't comfortable with, or experienced with many types of software, I don't know how to determine if any AI generated content is correct or if it could be correct. What are the sources/

I have seen examples of AI generated biographies or information on other websites and much of it seems generated based on examples from non verified sources.

As an example with a made up name: Mary Jane___ was a loving mother, she always looked after her children well, she was well known as an accomplished needle woman etc, etc.

It just seems like the sort of information that might be put in a obituary by someone who knew little if anything about the deceased and was just trying to fill the approx 10 lines required for an obituary.

As a further example; an x times GGA of my husband, became a widow with 3 young children at about 26 years old. She remarried within 12 months of her husband's death.

The previous PM, who has not been active in many years had suggested that there was an extra-marital relationship with her second husband prior to her 1st husband's death.

There seemed to be no understanding by the previous PM that a widow with 3 young children in a frontier community either remarried or was left with no support and was unlikely to be able to manage on her own.

If we leave the decision about what is included about the person to AI, how do know that AI is not looking at completely unsupported information about the person such as the 'gossip" shown above.

We would need the same verifiable source for AI enhanced profiles that we currently required.

If AI does not have valid sources it is not any more reliable than an unsourced Ancestry tree that only has other unsourced Ancestry trees as a source.

Answer 3 · 2023-11-23T00:29:19+0000

A "biography generated by ChatGPT/Bing" statement (it can't be said to be a citation) should be required, and no, it is not nearly enough. I do think current tools are up for the job, a big, bright {{Unsourced}} being the main one (or an {{insuffcient sources}} sticker, assuming those AI generated profiles have at least one actual source in order to have been created), along with a {{needs citations}} tag for every statement. Those will quickly expose the "emperor's new clothes" for what they are -- pretty words for a lot of nothing.

Answer 4 · 2023-11-23T01:03:36+0000

Perhaps we could institute a == Generated by AI == section, much like the newly popular and useful == Research Notes == section. It must be made clear that text in the AI section needs to be verified and sourced to be taken seriously.

Answer 5 · 2023-11-23T01:06:39+0000

I have been using ChatGPT to combine bios in complicated merges, or if there are several cut and pasted excerpts. It works pretty well, but I provide the content, and I still consider it a draft. It's pretty easy to add citations afterward and check the statements (ChatGPT can't help adding lofty summary statements about people's achievements no matter what prompts I use to ask it to stick to facts and avoid opinions). Having ChatGPT generate the content, including the research is frightening, but I suppose unavoidable.

I think ChatGPT bios should be considered unsourced - So yes we should have a style guide for ChatGPT.

Answer 6 · 2023-11-23T04:02:43+0000

Just to add to the conversation...

In thinking about this, if one has a list of sources, then the biography practically writes itself. It's easy enough to create a chronological biography with citations for each fact.

I would have a concern that people would focus on putting together a biography without sources already at hand. Having an AI model do that for you is disastrous.

Until there is a high level of accuracy of having an AI model write a genealogical biography, with the ability to properly cite sources (given a list), then I'm ok with just outright banning them.

Biographies are not that hard to write, and the act of writing them helps to work through genealogical issues, like what is missing data, or lack of proof of relationships, etc. AI can't do that yet. It's folly to rely on an incomplete, and inaccurate, tool that is more of a toy at this stage.

answered Nov 23, 2023 by Eric Weddington G2G6 Pilot (521k points)

I definitely agree with everything you said.

The question is whether people who haven't done a lot of research understand that. Or if there are other people who just want to take shortcuts. I suspect that lots of people are going to be tempted to throw some unsourced ancestry tree or whatever (or even a bunch of sources) into an AI blender and post it as a biography.

Or they might stumble on such a biography second hand and take it as gospel. If a website has ai content that says "John Bobblebonk is my great uncle. I remember the smell of his pipe and the sound of his laugh. When he was 2 he fell and broke his front tooth ...." It sounds real. But....

I agree with other posters that, in principle, this is no different than the situation we're currently in, regarding sources. But the volume and diversity of spam content might plausibly be overwhelming.

commented Nov 23, 2023 by Brad Foley G2G6 Mach 7 (79.2k points)

Answer 7 · 2023-11-24T15:30:26+0000

I share the concern, but this is not just an AI thing. We currently have "tools" that take the information attached to a profile and creates a "lovely" biography. I recently came across one that had 15 children listed in the biography and 15 children attached. When I asked for the source from the "long time member" that had used a tool to create the biography, I was told that the tool just uses the data, no sources are confirmed. End of story, only 6 of those 15 are confirmed children.

Unless we make the people who use such tools responsible for validating the information in the biography, it really will become an issue.

Answer 8 · 2023-11-26T09:54:09+0000

I don't think it will be possible or necessary to write a special policy for AI. The situation is changing and as others have pointed out the basic principles are not that much different than situations we know from the past.

People should always say where they got their material.

It certainly wouldn't be right to take an anti AI approach out of principle because AI can be used in so many ways. One thing I expect we'll be seeing more is for example translations written by AI. I guess these won't always be cited, and that is not really the end of the world, but it would be best practice.

What we are seeing with some of the new AI goes a bit further than mere translation. For example you can give a list of facts and ask for it to be written up in a certain style. That is something which is going to take some getting used to, and I'm sure it will bring complications.

Answer 9 · 2023-11-26T12:43:10+0000

I think that "AI generated content" is a very diffuse concept. There is in a fact a continous scale, all from the simplest machine generated content up to the current state-ot-the-art generative AIs. I for one use machine-generated content for my bios all the time, with a self-developed Perl script that takes data from my own database and produces a full biography. I'm improving it all the time, and the output needs ever less hand editing. Given enough time for development, the output of such a script could eventually reach a level that might be called "Ai generated content".

But AI generated content in itself, as I see it, is not a problem. The real problem is the old "garbage in, garbage out" (GIGO) principle. It is really exactly the same problem that we've got already witrh the old machine-generated GEDCOM junk, with reams of sections and subsections which as a rule boils down to absolutely nothing of substance.

If "AI generated content" should be disallowed, it would probably make sense also to disallow GEDCOM imports. And maybe all "machine generated content", such as my own scripted biographies.

I think the real issue is the "fluff" factor, ie. the ratio of text over what might be called substance. Or in plain old information theory speak, the signal-to-noise ratio, which might actually be made into an operational definition of what is wanted in the Biography field of a profile.

As long as the generated text is supported by sources, everything should be OK. But bio text unsupported by sources should never be welcomed, whether it is generated by humans or computers.

Answer 10 · 2024-01-09T00:49:28+0000

My personal opinion, as someone that uses ChatGPT on a regular basis, the debate touches on a crucial aspect of integrating AI into historical and genealogical work. While AI can be a powerful tool, its current limitations, especially in generating factually accurate content without explicit sourcing, present significant challenges. The idea of a style guide or specific policies for AI-generated content seems prudent. It could help in setting clear standards for the use of AI, ensuring that any content it generates is properly vetted and sourced. This approach would maintain the integrity of the historical record while still leveraging the benefits of new technology. The balance between innovation and accuracy is delicate, especially in fields where factual correctness is paramount.

Categories

Should WikiTree have a style guide for AI generated content?

Please log in or register to add a comment.

Please log in or register to answer this question.

13 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Related questions