Database errors project

+31 votes
2.0k views

I will start a new project for errors in WikiTree, where everything about finding logical errors in the database would be handled.

Let me know if there is interest in such a project.

For the name I would use 'Errors'. Paula already suggested 'Tidy up', or maybe some other name. Please suggest.

BTW: Errors from the new dump have been recalculated, as you can see on the free-space page. The database statistics have also been updated.

You can also join the project here: http://www.wikitree.com/wiki/Project:Database_Errors

in WikiTree Tech by Aleš Trtnik G2G6 Pilot (808k points)
retagged by Maggie N.
This sounded crazy.  But when I looked it is very useful. I've already used it to correct a bit of nonsense.  Thanks!  :)
Ah, now I know how you came to correct the gender on a couple of my profiles. Very good!

Are you able to filter the list by watchlist so we can see the state of the profiles we manage?
Not at the moment. I don't think I can access your watch list without your password. Same goes for your private and protected profiles.

I have an idea to make a bot that adds and removes error categories on profiles with errors, but for now it is only an idea. I think bots are not very popular on WikiTree.

Maybe we will come up with a solution together with the administrators if there is a lot of interest.

Regards Aleš

I would certainly participate in the project! As you may remember I already fixed a number of errors.

A problem I ran into was the substantial number of protected profiles, where I could do no more than post a comment requesting the profile manager to fix the error. To post such a comment or send an e-mail could be one of the functions of the bot you envisage.

I also think it would be nice if the error reports could be ordered by country, so project members could focus on fixing errors in their own country's profiles.

The project could be named 'Error fixing'.

Lists by country are not useful for the USA or UK, since there are millions of pages there. Also, a lot of locations are missing or not connected to a country. But I am now working on a solution Chris Hampson asked for, although by tree rather than by watchlist as he asked. You will get a list of errors up to 10 generations away from you or any other person.

I agree sorting by country is not very useful for USA and UK, but for countries with smaller amounts it could be. Being Dutch my first priority would be to fix Dutch profiles.

Promoted your initiative here.

On the errors page I added a link to get all errors for profiles that are connected to you or to any other profile. So you can correct your relatives first. Have fun.

Wow, this is fantastic! I can't wait to start using it to look through my tree!

To Jan, 

Good news. I have granted your wish.

On the errors page I added a link to get all errors by location. Have fun.

Perfect Aleš, you are faster than lightning!

Question: the 203 error (Father is female) puzzles me. The report lists a woman and one of her children. Looking at the woman's profile, she turns out to be a woman married to a man. So where is the error? I looked at 10+ cases, all similar.

Example: Johanna Francina Schaap

Am I misinterpreting things or is it a bug?

Error 203 means that the person on the left is defined as the father of the person on the right.

So in this case the error is in the parents of Schaap-112.

Thanks for explaining. However:

The person on the left (Contant-24) is female and the mother of Schaap-112; Schaap-112 has a male father, Schaap-111.

So where is the error?

Sorry to point out the obvious, but the father is usually male and the mother female :-)

If you go into edit mode on Schaap-112, on the right side there is an Edit Family section where Father and Mother are defined. In this case you must swap them.

In view mode the first parent is always the father and the second is the mother. If you hover the mouse over a link, you will see the hint father... on the Johanna link and mother... on the Dirk link.
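A check like error 203 can be sketched in a few lines. This is only an illustration: the record layout and the field names `gender`, `father` and `mother` are assumptions, since the real dump format is not shown in this thread.

```python
# Hypothetical profile records; the field names are assumptions,
# not the real database dump schema.
profiles = {
    "Schaap-111": {"gender": "male", "father": None, "mother": None},
    "Contant-24": {"gender": "female", "father": None, "mother": None},
    # The parents are swapped here: the woman sits in the father slot.
    "Schaap-112": {"gender": "male", "father": "Contant-24", "mother": "Schaap-111"},
}


def father_is_female_errors(profiles):
    """Return (left_person, right_person) pairs where the father slot holds a woman."""
    errors = []
    for child_id, child in profiles.items():
        father_id = child["father"]
        father = profiles.get(father_id)
        if father is not None and father["gender"] == "female":
            # Left person is the recorded father, right person is the child.
            errors.append((father_id, child_id))
    return errors


errors_203 = father_is_female_errors(profiles)
print(errors_203)
```

The pair it reports matches the layout described above: the left person is the one recorded as father, and the fix is made on the right person's Edit Family section.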

Oops, checked the genders of all involved, but not the relations...

Thanks again, and sorry to have bothered you with my stupidity!
If you have the list of profiles stored then it should be possible to do a user-initiated query joined on their watchlist?
I just tried the errors related to me. The form says I can check 20 generations, but it only checked two (Generation 0 and Generation 1).
Same here, just the two generations.
Jillaine and Nan,

You both have private parents, who are not part of the public tree. Start with your grandparents, as the Note says.
oops. sorry. user error. should have rtfm'ed....
Really awesome Aleš and a big help, will use this a lot I'm sure, so count me in :) Great project !
Ales, could you create a tool that finds profiles with empty fields?  There are way too many profiles that have a name and nothing else-- no birth date or place, no marriage date or place, no death date and place. These are near worthless profiles and would make a good volunteer project to work on.
I think there are too many such profiles to show them as errors. Have a look at the statistics page http://www.wikitree.com/wiki/Space:Database_dump_statistics and you can see that 30% of profiles have no birth location, and so on, so it would create millions of errors.
What about if all four conditions were met?

1. Empty birth date AND

2. Empty birth location AND

3. Empty death date AND

4. Empty death location

Would that be possible?
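The four-condition test proposed here could look roughly like this. It is a sketch only: the field names `birth_date`, `birth_location`, `death_date` and `death_location` are hypothetical, and whitespace-only values are treated as empty by assumption.

```python
# Hypothetical field names; the real dump columns are not shown in this thread.
EMPTY_CHECK_FIELDS = ("birth_date", "birth_location", "death_date", "death_location")


def is_empty_profile(profile):
    """True only when all four fields are missing, None or blank (conditions joined by AND)."""
    return all(not (profile.get(field) or "").strip() for field in EMPTY_CHECK_FIELDS)


name_only = {"name": "John Doe"}
has_birth_date = {"name": "Jane Doe", "birth_date": "1850"}
print(is_empty_profile(name_only), is_empty_profile(has_birth_date))
```

Requiring all four conditions keeps the list much shorter than flagging any single empty field.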
I reviewed the data and I chose to first add errors for unconnected empty public (901) and open (902) profiles. Have a look.
Thanks, Ales, I'll take a look.

Next question:

On wrong gender, I started fixing some. How is your "dump" updated? I reloaded your page for gender errors, and it still lists those profiles that I fixed. Thanks.
Unfortunately, it is updated only once a month. The dump is made on the first of the month, and then I need a day or two to import and recalculate the data. So it will be updated on Jun 3rd.

I already asked Chris to export the data every 10 days or weekly, but there is no reply yet.

That is why you should write what you are checking on the errors page, so that others don't try to fix the same errors.

Regards Aleš.
I have made about 15 corrections from your list so far.  So thank you.

Several of my profiles are marked as the wrong gender when in fact they are correct.  Sometimes women's names are written exactly like a man's name in Westfriesland, Noord-Holland.  Examples are Pieter instead of Pietertje; Claes instead of Claesje; Cornelis instead of Cornelisje.  What would you suggest I do so that others don't try to correct a gender on a profile that shouldn't be corrected?

As I wrote on the errors page, the name type is set by sampling the database. These are the frequencies of the names you mention.

Name        total  empty  female  male
Pieter       3312     61       6  3245
Pieterje       13      1      11     1
Claes         177      6       4   167
Claesje        20      1      19     0
Cornelis     2934     21      19  2894
Cornelisje     14      0      14     0

So these are really rare occasions. Pieter appears as female 6 times, and some of those may even be errors. Usually when correcting a name or gender, the user looks at the biography, so it should be clearly written there that the woman's name is Pieter.
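To illustrate the sampling idea, here is a minimal sketch that derives an expected gender from the counts in the table above. The 90% threshold and the function name are assumptions for illustration; the actual rule used by the error report is not stated in this thread.

```python
# Counts copied from the table in this post: (total, empty, female, male).
NAME_COUNTS = {
    "Pieter": (3312, 61, 6, 3245),
    "Pieterje": (13, 1, 11, 1),
    "Claes": (177, 6, 4, 167),
    "Claesje": (20, 1, 19, 0),
    "Cornelis": (2934, 21, 19, 2894),
    "Cornelisje": (14, 0, 14, 0),
}


def expected_gender(name, min_share=0.9):
    """Return 'male' or 'female' when one gender clearly dominates, else None.

    min_share is an assumed threshold, not the project's real setting.
    """
    counts = NAME_COUNTS.get(name)
    if counts is None:
        return None
    _total, _empty, female, male = counts
    gendered = female + male  # profiles with an empty gender are ignored
    if gendered == 0:
        return None
    if male / gendered >= min_share:
        return "male"
    if female / gendered >= min_share:
        return "female"
    return None


for name in NAME_COUNTS:
    print(name, expected_gender(name))
```

A female profile named Pieter would be flagged under such a rule, which is exactly the kind of case that needs a clear note in the biography.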

Aleš, 

I don't know what you mean by "write what you are checking on errors page".  Do you mean this page:? 

http://www.wikitree.com/wiki/Space:Database_dump_statistics

Where on it shall we write things?

Know that I'm going through this one:

http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160501/902_0000-0000.htm

And working on finding profiles to merge dead profiles with, or communicating with the profile managers (if there are any) requesting more details be added. 

The note should be written as a comment on the page with the latest errors.

http://www.wikitree.com/wiki/Space:Database_Errors_Project/2016_05_01

Look at other comments.

To Bertram

I added the possibility to ignore false errors. Look at the right side of the errors page.

I am running into a variety of free space pages for the Error tool.  Perhaps these should be organized under one location.  Then just add the link to the new page on the indexing page?

http://www.wikitree.com/wiki/Space:Database_dump_statistics

http://www.wikitree.com/wiki/Space:Database_Errors_Project/2016_05_01

http://www.wikitree.com/wiki/Space:Database_dump_errors

http://www.wikitree.com/wiki/Project:Database_Errors

Others?
These were startup problems.

This is the official starting point (for now). Join the project and you will be notified of all changes:

http://www.wikitree.com/wiki/Project:Database_Errors

Each Monday a new error report will be added, accessible from the project page:

http://www.wikitree.com/wiki/Space:Database_Errors_Project_2016-xx-xx

Database statistics are published on this page once a month:

http://www.wikitree.com/wiki/Space:Database_dump_statistics

Other pages are dead ends.

Thank you.  I've started marking the false errors on the errors page.

Could someone explain what a false error is?!?!? Does it indicate an error in the algorithm...

Any examples please...

The errors I have seen, like a father being the same as the profile itself, feel like something a bot could repair... better than I...

Errors are calculated by rules that are true in most cases, but there are always some exceptions. In such a case you can click False Error and the error will be removed from the error list.

Examples:

  • Error 105 Duplicate siblings: Ries-613 & Ries-614 are twins without a name, so it is not an error.
  • Error 305 Mother too young: for Moore-24889 & Smith-95728 the mother was old enough to give birth, but due to privacy the computer thinks she was 10.
  • Errors 500: Gender correctness is calculated automatically, so there can be a lot of false errors.

As for bots, I am against them. A bot could delete a father-is-self link, but a genealogist can set the correct father, which is much better.

Exceptions might be errors 631 and 632 and similar, where the location could be set to empty text, but Chris could correct this with one command in a minute if he chooses to do so.

8 Answers

+5 votes
 
Best answer

I think this tool is magic. We need to start looking at our data in different ways...

Yesterday I started to go through the list of people who are their own fathers... A big problem is that you often can't edit a profile and have to leave a comment on the page that is easy to understand...

I leave a comment like this


This profile has been identified as having a problem.  2016-05-11 303 Father of himself Please change it.

For more information please see 
Project:Database_Errors or ask a question at G2G

==> Wiki code

This profile has been identified as having a problem [http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160511/201_1800-1899.htm 303 Father of himself] - 2016 May 11

'''Please change it'''

For more information please see [http://www.wikitree.com/wiki/Project:Database_Errors Project:Database_Errors] or ask a question at [http://www.wikitree.com/g2g/tag/db_errors G2G]

:Regards
:Magnus Sälgö
:Stockholm, Sweden

by Living Sälgö G2G6 Pilot (297k points)
selected by Living Terink

Magnus,

You started work on an error that others have already checked and corrected. Only errors on protected profiles remained, where there was no response from the profile manager.

As for the standard message to the profile manager, I would suggest that someone writes an appropriate standard message for each error, with instructions on how to correct it. I wrote a basic description of each error on the Project page, but for a message it should be extended.

For the link to the error I would suggest

http://www.sdms.si:92/function/WTWeb/errors.htm?Generations=1&WikiTreeID=Trtnik-2

Replace Trtnik-2 with the person's WikiTree ID.

This would be better for the user than seeing hundreds of errors.

Also, the official tag has changed from database_errors_project to db_errors, as mentioned on the project page.

Thanks 

URL looks great 





Maybe a Template {{dbcheck}}

==>
[http://www.sdms.si:92/function/WTWeb/errors.htm?Generations=1&WikiTreeID={{FULLPAGENAMEE}} Database error check]

Looks like it works if we have it on the profile but not as a comment....

Regarding bots:
If we look at Wikipedia, they use bots for everything (188M edits), and I think we could use them for some purposes:

  1. Renaming categories (today this is a nightmare)
  2. Repairing dead links and trying to find the page on e.g. the Wayback Machine
  3. Setting categories for locations on profiles...
Use Generations=0 for errors of that person only.

I think comments don't support templates. You could put such a template in the profile.

A bot could be used for those things, but someone needs to write it.
This template would be useful to put on a personal page. Generations could be an optional parameter.

I would name template

{{db_myerrors}}

[http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID={{FULLPAGENAMEE}}&Generations=5 Database error check]

{{db_myerrors|Generations}}

[http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID={{FULLPAGENAMEE}}&Generations={{{1}}} Database error check]

I would also recommend a template for any person, with an optional parameter, so you could link to any person

{{db_errors|WikiTreeID}}

[http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID={{{1}}}&Generations=5 Database error check for {{{1}}}]

{{db_errors|WikiTreeID|Generations}}

[http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID={{{1}}}&Generations={{{2}}} Database error check for {{{1}}}]

I am not exactly sure the syntax is correct. But you get the idea.

Can you, or someone who can, make these two templates?
If we get a GO from Chris I can do them tonight when I am home.

If we use a template it's easy to change it and complement it with links to help pages, instruction videos on how to correct an error...
I see this being used by a user putting the template on his own profile and occasionally clicking the link to see if there is something to correct in his tree.
On the wish list: a descendant parameter to check the descendants of a profile...
I use generations. Parents, partner & children are all in generation 1. Siblings are in generation 2 (the father's children).

Is your wish to check only descendants of the person? It could be done but I don't see why. You can correct errors in all related profiles, not just descendants.
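The generation counting described in this reply behaves like a breadth-first search over direct family links. The sketch below is an assumption about how it could be modelled, with made-up profile names; it is not the tool's actual code.

```python
from collections import deque

# Hypothetical direct links (parent, child or partner); not real WikiTree data.
links = {
    "Me": ["Father", "Partner", "Child"],
    "Father": ["Me", "Sibling"],
    "Partner": ["Me", "Child"],
    "Child": ["Me", "Partner"],
    "Sibling": ["Father"],
}


def generations_from(start, links, max_generations):
    """Breadth-first search: map each reachable profile to its generation number."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_generations:
            continue  # do not expand beyond the requested depth
        for neighbour in links.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


two_generations = generations_from("Me", links, 2)
print(two_generations)
```

With these sample links, the parent, partner and child land in generation 1 and the sibling in generation 2, reached through the father.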

Sorry, now I understand. It's excellent as it is.

Stockhaus-3

Magnus, you pointed me to such a message and now I too post a comment like that on error profiles.

Thanks!

Version 0.1 of {{db_errors}} is live; see the documentation.

  1. {{db_errors}}
    ==> Generates a link to a report for the current profile, 5 generations
     
  2. {{db_errors|10}}
    ==> Same as 1 but 10 generations 
     
  3. {{db_errors|10|Sälgö-2}}
    ==> Same as 2 but starts with Wikiprofile Sälgö-2
     
  4. {{db_errors|Generations=10|WikiTreeID=Sälgö-3}}


 

Very good. Thanks for that. I will add this to project page and news.
+5 votes
This is awesome!  I was already able to fix one, and sent off a couple of messages to active PMs.
by Nan Starjak G2G6 Pilot (383k points)
+6 votes
Added new errors. Check it out.
by Aleš Trtnik G2G6 Pilot (808k points)
Great work on this, Aleš! Bravo!
Born in the USA, not so much!  Wow, how enlightening this is.  The recent update has found all my old imports with USA before 1776.  Awesome!  Not only does it find errors, it shows us how much we have learned in such a short time. Loving the direction this is heading.
+3 votes
I am all in, thanks for starting a project.
by Esmé van der Westhuizen G2G6 Pilot (149k points)
+4 votes
Thanks for starting this! Particularly neat to be able to check my own tree: I found one instance where I had made a typo in a marriage date that had the couple marrying in the century before they were born. And another instance where I had mixed up a couple of fathers in a long line of Anders Anderssons.

Have also been checking a bit on Sweden in general and found some stuff where I could make myself useful.
by Eva Ekeblad G2G6 Pilot (573k points)
I did that too..

It would be good to have a false error flag, e.g. Balneavis-4, error 501.

And it would help if we could type a WikiTree ID at the end of the errors.htm line so we can put it in our navigation page.

Treble bazinga

Sorry, I missed this message.

I am currently discussing ways to integrate errors into WikiTree with Chris.

>> And if we could type a WikiTree ID at the end of the errors.htm line so we can put it in our navigation page.

It already works. Our server handles both GET and POST requests.

Use the following formats.

http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID=Trtnik-2

You can optionally add the number of generations (10 is the default)

http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID=Trtnik-2&Generations=5

And for a location search use

http://www.sdms.si:92/function/WTWebLocation/errors.htm?Location=Slovenija
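For anyone generating these links from a script, the URL formats above can be built with the Python standard library. The helper names are made up for this example; only the base URLs and the `WikiTreeID`, `Generations` and `Location` parameters come from this post.

```python
from urllib.parse import urlencode

ERRORS_URL = "http://www.sdms.si:92/function/WTWeb/errors.htm"
LOCATION_URL = "http://www.sdms.si:92/function/WTWebLocation/errors.htm"


def errors_url(wikitree_id, generations=None):
    """Build a GET link for one profile; Generations is left out when not given."""
    params = {"WikiTreeID": wikitree_id}
    if generations is not None:
        params["Generations"] = generations
    return f"{ERRORS_URL}?{urlencode(params)}"


def location_errors_url(location):
    """Build a GET link for a location search."""
    return f"{LOCATION_URL}?{urlencode({'Location': location})}"


print(errors_url("Trtnik-2"))
print(errors_url("Trtnik-2", 5))
print(location_errors_url("Slovenija"))
```

`urlencode` also percent-encodes non-ASCII IDs such as Sälgö-2, which avoids broken links when the URL is pasted elsewhere.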

+2 votes

Aleš, it looks like people misspell names to make duplicates easier to find ==> Aleš Trtnik will be added to WikiTree as Ales Trtnik, i.e. the grapheme š becomes just s

Maybe your magic software can find duplicates better than the WikiTree search engine...?

See the long discussion on whether it's OK to add Salgo for the surname Sälgö ;-)

-----------

Just out of curiosity, is Aleš Trtnik a romanisation of a Cyrillic name?  ==> To be correct, should you have just the Cyrillic name for some people in your family tree, or do you use both?

by Living Sälgö G2G6 Pilot (297k points)
edited by Living Sälgö

The reason names get misspelled is that people can't type Š if they are not from Slovenia. The letter doesn't exist on an English keyboard. There are ways to get Š, but you must be familiar with computers. Today computers handle these letters very well, but 10 years ago that wasn't so, and all Slovenian software is written so that it treats Ales and Aleš as the same. I will have to check whether we also handle Salgo and Sälgö the same.
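One common way to treat Ales and Aleš as the same is to strip combining accents before comparing. This is a sketch of that general technique using the Python standard library, not a description of how the Slovenian software mentioned here actually does it.

```python
import unicodedata


def fold_diacritics(name):
    """Decompose accented letters, drop the combining marks, and ignore case."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def same_name(a, b):
    """Compare two names while ignoring diacritics and letter case."""
    return fold_diacritics(a) == fold_diacritics(b)


print(same_name("Aleš", "Ales"))
print(same_name("Sälgö", "Salgo"))
```

Letters that have no accent decomposition, such as ø or đ, pass through unchanged, so a real matcher would need an extra mapping table for them.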

I am from Slovenia and we use the Latin script, although I can read Cyrillic, since in Yugoslavia we used both.

As for what to use, I am not certain. My opinion is to use Latin, as most of the world can read it. Nowadays computers can read and compare Cyrillic and Latin, so there are not many problems, and in the future it will be even better. But this doesn't work for Chinese, Arabic, and so on. People from those parts of the world should say what works for them.

Regards Aleš

And the consequence is that it is difficult to find duplicates... It would be great if you had a solution within the error project...

I think it's also lack of knowledge. I have roots in Beograd and was down there last year and did some genealogy, and still I am not 100% sure how to spell the names... I also did some software consulting in Turkey, and it took me 15 minutes to understand that they had a character i with two dots, Ï (I with diaeresis). I thought I had a problem with my display ;-).

Worldwide genealogy isn't easy...

I will put it on the project's to-do list.
+2 votes

A new candidate: garbage tags from GEDCOM import...

Yesterday I removed CONT from a profile... CONT means continue and is used when GEDCOM records are too long... and it should not be part of a WikiTree profile

A search
site:www.wikitree.com/wiki CONT ==> gives 87 000 hits ;-)

Houston, Houston, we have a problem

by Living Sälgö G2G6 Pilot (297k points)
The biography is not part of the database dump, so I cannot make this an error. But Google can.
This goes back to the notes we made about the state of WikiTree's genealogies as open linked data and the failure of the biography section to be machine readable.

Some part of the WikiTree profile is machine readable 

King-17514 see  Google Structured Data tool

https://search.google.com/structured-data/testing-tool?url#url=http%3A%2F%2Fwww.wikitree.com%2Fwiki%2FKing-17514

@type     Person 
url     http://www.wikitree.com/wiki/King-17514
name     Alvin Cecil King
givenName    Alvin
additionalName    Cecil
familyName    King
birthDate    1913-Aug-01
gender    male
@type    Event 
location    
@type     Placename    Sonoma, California, United States

parent     @type    Person
url    http://www.wikitree.com/wiki/King-17529
name    James King

parent    @type    Person
url    http://www.wikitree.com/wiki/Willis-4871
name    Mabel A. (Willis) King

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-17528
name    Mabel Ruth (King) Goatley

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-18030
name    Alice King

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-18029
name    John Willis King

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-18027
name    Howard Caldwell King

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-18026
name    Nellie King
.....

+2 votes
Thanks so much for all the hard, hard work on this tool. I'm curious, though, about the criteria for the rules.

I had an error of "father too old", but the father would have been 40, not really an unusual age at all in my (genealogical and personal) experience. No big deal checking the profile, but I'm not going to change anything, so I imagine it will continue to come up in the error report. Another error was "unknown gender of spouse", but the spouse's profile did have the gender specified.

As I said, curious about the criteria.

And thanks again!!
by Ellen Curnes G2G6 Mach 8 (84.7k points)
I see now what the "father too old" error may be. I thought the error related to the father of the profiled person, whereas now I think it is referring to a child of the profiled person - a child whose profile I haven't been involved with. I assume the error will also show up on the error report of the profile manager of the child's profile.
It will. You can also write him a comment asking him to correct the error.

If the data seems correct, check the Changes tab for recent edits. Someone may have already corrected the error. If that is the case, it will disappear on Monday, when errors are recalculated.

It is always good to include the WikiTree ID of the person involved. That makes it easier to answer.
Thanks for your response. Although I was asking in general, the specific profiles were Grimmett-148 (father is too old) and Arnett-162 (unknown gender of spouse).

Grimmett-148 is the father of Sara, born 1927, so he would have been 128 years old. The limit I set is 115 years, and I will lower it to a reasonable age as these errors are corrected.
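The age rule described here amounts to a simple year subtraction against a limit. In the sketch below the 115-year limit and the 1927 birth year come from this reply; the father's birth year of 1799 is derived from the stated age of 128, and the strict "greater than" comparison is an assumption.

```python
MAX_FATHER_AGE = 115  # limit quoted in this reply; it may be lowered later


def father_age_at_birth(father_birth_year, child_birth_year):
    """Age of the father, in whole years, in the year the child was born."""
    return child_birth_year - father_birth_year


def father_too_old(father_birth_year, child_birth_year, limit=MAX_FATHER_AGE):
    """Flag the pair when the father's age exceeds the limit (strict comparison assumed)."""
    return father_age_at_birth(father_birth_year, child_birth_year) > limit


# 1799 is derived from the post: a child born 1927 and a father aged 128.
print(father_age_at_birth(1799, 1927), father_too_old(1799, 1927))
# A father aged 40 is well inside the limit.
print(father_age_at_birth(1887, 1927), father_too_old(1887, 1927))
```

Lowering the limit later only requires changing `MAX_FATHER_AGE`, which fits the plan of tightening the rule as the worst cases are cleaned up.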

Arnett-162: You corrected the error. It will disappear on Monday. If you are in a hurry, you can click the Hide for 30 days link, and it will disappear. If the error was not corrected, it will reappear in a month. This is usually used when you have to communicate with the profile manager to correct the data.
