Database errors project

+30 votes
1.2k views

I am starting a new project for errors in WikiTree, covering everything related to finding logical errors in the database.

Let me know if there is interest in such a project.

For the name I would use 'Errors'; Paula has already suggested 'tidy up'. Other suggestions are welcome.

BTW: Errors from the new dump have been recalculated, as you can see on the free-space page. The database statistics have also been updated.

You can also join the project here: http://www.wikitree.com/wiki/Project:Database_Errors

in WikiTree Tech by Aleš Trtnik G2G6 Pilot (561k points)
retagged by Maggie N.
I have made about 15 corrections from your list so far.  So thank you.

Several of my profiles are marked as the wrong gender when in fact they are correct.  Sometimes women's names are written exactly like a man's name in Westfriesland, Noord-Holland.  Examples are Pieter instead of Pietertje; Claes instead of Claesje; Cornelis instead of Cornelisje.  What would you suggest I do so that others don't try to correct a gender on a profile that shouldn't be corrected?

As I wrote on the error page, the name type is set by sampling the database. These are the frequencies of the names you mentioned:

Name        total  empty  female  male
Pieter       3312     61       6  3245
Pieterje       13      1      11     1
Claes         177      6       4   167
Claesje        20      1      19     0
Cornelis     2934     21      19  2894
Cornelisje     14      0      14     0

So these are really rare occasions. Pieter appears 6 times as female, and some of those may even be errors. Usually, when correcting a name or gender, a user looks at the biography, so it should state clearly there that this woman's name is Pieter.
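The sampling idea can be sketched roughly in Python. This is only an illustration built from the counts in the table above; the function names and the 90% threshold are assumptions made for the sketch, not the rule the error report actually uses.

```python
# Counts copied from the table above: (empty, female, male) per first name.
NAME_COUNTS = {
    "Pieter": (61, 6, 3245),
    "Pieterje": (1, 11, 1),
    "Claes": (6, 4, 167),
    "Claesje": (1, 19, 0),
    "Cornelis": (21, 19, 2894),
    "Cornelisje": (0, 14, 0),
}


def likely_gender(name, threshold=0.9):
    """Return 'male', 'female', or None when the sample is not decisive.

    The threshold is a hypothetical cut-off chosen for this sketch.
    """
    if name not in NAME_COUNTS:
        return None
    _empty, female, male = NAME_COUNTS[name]
    known = female + male
    if known == 0:
        return None
    if male / known >= threshold:
        return "male"
    if female / known >= threshold:
        return "female"
    return None


def is_gender_suspect(name, recorded_gender):
    """Flag a profile whose recorded gender disagrees with the sample."""
    expected = likely_gender(name)
    return expected is not None and recorded_gender != expected
```

With these counts a woman named Pieter is flagged even though her profile is correct, so exceptions like hers still need a person to review them.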

Aleš, 

I don't know what you mean by "write what you are checking on errors page".  Do you mean this page?

http://www.wikitree.com/wiki/Space:Database_dump_statistics

Where on it shall we write things?

Know that I'm going through this one:

http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160501/902_0000-0000.htm

And working on finding profiles to merge dead profiles with, or communicating with the profile managers (if there are any) requesting more details be added. 

Your note should be written as a comment on the page with the latest errors.

http://www.wikitree.com/wiki/Space:Database_Errors_Project/2016_05_01

Look at other comments.

To Bertram

I added the possibility to ignore false errors. Look at the right side of the errors page.

I am running into a variety of free space pages for the Error tool.  Perhaps these should be organized under one location.  Then just add the link to the new page on the indexing page?

http://www.wikitree.com/wiki/Space:Database_dump_statistics

http://www.wikitree.com/wiki/Space:Database_Errors_Project/2016_05_01

http://www.wikitree.com/wiki/Space:Database_dump_errors

http://www.wikitree.com/wiki/Project:Database_Errors

Others?

These were startup problems.

This is the official starting point (for now). Join the project and you will be notified of all changes:

http://www.wikitree.com/wiki/Project:Database_Errors

Every Monday a new error report will be added, accessible from the project page:

http://www.wikitree.com/wiki/Space:Database_Errors_Project_2016-xx-xx

Database statistics are updated on this page once a month:

http://www.wikitree.com/wiki/Space:Database_dump_statistics

Other pages are dead ends.

Thank you.  I've started marking the false errors on the errors page.

Could someone explain what a false error is? Does it indicate an error in the algorithm?

Any examples, please...

Errors I have seen, such as a father being the same person as the profile, feel like something a bot could repair... better than I can.

Errors are calculated by rules that are true in most cases, but there are always some exceptions. In such a case you can click False Error and the error will be removed from the list.

Examples:

  • Error 105 Duplicate siblings: Ries-613 & Ries-614 are twins without names, so it is not an error.
  • Error 305 Mother too young: for Moore-24889 & Smith-95728 the mother was old enough to give birth, but due to privacy the computer thinks she was 10.
  • Errors 500: Gender correctness is calculated automatically, so there can be a lot of false errors.
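To illustrate how a False Error mark could keep an entry out of the list, here is a minimal Python sketch. The `ErrorEntry` type and the `visible_errors` function are hypothetical names invented for this example, and the entry for error 303 uses made-up profile IDs; nothing here describes how the real report stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEntry:
    code: int        # error number, e.g. 105 or 305
    profiles: tuple  # WikiTree IDs involved in the error


def visible_errors(found, false_errors):
    """Drop every entry that a user has marked as a false error."""
    return [entry for entry in found if entry not in false_errors]


found = [
    ErrorEntry(105, ("Ries-613", "Ries-614")),
    ErrorEntry(305, ("Moore-24889", "Smith-95728")),
    ErrorEntry(303, ("Example-1", "Example-1")),  # hypothetical profile IDs
]
# Entries that users have marked as false errors.
false_errors = {
    ErrorEntry(105, ("Ries-613", "Ries-614")),
    ErrorEntry(305, ("Moore-24889", "Smith-95728")),
}
remaining = visible_errors(found, false_errors)
```

Only the hypothetical 303 entry stays in `remaining`; the two examples from the list above are filtered out.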

As for bots, I am against them. A bot could delete a father who is the profile itself, but a genealogist can set the correct one, which is much better.

Exceptions might be errors 631, 632 and similar, where the location could be set to empty text, but Chris could correct this with one command in a minute if he chooses to do so.

8 Answers

+5 votes
 
Best answer

I think this tool is magic. We need to start looking at our data in different ways...

Yesterday I started going through the list of people who are their own fathers... A big problem is that when you can't edit a profile, you have to leave a comment on the page that is easy to understand...

I leave a comment like this:


This profile has been identified as having a problem.  2016-05-11 303 Father of himself Please change it.

For more information please see 
Project:Database_Errors or ask a question at G2G

==> Wiki code

This profile has been identified as having a problem [http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160511/201_1800-1899.htm 303 Father of himself] - 2016 May 11

'''Please change it'''

For more information please see [http://www.wikitree.com/wiki/Project:Database_Errors Project:Database_Errors] or ask a question at [http://www.wikitree.com/g2g/tag/db_errors G2G]

:Regards
:Magnus Sälgö
:Stockholm, Sweden

by C S G2G6 Pilot (274k points)
selected by Jan Terink

Magnus,

You started working on an error that others had already checked and corrected. Only the errors on protected profiles remained, where there was no response from the profile manager.

As for the standard message for profile managers, I would suggest that someone write an appropriate standard message for each error, with instructions on how to correct it. I wrote a basic description of each error on the project page, but for a message it should be extended.

For the link to the error I would suggest

http://www.sdms.si:92/function/WTWeb/errors.htm?Generations=1&WikiTreeID=Trtnik-2

Replace Trtnik-2 with the person's WikiTree ID.

This would be better for the user than seeing hundreds of errors.

 

Also, the official tag has changed from database_errors_project to db_errors, as mentioned on the project page.

Thanks 

URL looks great 





Maybe a Template {{dbcheck}}

==>
[http://www.sdms.si:92/function/WTWeb/errors.htm?Generations=1&WikiTreeID={{FULLPAGENAMEE}} Database error check]

Looks like it works if we have it on the profile but not as a comment....

Regarding Bots
If we check Wikipedia, they use bots for everything (188M edits), and I think we could use them for some purposes:

  1. Renaming categories 
    (today this is a nightmare)
  2. Repairing dead links, e.g. by trying to find the page on the Wayback Machine
  3. Setting categories for locations on profiles...
Use Generations=0 to see only the errors of that user.

I think comments don't support templates. You could put such a template in the profile.

For those things a bot could be used, but someone needs to do it.
This template would be useful to put on a personal page. Generations could be an optional parameter.

I would name template

{{db_myerrors}}

[http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID={{FULLPAGENAMEE}}&Generations=5 Database error check]

{{db_myerrors|Generations}}

[http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID={{FULLPAGENAMEE}}&Generations={{{1}}} Database error check]

I would also recommend a template for any person, with an optional parameter, so you could link to any person:

{{db_errors|WikiTreeID}}

[http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID={{{1}}}&Generations=5 Database error check for {{{1}}}]

{{db_errors|WikiTreeID|Generations}}

[http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID={{{1}}}&Generations={{{2}}} Database error check for {{{1}}}]

I am not exactly sure the syntax is correct. But you get the idea.

Could you, or someone else who can, make these two templates?
If we get a GO from Chris I can do them tonight when I am home.

If we use a template, it is easy to change it and to complement it with links to help pages, instruction videos on how to correct an error...
I see this being used by a user putting the template on his own profile and occasionally clicking the link to see if there is something to correct in his tree.
On the wish list: a descendant parameter to check the descendants of a profile...
I use generations. Parents, partners and children are all in the 1st generation. Siblings are in the 2nd generation (the father's children).
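One way to picture this counting is a breadth-first walk over the family links, where every parent, partner or child link costs one generation. The Python below is only a sketch under that assumption; the function name, the `links` structure and the sample family are all hypothetical and say nothing about how the report is really computed.

```python
from collections import deque


def profiles_within(start, links, generations):
    """Return {profile_id: distance} for everyone within `generations` steps.

    `links` maps a profile ID to the IDs of its parents, partners and
    children; each such link counts as one generation.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == generations:
            continue  # do not expand beyond the requested depth
        for neighbour in links.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


# Hypothetical family: "Me" and "Sibling" share the parent "Father".
links = {
    "Me": ["Father", "Partner", "Child"],
    "Father": ["Me", "Sibling"],
    "Sibling": ["Father"],
    "Partner": ["Me", "Child"],
    "Child": ["Me", "Partner"],
}
```

In this sample, a depth of 0 returns only the starting profile, a depth of 1 adds the father, partner and child, and the sibling first appears at depth 2 because it is reached through the father.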

Is your wish to check only the descendants of a person? It could be done, but I don't see why. You can correct errors in all related profiles, not just descendants.

Sorry, now I understand. It's excellent as it is.

Stockhaus-3

Magnus, you pointed me to such a message and now I too post a comment like that on error profiles.

Thanks!

Version 0.1 of {{db_errors}} is live; see the documentation.

  1. {{db_errors}}
    ==> Generates a link to a report for the current WikiTree profile, covering 5 generations

  2. {{db_errors|10}}
    ==> Same as 1 but 10 generations

  3. {{db_errors|10|Sälgö-2}}
    ==> Same as 2 but starts with the WikiTree profile Sälgö-2

  4. {{db_errors|Generations=10|WikiTreeID=Sälgö-3}}


 

Very good. Thanks for that. I will add this to project page and news.
+5 votes
This is awesome!  I was already able to fix one, and sent off a couple of messages to active PMs.
by Nan Starjak G2G6 Pilot (269k points)
+6 votes
Added new errors. Check it out.
by Aleš Trtnik G2G6 Pilot (561k points)
Great work on this, Aleš! Bravo!
Born in the USA, not so much!  Wow, how enlightening this is.  The recent update has found all my old imports with USA before 1776.  Awesome!  Not only does it find errors, it shows us how much we have learned in such a short time. Loving the direction this is heading.
+3 votes
I am all in, thanks for starting a project.
by Esmé van der Westhuizen G2G6 Pilot (125k points)
+4 votes
Thanks for starting this! Particularly neat to be able to check my own tree: I found one instance where I had made a typo in a marriage date that had the couple marrying in the century before they were born. And another instance where I had mixed up a couple of fathers in a long line of Anders Anderssons.

Have also been checking a bit on Sweden in general and found some stuff where I could make myself useful.
by Eva Ekeblad G2G6 Pilot (428k points)
I did that too..

It would be good to have a false error flag, e.g. balneavis-4, error 501.

And if we could type a WikiTree ID at the end of the errors.htm line, we could put it in our navigation page.

Treble bazinga

Sorry, I missed this message.

I am currently discussing with Chris ways to integrate errors into WikiTree.

>> And if we could type a WikiTree ID at the end of the errors.htm line, we could put it in our navigation page.

It already works. Our server handles both GET and POST requests. 

Use the following formats.

http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID=Trtnik-2

You can optionally add the number of generations (10 is the default):

http://www.sdms.si:92/function/WTWeb/errors.htm?WikiTreeID=Trtnik-2&Generations=5

And for a location search use

http://www.sdms.si:92/function/WTWebLocation/errors.htm?Location=Slovenija
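If you want to generate such links from a script, a small Python sketch could look like the following. The URLs and the parameter names WikiTreeID, Generations and Location come from the formats above; the function names are made up for this example, and the assumption that the server accepts UTF-8 percent-encoding for IDs such as Sälgö-2 is untested.

```python
from urllib.parse import urlencode

BASE = "http://www.sdms.si:92/function"


def profile_errors_url(wikitree_id, generations=None):
    """Build a link to the error report for one profile."""
    params = {"WikiTreeID": wikitree_id}
    if generations is not None:
        params["Generations"] = generations
    return f"{BASE}/WTWeb/errors.htm?{urlencode(params)}"


def location_errors_url(location):
    """Build a link to the error report for one location."""
    query = urlencode({"Location": location})
    return f"{BASE}/WTWebLocation/errors.htm?{query}"


print(profile_errors_url("Trtnik-2", 5))
print(location_errors_url("Slovenija"))
```

Leaving out the second argument omits Generations from the link, so the server's own default applies.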

+2 votes

Aleš, it looks like people misspell names to make duplicates easier to find ==> Aleš Trtnik will be added to WikiTree as Ales Trtnik, i.e. the grapheme š becomes just s

Maybe your magic software can better find duplicates than the WikiTree search engine...?

See the long discussion on whether it's OK to add Salgo for the surname Sälgö ;-)

-----------

Just out of curiosity, is Aleš Trtnik a romanisation of a Cyrillic name?  ==> To be correct, should you have just the Cyrillic name for some people in your family tree, or do you use both? 

by C S G2G6 Pilot (274k points)
edited by C S

People misspell names because they can't type Š if they are not from Slovenia. The letter doesn't exist on an English keyboard. There are ways to get Š, but you must be familiar with computers. Today computers handle these letters very well, but 10 years ago that wasn't so, and all Slovenian software is written so that it recognizes Ales and Aleš as the same. I will have to check whether we also handle Salgo and Sälgö the same way.
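For anyone curious how "Ales" and "Aleš" can be treated as the same, one common technique is to decompose the letters with Unicode normalization and drop the combining marks. The Python below is a sketch of that general technique, not the code used in the Slovenian software mentioned above; the function names are made up for the example.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks (caron, diaeresis, ...) and ignore case."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def same_name(a, b):
    """Compare two names while ignoring diacritics and letter case."""
    return fold_diacritics(a) == fold_diacritics(b)


print(same_name("Aleš", "Ales"))    # True
print(same_name("Sälgö", "Salgo"))  # True
```

This only covers letters that decompose into a base letter plus a mark. Letters such as đ or ø have no such decomposition and would need an explicit mapping table.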

I am from Slovenia and we use the Latin script, although I can read Cyrillic, since in Yugoslavia we used both.

As for which to use, I am not certain. My opinion is to use Latin, as most of the world can read it. Nowadays computers can read and compare Cyrillic and Latin, so there are not many problems, and in the future it will be even better. But this doesn't work for Chinese, Arabic, ... People from those parts of the world should say what works for them.

Regards Aleš

And the consequence is that it is difficult to find duplicates... It would be great if the error project had a solution...

I think it's also a lack of knowledge. I have roots in Beograd and was down there last year and did some genealogy, and still I am not 100% sure how to spell the names... I also did some software consulting in Turkey, and it took me 15 minutes to understand that they had a character i with two dots, Ï (I with diaeresis); I thought I had a problem with my display ;-). 

Worldwide genealogy isn't easy...

I will put it on the project's to-do list.
+2 votes

A new candidate: garbage tags from GEDCOM import...

Yesterday I removed CONT from a profile... CONT means continue and is used when GEDCOM records are too long... and it should not be part of a WikiTree profile
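For background: in a GEDCOM file a CONT line continues the previous value on a new line, and a CONC line appends to the same line. A rough Python sketch of how an import could fold those lines back into the text, instead of leaving the tags behind, is shown below. It is a simplified illustration only (cross-reference IDs are not handled) and is not WikiTree's importer; the function name is made up.

```python
def join_continuations(lines):
    """Merge GEDCOM CONT/CONC lines into the value of the preceding record.

    Returns a list of (level, tag, value) tuples without CONT or CONC
    entries. Lines with a cross-reference ID (e.g. "0 @I1@ INDI") are not
    handled in this sketch.
    """
    records = []
    for raw in lines:
        parts = raw.rstrip("\r\n").lstrip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag in ("CONT", "CONC") and records:
            prev_level, prev_tag, prev_value = records[-1]
            # CONT starts a new line of text, CONC glues text on directly.
            joiner = "\n" if tag == "CONT" else ""
            records[-1] = (prev_level, prev_tag, prev_value + joiner + value)
        else:
            records.append((level, tag, value))
    return records


sample = [
    "1 NOTE Born on a farm",
    "2 CONT near the river",
    "2 CONC side.",
    "1 SEX M",
]
print(join_continuations(sample))
```

After joining, the NOTE value reads "Born on a farm", a line break, then "near the riverside.", and no CONT or CONC tag is left in the text.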

A search
site:www.wikitree.com/wiki CONT ==> gives 87 000 hits ;-)

Houston, Houston, we have a problem

by C S G2G6 Pilot (274k points)
Biography is not part of the database dump, so I cannot make this an error. But Google can.
This goes back to the notes we made about the state of WikiTree's genealogies as open linked data and the failure of the biography section to be machine readable.

Some parts of the WikiTree profile are machine readable 

King-17514 see  Google Structured Data tool

https://search.google.com/structured-data/testing-tool?url#url=http%3A%2F%2Fwww.wikitree.com%2Fwiki%2FKing-17514

@type     Person 
url     http://www.wikitree.com/wiki/King-17514
name     Alvin Cecil King
givenName    Alvin
additionalName    Cecil
familyName    King
birthDate    1913-Aug-01
gender    male
@type    Event 
location    
@type     Placename    Sonoma, California, United States

parent     @type    Person
url    http://www.wikitree.com/wiki/King-17529
name    James King

parent    @type    Person
url    http://www.wikitree.com/wiki/Willis-4871
name    Mabel A. (Willis) King

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-17528
name    Mabel Ruth (King) Goatley

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-18030
name    Alice King

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-18029
name    John Willis King

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-18027
name    Howard Caldwell King

sibling    @type    Person
url    http://www.wikitree.com/wiki/King-18026
name    Nellie King
.....

+2 votes
Thanks so much for all the hard, hard work on this tool. I'm curious, though, about the criteria for the rules.

I had an error of "father too old", but the father would have been 40, not really an unusual age at all in my (genealogical and personal) experience. No big deal checking the profile, but I'm not going to change anything, so I imagine it will continue to come up in the error report. Another error was "unknown gender of spouse", but the spouse's profile did have the gender specified.

As I said, curious about the criteria.

And thanks again!!
by Ellen Curnes G2G6 Mach 7 (70.2k points)
I see now what the "father too old" error may be. I thought the error related to the father of the profiled person, whereas now I think it is referring to a child of the profiled person - a child whose profile I haven't been involved with. I assume the error will also show up on the error report of the profile manager of the child's profile.
It will. You can also write him a comment asking him to correct the error.

If the error seems correct, check the Changes tab for recent edits. Someone may have already corrected the error. If that is the case, it will disappear on Monday, when errors are recalculated.

It is always good to include the WikiTree ID of the person involved. It makes it easier to answer.
Thanks for your response. Although I was asking in general, the specific profiles were Grimmett-148 (father is too old) and Arnett-162 (unknown gender of spouse).

Grimmett-148 is the father of Sara, born 1927, so he would have been 128 years old. The limit I set is 115 years, and I will lower it to a reasonable age as these errors are corrected.
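The check described here boils down to one subtraction. A minimal Python sketch follows, assuming only birth years are compared; the function name is made up, the father's birth year 1799 is derived from the numbers in the post (1927 minus 128), and the second example uses a hypothetical father.

```python
FATHER_MAX_AGE = 115  # limit quoted in the post above


def father_too_old(father_birth_year, child_birth_year, limit=FATHER_MAX_AGE):
    """True when the father's age at the child's birth exceeds the limit."""
    return child_birth_year - father_birth_year > limit


# Sara was born in 1927; an age of 128 implies a father born in 1799.
print(father_too_old(1799, 1927))  # True, 128 is above the limit of 115
# A hypothetical father born in 1887 would have been 40, which is fine.
print(father_too_old(1887, 1927))  # False
```

Lowering the limit later only means passing a smaller `limit` value.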

Arnett-162: You corrected the error. It will disappear on Monday. If you are in a hurry, you can click the Hide for 30 days link and it will disappear. If the error has not been corrected, it will reappear in a month. This is usually used when you have to communicate with the profile manager to correct the data.


...