News on Database errors project (11 May 2016)

+36 votes
502 views

Analysis was done on data from May 11th 2016.

By my estimate, 9591 errors were corrected in those 10 days. Great job.

Have fun correcting errors.

You can also join the project here: http://www.wikitree.com/wiki/Project:Database_Errors

Code   Error                                         1 May   11 May  Delta  Delta %
101    Birth in future                                 343      312     31    9,04%
102    Death in future                                 370      343     27    7,30%
103    Death before birth                            13139    13110     29    0,22%
104    Too old                                        7021     7036    -15   -0,21%
105    Duplicate sibling                              4711     3892    819   17,38%
106    Duplicates between bigtree and unconnected     3253     3293    -40   -1,23%
201    Father is self                                  251      240     11    4,38%
202    Parents are same                                224      221      3    1,34%
203    Father is Female                               6167     6244    -77   -1,25%
204    Father has no Gender                           2159     1689    470   21,77%
205    Father is too young or not born               48551    48867   -316   -0,65%
206    Father is too old                              6952     6955     -3   -0,04%
207    Father is also a child                          510      502      8    1,57%
208    Father is also a spouse                         241      234      7    2,90%
209    Father is also a sibling                       3527     3512     15    0,43%
210    Father was dead before birth                  32482    32559    -77   -0,24%
301    Mother is self                                   10        6      4   40,00%
303    Mother is Male                                 8321     7931    390    4,69%
304    Mother has no Gender                           2101     1856    245   11,66%
305    Mother too young or not born                  65178    65596   -418   -0,64%
306    Mother is too old                              5822     5817      5    0,09%
307    Mother is also a child                           35       34      1    2,86%
308    Mother is also a spouse                        1566     1578    -12   -0,77%
309    Mother is also a sibling                        373      364      9    2,41%
310    Mother was dead before birth                  31202    31224    -22   -0,07%
401    Spouse is self                                    4        3      1   25,00%
402    Unknown gender of spouse                       2990     2538    452   15,12%
403    Single sex marriage                            4671     4001    670   14,34%
404    Marriage before birth                         10937    10704    233    2,13%
405    Married too old                                2857     2871    -14   -0,49%
406    Marriage after death                          12580    12602    -22   -0,17%
407    Death too old after Marriage                   2027     1818    209   10,31%
501    Wrong male gender                              7130     7012    118    1,65%
502    Missing male gender                           53397    53276    121    0,23%
503    Probably wrong male gender                     8380     8349     31    0,37%
504    Probably missing male gender                  56357    56486   -129   -0,23%
505    Wrong female gender                            9072     8717    355    3,91%
506    Missing female gender                         51058    51119    -61   -0,12%
507    Probably wrong female gender                   7027     6946     81    1,15%
508    Probably missing female gender                37889    37983    -94   -0,25%
509    Missing gender                                97415    97714   -299   -0,31%
510    Unique name without gender                    24792    24854    -62   -0,25%
601    Unknown birth location                         9291     9343    -52   -0,56%
603    USA too early in birth location              217129   217281   -152   -0,07%
631    Unknown death location                        16230    16454   -224   -1,38%
632    Y death location                               6542     6534      8    0,12%
633    USA too early in death location               80738    80672     66    0,08%
661    Unknown marriage location                      1328     1350    -22   -1,66%
662    Y marriage location                               6        6      0    0,00%
663    USA too early in marriage location            26103    26113    -10   -0,04%
901    Unconnected empty public profiles             35473    35433     40    0,11%
902    Unconnected empty open profiles               17242    17221     21    0,12%
Total                                              1043174  1040815   2359    0,23%
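
For anyone who wants to reproduce the last two columns: judging from the numbers, Delta appears to be the earlier count (1 May) minus the later count (11 May), and Delta% is that difference divided by the earlier count. Below is a minimal Python sketch under that assumption. The function name `delta_row` is made up for illustration, and the three sample rows are copied from the table.

```python
def delta_row(before: int, after: int) -> tuple[int, float]:
    """Return (delta, delta_percent) the way the table appears to compute them."""
    delta = before - after  # positive means fewer errors than before
    percent = round(100 * delta / before, 2) if before else 0.0
    return delta, percent


# A few rows copied from the table: (count on 1 May, count on 11 May)
rows = {
    "101 Birth in future": (343, 312),
    "104 Too old": (7021, 7036),
    "105 Duplicate sibling": (4711, 3892),
}

for name, (before, after) in rows.items():
    delta, percent = delta_row(before, after)
    print(f"{name}: {delta} ({percent:.2f}%)")
```

A negative Delta means the count went up between the two dumps, as in row 104.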
in The Tree House by Aleš Trtnik G2G6 Pilot (549k points)
retagged by Maggie N.
This is fantastic, Aleš. Just amazing.

It's terrific how easy you're making it for other members to help correct errors.

For those who haven't clicked over to the project page yet: it's set up so that you browse different error types, then click over to view the people with that error, sorted by when they lived. For example, people from 1500-1699 whose birth/death dates would make them too old: http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160511/104_1500-1699.htm

Yesterday I was talking to a couple team members about how accuracy is something that sets WikiTree apart. We're not just creating a free single family tree. We're dedicated to making it accurate.

If we're not already the most accurate large single family tree, this unique project will make us so.

Chris
I did notice that some of the errors that are being generated are due to missing dates. In some cases, those dates are lost in the wheel of time, and may not be something that can be corrected. Do we still want to carry them as actual "errors" in these cases? Just pointing out that sometimes what appears to be an error, isn't always one.

Which errors are you referring to? 

A missing date counts as an error only for errors 901 and 902. Those profiles are orphans with no data except a name, as described on the project page: http://www.wikitree.com/wiki/Project:Database_Errors#901_unconnected_empty_public_profiles.2C_902_unconnected_empty_open_profiles. I think Jillaine's goal was to merge, delete, or recycle such profiles, since they will most likely never be connected to any other person or to the Global Tree. These are profiles like UNKNOWN-113875, UNKNOWN-11389, and similar. Sally Unknown will never be connected to any profile.

Oh - I'm reading the report wrong! My fault!

I was reading into it that when it said "mother too young" - I would go to the first profile, look at the mother - and jumped to the conclusion that the reference was incorrect.

Reviewing it now, I can see that the "mother" is the one next to the error, and the "child" is to the right. Sometimes you just gotta open your eyes and actually read these things...
It was a little unclear who is who, so I added a relation function to the report. Just reload the page.

Does the 10-day interval mean dumps are provided more frequently, every ten days instead of every month?

As I wrote on Space:Database_Errors_Project_2016-05-11, updates will be on Mondays. Today is an exception, since there were some corrections in the export.

News

Weekly update

Great news: Chris just notified me that the dump now runs on a weekly schedule, on Sunday night (US Central time). So the errors will be updated during Monday.

I echo Chris's comments, Aleš.  What you have created is a fantastic addition to WikiTree!  We are lucky to have you as a member, and I am proud to be part of such a wonderful and dedicated group of genealogists.
Great news indeed, Aleš!

Now we can see the results of our efforts much, much earlier.

I agree with everyone: it really is amazing, and it's great news that we can now see the results of our corrections in just one week! So thanks for making this possible, Chris! And the tech team and everyone who is working on it, of course ;)

But especially, thank you, Aleš. It's a great project, and we really needed something like this!

I love this. Yep; I'm focused on 902 in particular. I can't really do anything with 901 because I can't edit public (green) profiles.

And it's a good project for me because it's discrete, I can work on it in little chunks of time.

And... I really really dislike empty profiles.

Oh, Aleš, they're not all orphaned. I'd say 50% of them have a profile manager; of those, 50% are "active" (in the last six months), so it's also turning into a way to encourage active folks to fill out their empty profiles.

Glad to hear about the weekly updates.
WOO!

The new updates are fantastic! I'm excited to report that I'm clear of errors (from my Grandfather) for 4 generations! Next stop - Generation 5 - with 8 errors!

Thanks!
Just cleared all errors, 10 generations from myself.  Not many, and they were silly ones like missed off gender.

I reported a false error: Ellis Cattell (Cattell-44). The system was not happy that I had the gender as female. The christening record and marriage record in the biography (sourced) prove the gender is correct. I got a message back stating the error will be removed in the next run.

The system works. Well done.
There is no limit on generations. If 10 generations are correct, you can move on to 15, 20, ...

One thing I'm noticing while working through 902 of Aleš Trtnik's database errors project: Many of the orphaned profiles (and even not-orphaned profiles) that have empty dates SEEM to be of living people. 

Do we have a template for identifying/flagging living people?

4 Answers

+6 votes
If anyone is experiencing very slow page loads (over 30 s) or slow navigation on huge error lists, let me know and I will split them into smaller chunks.
by Aleš Trtnik G2G6 Pilot (549k points)
+4 votes
=== False errors ===

The system for identifying false errors is done.

If you encounter a reported error that isn't really an error, you can click a link on the right to tell the system so. The error will disappear at the latest on the next recalculation (on Monday).
by Aleš Trtnik G2G6 Pilot (549k points)
This database errors system is a great addition for getting the job done right. I have used it quite a bit.  And now you add a False Error reporting bit. Great Job. With this additional tool, your system will get even better.

Excellent!!
That's great! I've only found one case so far where the data flagged as an error was actually correct (a Duplicate Siblings case where the profile manager confirmed that it was actually a pair of unnamed twins who died at birth), but it'll be good to flag that for future reference.
Thanks for the false error improvement.
THANK YOU - This is one of the best additions to the list of possible errors that I have found, as it allows us to control data correctness, especially where NAMES are concerned. I come from a group that used lots of unique, often surname-based, forenames for its ancestors, and so I have a LOT of so-called "errors" that are in fact correct.
I'm going to chime in on Chet's answer with a thank-you and a me-too! I had cleared all the errors in my first five generations before this error was added. Suddenly I had something like fifteen profiles with unique name errors. I never realized our names were that unusual LOL
+3 votes

Added a new error:

512 Separators in first name

These are first names that contain separators. Separators shouldn't be used there.
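
To make the idea concrete, here is a minimal Python sketch of what a check like 512 could look like. The thread does not say which characters count as separators, so the set `ASSUMED_SEPARATORS` below is purely an assumption, and `has_separator` is a made-up name, not the project's actual code.

```python
# Assumed separator characters; the project's actual list is not given in this thread.
ASSUMED_SEPARATORS = set(",;/|")


def has_separator(first_name: str) -> bool:
    """True if the first name contains any of the assumed separator characters."""
    return any(ch in ASSUMED_SEPARATORS for ch in first_name)


# Invented sample names, only to show the behaviour.
for name in ["Mary Ann", "John/Johann", "Anna, Maria"]:
    print(name, has_separator(name))
```

A plain space is deliberately left out of the assumed set, since ordinary double first names contain one.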

by Aleš Trtnik G2G6 Pilot (549k points)
+1 vote
Hi Aleš - Can I respectfully request that you add a new "Error" category that will detect PRIVATE (Red) or, even better, ALL NON-PUBLIC (i.e., all that are not "Open" or Black privacy level) profiles where there is a Birth Year EARLIER THAN 1817, or where one or the other Parent's Birth Year is EARLIER THAN 1775? One of my "pet peeves" in working on older lineages is that there exists a collection of "private" or "non-public" profiles which are 200+ years old (by birth year), and sometimes they have NO active profile manager. The PRIVATE (Red) ones cannot be accessed by anyone but Team members, so even we Leaders cannot go in and correct them. Please consider this. Best wishes,

Chet Snow-2128 A WikiTree Leader
by Chet Snow G2G6 Mach 5 (52.0k points)

So if I understand correctly, persons born more than 200 years ago can only be public (50) or open (60)? Isn't that set automatically? So all profiles dated before the 1810s can be marked as errors, and likewise if the death or marriage date is before the 1810s. I can also check the parents' dates.

Here is a summary of nonpublic persons by privacy level and birth date.

http://www.softdata.si/osebe_staro/ales/wikitree/Privacy birth.htm

 

Hi Aleš

It IS a rule on WikiTree that all profiles for people whom we can assume were born more than 200 years ago (i.e., before 1817) must be OPEN (not "Public"; that is for dead people born from 1817 onwards). It should be automatic, but sometimes members go back and change the status to Public (Green) or even Red (Private - cannot be seen). Sometimes there is a typo in the status somewhere: one person was marked as "still alive" instead of having a death year, BUT she was born in 1690. My suggestion is to use your error-finding tool to sort out these mistakes, if you think it will work. I am NOT a "techie" internet-savvy guy. I hope what I say makes sense; otherwise reply and ask questions and I will try to answer so we understand each other. The link you sent was in Slovenian - sorry, I do not speak that!!
In the report, there are only 2 Slovenian words. I translated them.

As you can see in the report, about 1000 profiles are either too protected or have a wrong birth date. It is not a problem to identify them as errors. I will also check the death date. Marriage data isn't present for private and protected profiles in the database dump, so I cannot check that.

So public profiles before 1816 must also be open.

I can also check for parents' birth dates before 1820.
Hi Aleš

Thank you again for your patience with me.  I agree there are few words - I am NOT good with statistics; very sorry - it was a "half joke" in what I said.

Yes, ALL profiles where the birth year is 1816 or earlier MUST BE OPEN (not "Public - Green circle" but Open "black unlocked").  So those thousand should be identified and hopefully eliminated - many may be birth year Typos....

For the parents, the birth year should be BEFORE 1775 to allow a 40 year span for a mother to have had a child born no later than 1815.  Few women had children after 40 years old in those times (maybe 1% - we can ignore them).
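
The rule as proposed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thresholds (birth before 1817, a parent born before 1775) come from this proposal, the privacy code 60 for "Open" is the one quoted earlier in the thread, and the function name `too_protected` and its parameters are hypothetical, not the project's actual implementation.

```python
from typing import Optional

OPEN = 60  # privacy code for "open", as quoted earlier in this thread


def too_protected(
    privacy: int,
    birth_year: Optional[int],
    father_birth_year: Optional[int] = None,
    mother_birth_year: Optional[int] = None,
) -> bool:
    """Flag a profile that is not Open although the dates say it should be."""
    if privacy == OPEN:
        return False  # already Open, nothing to report
    if birth_year is not None and birth_year < 1817:
        return True  # born more than 200 years ago
    # Proposed fallback: a parent born before 1775 implies an old profile.
    parents = (father_birth_year, mother_birth_year)
    return any(year is not None and year < 1775 for year in parents)


print(too_protected(50, 1690))                          # public profile, born 1690
print(too_protected(50, None, mother_birth_year=1770))  # no birth year, old mother
```

Missing years are treated as "unknown" rather than as errors, so a profile with no dates at all is not flagged by this sketch.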

Do not worry about marriage data - it's not so important.

There will always be some profiles that "slip through the cracks" in any statistical program - with 5 million profiles and more adding to them daily.  BUT if we can catch even 500 of the 1000 with this tool, it is worth it in my opinion.

Thank you again, sincerely, for your hard work on this. I have now eliminated all the found errors in my first 6 generations (although your program keeps finding new categories, but that is GOOD), and I am working on the rest out to 10 generations.

QUESTION:  What about the Geographic words in Parentheses that I asked about ??  example  (USA)  or (Canada)  or even (Germany) ??  Can they be excluded from the "Error" category so people can use them to indicate places that will BECOME THAT NATION LATER IN TIME ???   This would be very helpful so anyone looking at old profiles gets the sense of where they took place - few people nowadays know "Silesia" or "Bohemia" but many more know (Germany) or (Czech) etc. etc.  

Again, thank you for this dialogue.

Chet Snow - A WikiTree Leader
I posted a reply in the location-errors thread:

http://www.wikitree.com/g2g/249287/location-errors

I also lose track of threads in G2G. I'm participating in 5 threads and forgetting where I wrote what.
