News on Database errors project (22 May 2016)


Analysis was done on data from May 22nd 2016.

By my estimate, 9988 errors were corrected in 7 days. Great job.

Have fun correcting errors.

You can also join the project here: http://www.wikitree.com/wiki/Project:Database_Errors

Code Error 15 May Projected 22 May Reduction Delta%
101 Birth in future 308 310 253 57 18,26%
102 Death in future 331 333 313 20 5,91%
103 Death before birth 13080 13145 13022 123 0,94%
104 Too old 7001 7036 6968 68 0,96%
105 Duplicate sibling 3607 3625 3026 599 16,52%
106 Duplicates between bigtree and unconnected 3245 3261 2959 302 9,27%
107 Full name in UPPERCASE 3153 3169 3136 33 1,03%
108 Full name in lowercase 3207 3223 3193 30 0,93%
109 Profile should be open (birth date) 11667 11725 11439 286 2,44%
110 Profile should be open (death date) 1516 1524 1512 12 0,76%
201 Father is self 121 122 114 8 6,26%
202 Parents are same 193 194 98 96 49,48%
203 Father is Female 6257 6289 6253 36 0,56%
204 Father has no Gender 1175 1181 1026 155 13,12%
205 Father is too young or not born 48694 48939 48607 332 0,68%
206 Father is too old 6928 6963 6789 174 2,50%
207 Father is also a child 393 395 378 17 4,30%
208 Father is also a spouse 232 233 216 17 7,36%
209 Father is also a sibling 3236 3252 3078 174 5,36%
210 Father was dead before birth 32505 32669 32506 163 0,50%
301 Mother is self 5 5 5 0 0,50%
303 Mother is Male 7880 7919 7798 121 1,53%
304 Mother has no Gender 1715 1724 1540 184 10,65%
305 Mother too young or not born 65535 65861 65559 302 0,46%
306 Mother is too old 5783 5812 5716 96 1,65%
307 Mother is also a child 13 13 11 2 15,80%
308 Mother is also a spouse 1516 1524 1322 202 13,23%
309 Mother is also a sibling 362 364 356 8 2,14%
310 Mother was dead before birth 31153 31308 31067 241 0,77%
401 Spouse is self 3 3 3 0 0,52%
402 Unknown gender of spouse 2386 2398 2068 330 13,78%
403 Single sex marriage 3998 4019 3502 517 12,86%
404 Marriage before birth 10634 10689 10526 163 1,53%
405 Married too old 2847 2862 2812 50 1,74%
406 Marriage after death 12556 12621 12506 115 0,91%
407 Death too old after Marriage 1769 1778 1659 119 6,70%
408 Multiple marriages on same day 10234 10287 10198 89 0,87%
409 Marriage to duplicate person 31870 32036 31772 264 0,82%
501 Wrong male gender 9064 9109 8813 296 3,25%
502 Missing male gender 75402 75777 74800 977 1,29%
503 Probably wrong male gender 6237 6268 6147 121 1,93%
504 Probably missing male gender 36129 36309 35644 665 1,83%
505 Wrong female gender 10479 10531 10295 236 2,24%
506 Missing female gender 60682 60984 60375 609 1,00%
507 Probably wrong female gender 5389 5416 5260 156 2,88%
508 Probably missing female gender 29462 29609 29382 227 0,77%
509 Missing gender 95747 96224 96390 -166 -0,17%
510 Unique name without gender 23654 23772 23454 318 1,34%
511 Unique name (spelling) 346225 347949 283116 0 0,00%
512 Separators in first name 68716 69058 68383 675 0,98%
601 Unknown birth location 9366 9424 9392 32 0,34%
602 Y birth location 3 3 0 3 100,00%
603 USA too early in birth location 216951 218303 218609 -306 -0,14%
604 Birth location too short 16791 16896 13986 0 0,00%
631 Unknown death location 16505 16608 16509 99 0,60%
632 Y death location 6736 6778 6355 423 6,24%
633 USA too early in death location 80529 81031 81307 -276 -0,34%
634 Death location too short 18242 18356 17071 0 0,00%
661 Unknown marriage location 1346 1354 1348 6 0,47%
662 Y marriage location 7 7 0 7 100,00%
663 USA too early in marriage location 26039 26201 26190 11 0,04%
664 Marriage location too short 3246 3266 3020 0 0,00%
901 Unconnected empty public profiles 35418 35594 35346 248 0,70%
902 Unconnected empty open profiles 17172 17258 17136 122 0,70%
Total 1552645 1560895,845 1481634 9988 0,64%
in The Tree House by Aleš Trtnik G2G6 Pilot (549k points)
retagged by Maggie N.

Stupid question: what is "Projected"?

Many thanks for your more than magic work. It feels like you have added a new dimension of working together to WikiTree. I feel a focus on quality is so important for WikiTree.

The question is how we could get better tracking of unsourced profiles.
Today we have just the category Unsourced (164 141 members). It would be nice to make that dimension visible in the error report, so that you could find all unsourced profiles within 10 generations, or search on a location and see its unsourced profiles.

Projected is the previous number of errors increased to account for data growth. What would be a better expression? This has been asked before.
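To show how the table columns appear to relate, here is a minimal sketch. It assumes Projected is last week's count multiplied by a weekly growth factor, Reduction is Projected minus this week's count, and Delta% is Reduction divided by Projected. The growth factor of 1.005 is only a guess read off the table (for example 310/308), not a figure stated in this thread, and the function names are made up for the example.

```python
# Assumed relationships between the columns; not the author's actual code.
ASSUMED_WEEKLY_GROWTH = 1.005  # hypothetical value, estimated from the table


def projected(previous: int, growth: float = ASSUMED_WEEKLY_GROWTH) -> float:
    """Errors expected this week if nobody had fixed anything."""
    return previous * growth


def reduction(previous: int, current: int,
              growth: float = ASSUMED_WEEKLY_GROWTH) -> float:
    """Estimated number of errors corrected during the week."""
    return projected(previous, growth) - current


def delta_percent(previous: int, current: int,
                  growth: float = ASSUMED_WEEKLY_GROWTH) -> float:
    """Reduction as a percentage of the projected count."""
    return 100.0 * reduction(previous, current, growth) / projected(previous, growth)


if __name__ == "__main__":
    # Row 101 "Birth in future": 308 errors last week, 253 this week.
    print(round(projected(308)), round(reduction(308, 253)),
          round(delta_percent(308, 253), 2))
```

Because the calculation works on unrounded values, percentages computed from the rounded figures shown in the table can differ slightly in the last digits.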

Well, I could use this API: https://www.mediawiki.org/wiki/API:Categorymembers but it looks like the wiki API is not set up, so this won't work. I wouldn't want to download it page by page. Do you know how to get a list of all those pages?
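For reference, on a wiki that does expose the MediaWiki API, listing a category usually means repeating a `list=categorymembers` query and passing back the `cmcontinue` token until none is returned. The sketch below assumes such an endpoint exists; `API_URL` is a placeholder, not a WikiTree address, and the helper names are invented for the example.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://example.org/w/api.php"  # placeholder endpoint (assumption)


def build_url(category: str, cmcontinue: Optional[str] = None,
              api_url: str = API_URL) -> str:
    """Build one categorymembers request, optionally resuming from a token."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return api_url + "?" + urlencode(params)


def fetch_json(url: str) -> dict:
    """Download and decode one JSON response."""
    with urlopen(url) as response:
        return json.load(response)


def category_members(category: str,
                     fetch: Callable[[str], dict] = fetch_json) -> Iterator[str]:
    """Yield every page title in the category, following continuation tokens."""
    token: Optional[str] = None
    while True:
        data = fetch(build_url(category, token))
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        token = data.get("continue", {}).get("cmcontinue")
        if not token:
            break
```

The `fetch` argument is injectable so the paging logic can be tried against canned responses without touching the network.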

Thank you Aleš. I don't know if you've got room for another column, but it would be nice to see the first column be "since project began". I like seeing progress in the last week, but it would also be interesting to see progress against the original number we started with.

I think Projected is interesting, if a bit disheartening. But it keeps us humble.

The page Space:Database_Errors_Project_2016-05-22 has more columns. Here the limit is 8K characters.

The only term I can think of would be "Potential Increase", but that's very long for a header. I'll think on it and maybe I can come up with something more concise.
Sorry - "Potential Increase" would mean the small differential - not the differential plus the original. This column shows "Potential with increase from average weekly error rate" which is even longer. I can see why you shortened it to "Projected".

Aleš Trtnik, do you still know how many profiles we started with on May 1?

Projected is fine.
I do. On May 1 it was 11107636, but some did not import because of errors in the database dump that were later corrected.

You can check that on http://www.wikitree.com/wiki/Space:Database_dump_statistics
Great work!

Is it possible to include the profile manager(s) in the summary list? The idea would be that I could query for all errors in profiles that I manage, or that I am on the trusted list for. Or I could summarize (for somebody else) all the errors in their profiles. Or we could work on profiles for known inactive/abandoned users.

In the past, Chris was reluctant to dump the profile manager list due to privacy concerns (ability to tell the LNAB for those who wish to remain anonymous). Not sure if that is still the case. I know the database has just a single profile manager, rather than all managers and trusted list.

Chris - perhaps if the dump information (manager/trusted list) filtered out profiles which are not open/public?
A solution is being developed with the help of the API server, so that you can see all errors on your watch list.
I have a lot of 511 errors that are "normal" names for people in the 1800s.  How are you checking for names?

Re Robin

Click on False Error and they will not appear again.

Also check the number of names in the database with this link

More info is on DBE_511. Please share hints and best practices in comments on that page.

G2G tag db_error_511

6 Answers

+2 votes
 
Best answer

New errors 605, 635 and 665: Number in location

The location contains only a number. It is often a date entered in the wrong field.

Errors Total 0000-0000 0001-1499 1500-1699 1700-1799 1800-1899 1900-1999
605 Number in birth location 1933 1041 23 57 173 550 89
635 Number in death location 1623 215 46 131 307 785 139
665 Number in marriage location 228 6   21 37 145 19
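A check of this kind could look roughly like the sketch below. The exact rule used by the error checker is not described here, so this assumes the simplest reading: the location field, once trimmed, consists of digits only. The function name is made up for the example.

```python
import re

# Assumed rule: the whole field is digits and nothing else.
NUMBER_ONLY = re.compile(r"\d+")


def is_number_only(location: str) -> bool:
    """Return True when a location field holds nothing but a number."""
    return NUMBER_ONLY.fullmatch(location.strip()) is not None


if __name__ == "__main__":
    for value in ["1850", " 1850 ", "Paris, France", "12 Main Street", ""]:
        print(repr(value), is_number_only(value))
```

A date typed with separators, such as 1850-05-22, would not be caught by this sketch. Whether the real check catches it is not stated.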
by Aleš Trtnik G2G6 Pilot (549k points)
selected by Aleš Trtnik
+5 votes
We should get a new list each Monday, yes?

http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160501/902_0000-0000.htm

Still has some that were fixed .... oh, I get it. There's a directory with a date in it: Err_20160501

So my list is never going to change. Sigh.

I have to find the current list. I'm going to guess it's Err_20160522...  Nope; that didn't work.

Please advise on how I can always find the most recent version of the list I'm working on. Thanks.
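Judging only from the list URLs quoted in this thread, the address seems to be made of a dated directory (Err_YYYYMMDD), the error code, the birth period, and in the later links a trailing page number. The sketch below builds such an address. The pattern is inferred, not documented, and it cannot tell whether a report was actually published for a given date.

```python
from datetime import date
from typing import Optional

BASE = "http://www.softdata.si/osebe_staro/ales/wikitree"


def report_url(report_date: date, code: int, period: str,
               page: Optional[int] = None) -> str:
    """Build a list URL following the pattern seen in the quoted links."""
    name = f"{code}_{period}"
    if page is not None:
        # The 2016-05-22 links quoted in the thread end in "_0".
        name += f"_{page}"
    return f"{BASE}/Err_{report_date:%Y%m%d}/{name}.htm"


if __name__ == "__main__":
    print(report_url(date(2016, 5, 1), 902, "0000-0000"))
    print(report_url(date(2016, 5, 22), 102, "1800-1899", page=0))
```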
by Jillaine Smith G2G6 Pilot (777k points)
Amazing...

I'm so happy to see that there is something really fun here for the tech nerds to enjoy! ;-)

(I'm only half tech-nerd according to my DNA test results...)
The Showhidden parameter is now handled on the form, so you don't have to write it. Undo for hidden and ignored errors is also implemented.
I've added the page for http://www.wikitree.com/wiki/Space:DBE_634.  Send me a message so I can add you?

@Pierre

DBE_632 well done! Made some changes, please review!

Oh, you mean you want the equivalent of project pages for each error code?

Can I do that for 902?

Like this (except I misnamed the page):

http://www.wikitree.com/index.php?title=Space:DBE_903&public=1

 Jan Terink,

Thanks for the changes, but I added one line:

There is one exception: Y (pronounced: [i]) is a commune in the Somme department in Picardy in northern France.

@Pierre

What I put in all Validation sections is a description of (what I think are) the rules enforced by the error checking software. I doubt very much the exception you added is also in the software.

@ Jillaine
As Jan put it, you must first create the page with the name DBE_902 and then edit the name of the page to the desired name.
So I renamed the page.
Pierre, you didn't actually rename the page; you created a new one. Thank you. I moved the content from 903 to the correct 902.

Jillaine,

Thank you for providing such detailed instructions on fixing 902 errors!

If 902 errors should not be set to "false error", please edit {{Db_errors_G2G|902}} ==> {{Db_errors_G2G|902|N}}. The "N" parameter suppresses the "false error" help text.

I was the one that created the DBE_902 page, and Pierre did correctly change the (caption) name on DBE_903, just like I now did on the DBE_902 page.

+8 votes

Jillaine mentioned last week that fixing errors was like trying to empty a sinking ship with a teaspoon. What we need, Jillaine, are more teaspoons. That is, we need to encourage and recruit more people to check and fix the profiles they manage, and their own ancestry that they have a personal interest in. I have been wanting to put comments on pages similar to what I did here: http://www.wikitree.com/wiki/Pereira-140 but I am limited to only 20 comments per day.

Would it be possible to

1.  put a similar link on every WikiTree profile to encourage people to look for their own errors?

2.  increase or remove the comment limit for project members?

by Joe Cochoit G2G6 Pilot (221k points)

Surely some of this could be addressed technically? 

For example, there are very few reasons that the gedcom import shouldn't sanitize the name fields to be in Title Case, especially since so many people use all-caps in last names.

That alone would take care of over 19,000 errors. Edit: OK, so not that many (reading comprehension fail), but it is still something that could help out.
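A minimal sketch of what such a sanitizing step might do is shown below. It is not the GEDCOM importer's code. It only touches names that are entirely upper case or entirely lower case, and it relies on Python's `str.title()`, which is naive about names such as McLean.

```python
def is_all_caps(name: str) -> bool:
    """True when the name has letters and every letter is upper case."""
    letters = [c for c in name if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def is_all_lower(name: str) -> bool:
    """True when the name has letters and every letter is lower case."""
    letters = [c for c in name if c.isalpha()]
    return bool(letters) and all(c.islower() for c in letters)


def naive_title_case(name: str) -> str:
    """Title-case only names written fully in one case; leave others alone."""
    if is_all_caps(name) or is_all_lower(name):
        return name.title()
    return name


if __name__ == "__main__":
    for sample in ["JOHN SMITH", "john smith", "John McLean", "JOHN MCLEAN"]:
        print(sample, "->", naive_title_case(sample))
```

The last sample comes out as "John Mclean", which shows why an automatic fix would still need a human check for surnames with internal capitals.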

There's an unsourced template for unsourced profiles. Is it possible/feasible to add a tag to profiles which have errors? Would be super-useful if each page had a link to the category of errors and specific errors if any.

Ian, don't you have it on
http://www.wikitree.com/wiki/Space:Database_Errors_Project_2016-05-22

All errors have links based on when the person was born e.g. 1800-1899
 
101 Birth in future
http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160522/101_Now-9999_0.htm

102 Death in future 
1800-1899
http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160522/102_1800-1899_0.htm
1900-1999
http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160522/102_1900-1999_0.htm
2000-now
http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160522/102_2000-Now_0.htm

1483020 Errors Total 0000-0000 0001-1499 1500-1699 1700-1799 1800-1899 1900-1999 2000-Now Now-9999
101 Birth in future 253               253
102 Death in future 313 2       68 234 5 4
103 Death before birth 13022   125 473 1274 5977 5001 131 41


Plus many many many more

I was thinking more systematically. Rather than each user going page by page and adding said tags themselves.

Sorry Ian I don't follow you. Do you want a bot that updates all the pages by itself?! 

Our Wikitree guru Aleš Trtnik has given us

1) Start with a profile
e.g. add {{db_errors}} to a profile and find everything related up to 5 generations away

or add 
{{Db_errors|10|Mclean-3147|Y}} and you get a link 


2) Start with an error and a time period 
On latest error page you have all errors per time period

Latest report link

e.g. Space:Database_Errors_Project_2016-05-22

and you have e.g. 13428 Mother too young or not born 1900-1999


3) Start with a location

On page http://www.sdms.si:92/wikitree/ShowErrors.htm

Add  Johnson County, Indiana, United States ==> you get 7 errors
 

205 Father is too young or not born Michael-775 Frederick Michael Father
305 Mother too young or not born Hortenstine-12 Sallie Hortenstine Mother
205 Father is too young or not born Bowsher-12 Daniel Bowsher Father
409 Marriage to duplicate person Leach-2321 Francis Marion Leach Spouse
408 Multiple marriages on same day Leach-2321 Francis Marion Leach Spouse
408 Multiple marriages on same day Leach-2321 Francis Marion Leach Spouse
409 Marriage to duplicate person Leach-2321 Francis Marion Leach Spouse

 

The overwhelming majority of users don't read G2G and don't know anything about this project and how it works.  Something to put on every profile would need to take account of that.

You'd get loads of "I've fixed this but it still shows an error" and "I deleted that date but then I got a different error" and "This isn't really an error".

Talicyn Daniels
There are a lot fewer errors than 19,000 with all caps in last name, but still 3,136.

see also: http://www.wikitree.com/wiki/Space:DBE_107

In fact most of the 107s and 108s seem to have been gedcommed in a long time ago by a handful of people who haven't done much since.
But 107 is full name in all caps.  There are large numbers with LNAB in all caps, including many UNKNOWNs which I think were created by an early version of the gedcom importer.

Some of those may be for the Recyclers, so it won't be helpful to change them.

In general, LNABs shouldn't be changed just for style/conformance reasons without ensuring that the new name is the Final ID, ie correct and not a duplicate.

It would be interesting to get some statistics, grouping errors by how active the profile manager is...

  1. Last edit - this month, last year...
  2. Number of edits after uploading a gedcom file
  3. Number of profiles they manage
  4. Total number of errors for a profile manager
  5. Number of requests to join the trusted list
  6. Also group it per time period, so you get a feeling for whether the rule for when a profile is open should be adjusted
  7. !?!?!

I think statistics like this would give us a better feeling for how big a challenge this is

 

You're right, I didn't look at the headers, I just assumed it was broken up like the report (time periods).

But I still think that it would help as we grow, an easy way to take care of 107 & 108.
Also, regarding Jillaine's teaspoon metaphor: WikiTree is a relational database, so the totals are numbers of errors, not numbers of profiles. Looking at it by your name or place, you may therefore fix multiple errors on one profile more easily than by working through a single error line (for example, the profiles mentioned previously come up on multiple error lines).
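The Johnson County, Indiana listing quoted earlier in this thread makes the point concrete: it has 7 error rows, but they belong to only 4 profiles. A small sketch of that count follows; the tuple layout is just for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

# (error code, WikiTree ID) rows from the Johnson County listing in the thread.
ERRORS = [
    (205, "Michael-775"),
    (305, "Hortenstine-12"),
    (205, "Bowsher-12"),
    (409, "Leach-2321"),
    (408, "Leach-2321"),
    (408, "Leach-2321"),
    (409, "Leach-2321"),
]


def errors_per_profile(errors: Iterable[Tuple[int, str]]) -> Counter:
    """Count how many error rows each profile accounts for."""
    return Counter(profile for _code, profile in errors)


if __name__ == "__main__":
    counts = errors_per_profile(ERRORS)
    print(len(ERRORS), "error rows on", len(counts), "profiles")
    for profile, n in counts.most_common():
        print(profile, n)
```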

That said, I'm spending a week on my connected-to-me error list. I just cleared generation 16, but I'm only connected to the global tree on one line, so there are not too many yet. How is everyone else here getting on?

http://www.sdms.si:92/wikitree/ShowErrors.htm

P.S. I hope it's right, but once I fix an error I click hide for 30 days so that no one else has to see and fix it as well, figuring that by the time the next list is run it will have gone.
+4 votes

Links to non-Latin profiles

Links to profiles with a non-Latin LNAB didn't work. Now they do.

Selective False error

False error is no longer available on all errors, only on the ones where it makes sense.

Undo False error, Temp hide

On the page http://www.sdms.si:92/wikitree/ShowErrors.htm you can choose to also see hidden errors and unhide them.

by Aleš Trtnik G2G6 Pilot (549k points)
+3 votes

News

Lookup of gender assignments

Here you can check gender assignment for any name http://www.sdms.si:92/wikitree/ShowFirstNames.htm

Finished help for all errors

Help error pages are finished. Native speakers are welcome to correct spelling errors. If something is not clear in descriptions, ask in G2G or correct the page.

by Aleš Trtnik G2G6 Pilot (549k points)
edited by Aleš Trtnik
If I put in just "William", I find 1,000 names containing William, but none starting with William as the first of the given names.
Corrected.

Now you get the most frequent names, but you can also set the maximum limit. Try it with William.
Looks fine now, thanks.
+1 vote
Oooh, I just found a new feature in your error checker, Aleš :)

If you put a surname of interest in the WikiTree ID box (i.e. Round, not Round-218) and set it to 1 generation, it shows you all the errors within that surname. For example, I just found 19 errors in 286 profiles in the Round surname records.

 

NB - removed the bit about being careful using this feature, must have been my PC
by Paula Dea G2G6 Mach 6 (62.6k points)
edited by Paula Dea
I know of this feature. I even tested it with Unknown, and it works.

You requested 10 generations, and it stopped automatically after 7 because you had reached 10K errors.

If the data is cached it works quite fast. Otherwise it can take a minute or two to load all the data into RAM. That was the reason I didn't advertise it. But it works.

 You can also go over 10 generations if you want.
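How the tool walks the tree is not described here, so the following is only a guess at the behaviour: expand relatives one generation at a time and stop early once an error cap is reached. The `relatives` and `errors` dictionaries and the function name are hypothetical; only the 10K cap comes from the reply above.

```python
from typing import Dict, List, Tuple


def collect_errors(start: str,
                   relatives: Dict[str, List[str]],
                   errors: Dict[str, List[str]],
                   max_generations: int,
                   max_errors: int = 10_000) -> Tuple[List[str], int]:
    """Breadth-first walk that stops at a generation or error limit.

    Returns the errors found and the number of generations expanded.
    """
    seen = {start}
    frontier = [start]
    found = list(errors.get(start, []))
    generation = 0
    while frontier and generation < max_generations and len(found) < max_errors:
        generation += 1
        next_frontier: List[str] = []
        for profile in frontier:
            for relative in relatives.get(profile, []):
                if relative not in seen:
                    seen.add(relative)
                    next_frontier.append(relative)
                    found.extend(errors.get(relative, []))
        frontier = next_frontier
    return found, generation
```

The cap is checked between generations, so a run can end a little over the limit, which would be consistent with a request for 10 generations stopping after 7.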

Just so we understand: there is no warning against using it? Normally software is there to be used ;-)
