News on Database errors project (15 May 2016)


Analysis was done on data from May 15th 2016.

By my estimation, 6153 errors were corrected in 4 days. Great job.

Have fun correcting errors.

You can also join the project here: http://www.wikitree.com/wiki/Project:Database_Errors

Error | May 1 | May 11 | Projected May 15 | May 15 | Reduction | Delta%
Profiles | - | 11184648 | - | 11206469 | - | 100.20%
Locations | - | 10707022 | - | 10732026 | - | 100.23%
101 Birth in future | 343 | 312 | 313 | 308 | 5 | 1.47%
102 Death in future | 370 | 343 | 344 | 331 | 13 | 3.69%
103 Death before birth | 13139 | 13111 | 13137 | 13080 | 57 | 0.43%
104 Too old | 7021 | 7036 | 7050 | 7001 | 49 | 0.69%
105 Duplicate sibling | 4711 | 3892 | 3900 | 3607 | 293 | 7.50%
106 Duplicates between bigtree and unconnected | 3253 | 3293 | 3299 | 3245 | 54 | 1.65%
201 Father is self | 251 | 240 | 240 | 121 | 119 | 49.68%
202 Parents are same | 224 | 221 | 221 | 193 | 28 | 12.84%
203 Father is Female | 6167 | 6244 | 6256 | 6257 | -1 | -0.01%
204 Father has no Gender | 2159 | 1689 | 1692 | 1175 | 517 | 30.57%
205 Father is too young or not born | 48551 | 48867 | 48962 | 48694 | 268 | 0.55%
206 Father is too old | 6952 | 6955 | 6969 | 6928 | 41 | 0.58%
207 Father is also a child | 510 | 502 | 503 | 393 | 110 | 21.87%
208 Father is also a spouse | 241 | 234 | 234 | 232 | 2 | 1.05%
209 Father is also a sibling | 3527 | 3512 | 3519 | 3236 | 283 | 8.04%
210 Father was dead before birth | 32482 | 32559 | 32623 | 32505 | 118 | 0.36%
301 Mother is self | 10 | 6 | 6 | 5 | 1 | 16.83%
303 Mother is Male | 8321 | 7931 | 7946 | 7880 | 66 | 0.84%
304 Mother has no Gender | 2101 | 1856 | 1860 | 1715 | 145 | 7.78%
305 Mother too young or not born | 65178 | 65596 | 65724 | 65535 | 189 | 0.29%
306 Mother is too old | 5822 | 5817 | 5828 | 5783 | 45 | 0.78%
307 Mother is also a child | 35 | 34 | 34 | 13 | 21 | 61.84%
308 Mother is also a spouse | 1566 | 1578 | 1581 | 1516 | 65 | 4.12%
309 Mother is also a sibling | 373 | 364 | 365 | 362 | 3 | 0.74%
310 Mother was dead before birth | 31202 | 31224 | 31285 | 31153 | 132 | 0.42%
401 Spouse is self | 4 | 3 | 3 | 3 | 0 | 0.19%
402 Unknown gender of spouse | 2990 | 2538 | 2543 | 2386 | 157 | 6.17%
403 Single sex marriage | 4671 | 4001 | 4009 | 3998 | 11 | 0.27%
404 Marriage before birth | 10937 | 10704 | 10725 | 10634 | 91 | 0.85%
405 Married too old | 2857 | 2871 | 2877 | 2847 | 30 | 1.03%
406 Marriage after death | 12580 | 12602 | 12627 | 12556 | 71 | 0.56%
407 Death too old after Marriage | 2027 | 1818 | 1822 | 1769 | 53 | 2.88%
501 Wrong male gender | 7130 | 7012 | 7026 | 9064 | -2038 | -29.01%
502 Missing male gender | 53397 | 53276 | 53380 | 75402 | -22022 | -41.26%
503 Probably wrong male gender | 8380 | 8349 | 8365 | 6237 | 2128 | 25.44%
504 Probably missing male gender | 56357 | 56486 | 56596 | 36129 | 20467 | 36.16%
505 Wrong female gender | 9072 | 8717 | 8734 | 10479 | -1745 | -19.98%
506 Missing female gender | 51058 | 51119 | 51219 | 60682 | -9463 | -18.48%
507 Probably wrong female gender | 7027 | 6946 | 6960 | 5389 | 1571 | 22.57%
508 Probably missing female gender | 37889 | 37983 | 38057 | 29462 | 8595 | 22.58%
509 Missing gender | 97415 | 97714 | 97905 | 95747 | 2158 | 2.20%
510 Unique name without gender | 24792 | 24854 | 24902 | 23654 | 1248 | 5.01%
511 Unique name (spelling) | - | 476223 | 477152 | 476571 | 581 | 0.12%
512 Separators in first name | - | 68680 | 68814 | 68716 | 98 | 0.14%
601 Unknown birth location | 9291 | 9343 | 9365 | 9366 | -1 | -0.01%
603 USA too early in birth location | 217129 | 217281 | 217788 | 216951 | 837 | 0.38%
631 Unknown death location | 16230 | 16454 | 16492 | 16505 | -13 | -0.08%
632 Y death location | 6542 | 6534 | 6549 | 6478 | 71 | 1.09%
633 USA too early in death location | 80738 | 80672 | 80860 | 80529 | 331 | 0.41%
661 Unknown marriage location | 1328 | 1350 | 1353 | 1346 | 7 | 0.53%
662 Y marriage location | 6 | 6 | 6 | 0 | 6 | 100.00%
663 USA too early in marriage location | 26103 | 26113 | 26174 | 26039 | 135 | 0.52%
901 Unconnected empty public profiles | 35473 | 35433 | 35502 | 35418 | 84 | 0.24%
902 Unconnected empty open profiles | 17242 | 17221 | 17255 | 17172 | 83 | 0.48%
Total | 1043174 | 1585719 | 1588950 | 1582797 | 6153 | 0.39%
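The post does not spell out how the Projected, Reduction and Delta% columns are computed. The numbers are consistent with this reading: each May 11 count is scaled by the growth in Profiles between May 11 and May 15 (Locations for the 6xx location errors), Reduction is Projected minus the May 15 count, and Delta% is Reduction divided by Projected, all before rounding. A minimal Python sketch of that reconstruction; the function and variable names are made up for illustration and are not the project's code:

```python
def project_may15(may11_count, may15_count, base_may11, base_may15):
    """Reconstruct the Projected, Reduction and Delta% columns for one error row.

    base_may11 / base_may15 are the Profiles totals (or the Locations totals for
    the 6xx location errors). All results are returned unrounded.
    """
    growth = base_may15 / base_may11        # growth of the database itself
    projected = may11_count * growth        # count expected if nobody fixed anything
    reduction = projected - may15_count     # estimated number of errors corrected
    delta_percent = 100 * reduction / projected
    return projected, reduction, delta_percent


PROFILES = (11184648, 11206469)    # May 11, May 15
LOCATIONS = (10707022, 10732026)   # May 11, May 15

# 101 Birth in future: 312 on May 11, 308 on May 15
p, r, d = project_may15(312, 308, *PROFILES)
print(round(p), round(r), f"{d:.2f}%")   # 313 5 1.47%

# 603 USA too early in birth location: 217281 on May 11, 216951 on May 15
p, r, d = project_may15(217281, 216951, *LOCATIONS)
print(round(p), round(r), f"{d:.2f}%")   # 217788 837 0.38%
```

Rows with a negative Reduction, such as 203 or 601, had more errors on May 15 than database growth alone would predict.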
 
in The Tree House by Aleš Trtnik G2G6 Pilot (561k points)
retagged by Maggie N.
@Chet Snow,
I don't have your problem:
When I click on a name and return to the page, the link has changed color, so I can see where I was on that page.
When I click on "temp hide", a new page opens. I don't close that page with the "X"; instead I go back one page and return exactly to where I was.

@Chet Snow,

I use Google Chrome Browser. When I click on the "temp hide" link, I hold down the Ctrl key. That makes the link open in a new tab and leaves me where I was.

It might also work this way in other browsers.

 

Ctrl-click works in any browser.

The Back button also returns you to the same position.

If you reload the page to actually see the error disappear, you have to find your place again manually.

For now it is a simple link, so selecting several errors at once is not possible. The lists are also huge, and rendering 5000 checkboxes would have a big performance impact.

As for the Location field, I know, and I am waiting to see the outcome. I am not doing anything new about it. I was even considering taking errors 603, 633 and 663 offline, but there was no interest in that.

@Aleš

I would certainly vote in favor of taking errors 6n3 offline, pending precise instructions on what to use instead of USA. Just erasing USA from locations does not make much sense to me.

I more or less agree with you. I checked some of the changes: they are not all just erasing USA; some put it in parentheses and add colony names. But it would be best to wait for the outcome of the Location changes.

Today there is a new database dump, and I will remove this error for a while.
Hi Aleš

Can I vote YES to taking those geographic "error" categories offline until we have the new guidelines? Many people see these and want to eliminate "errors" when in fact they may be creating new ones.

I will try using the Ctrl key for the False Error reports; I understand the technical drawbacks of multiple changes, I just thought I'd ask. Your work is fascinating and has revealed so much, but let's "hold" on the geographical names for a while until Chris and the sysops get us good data to work from. Thanks.

Maybe add a list box where the user can choose strict or less strict error checking. If you select strict checking, you also get some rules that may be too strict and report errors that are not really errors, or errors that we need to agree on before declaring something wrong inside WikiTree.

Hi

Although this may sound good, I don't think it is a good principle to start thinking about two "levels" or "types" of WikiTree: we have standards, and most of these "error" categories are pretty basic. But right now, the geographic name standards should wait until the leadership (Chris etc.) decides which system we want to follow.
6x3 errors are offline until the Location field is changed and new guidelines are prepared.
Thank you, Aleš. I am sure we will be told when and what those guidelines will be when the time is right. For now, there are PLENTY of other errors to correct!! Best always, Chet Snow

7 Answers


News

Template to put link to errors on profile pages

There is also a template, {{db_errors}}, that you can put on a profile to get a link to the errors for that profile and the connected ones. Check the documentation for this template.

  1. {{db_errors}} ==> Generates a link to a report on the current profile, covering 5 generations. This form can be used only in a biography, not in comments.
  2. {{db_errors|10}} ==> Same as 1, but with 10 generations. This form can be used only in a biography, not in comments.
  3. {{db_errors|10|Sälgö-2}} ==> Same as 2, but starts with profile Sälgö-2. This form can be used in comments, free-space pages and everywhere else on WikiTree.
  4. {{db_errors|Generations=10|WikiTreeID=Sälgö-3}} ==> Same as 3, but with named parameters. This form can be used in comments, free-space pages and everywhere else on WikiTree.
  5. {{Db_errors|10|Sälgö-1|Y}} ==> The third parameter adds more help text. This form can be used in comments, free-space pages and everywhere else on WikiTree.
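The five forms differ only in which parameters they pass. Purely as an illustration, the hypothetical Python parser below maps each form to its settings, assuming the named parameters Generations and WikiTreeID are equivalent to positional parameters 1 and 2. It is not the template's actual implementation; it only encodes the rules listed above, including that forms which name a starting profile also work outside the biography.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbErrorsCall:
    generations: int = 5               # form 1: 5 generations by default
    wikitree_id: Optional[str] = None  # None = the profile the template sits on
    verbose: bool = False              # form 5: third parameter "Y"

    @property
    def usable_outside_biography(self) -> bool:
        # Forms that name the starting profile also work in comments.
        return self.wikitree_id is not None


def parse_db_errors(template: str) -> DbErrorsCall:
    """Map one {{db_errors ...}} call to its settings (illustrative only)."""
    inner = template.strip().removeprefix("{{").removesuffix("}}")
    name, *params = inner.split("|")
    if name.strip().lower() != "db_errors":
        raise ValueError(f"not a db_errors template: {name!r}")
    call = DbErrorsCall()
    positional = []
    for param in params:
        if "=" in param:                       # form 4: named parameters
            key, value = (s.strip() for s in param.split("=", 1))
            if key == "Generations":
                call.generations = int(value)
            elif key == "WikiTreeID":
                call.wikitree_id = value
        else:
            positional.append(param.strip())
    if len(positional) >= 1:
        call.generations = int(positional[0])
    if len(positional) >= 2:
        call.wikitree_id = positional[1]
    if len(positional) >= 3:
        call.verbose = positional[2].upper() == "Y"
    return call


print(parse_db_errors("{{db_errors|10|Sälgö-2}}"))
# DbErrorsCall(generations=10, wikitree_id='Sälgö-2', verbose=False)
```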
by Aleš Trtnik G2G6 Pilot (561k points)
edited by Aleš Trtnik
How can I use this on a protected profile, where I cannot edit the biography?
Can it be put in the public comment?

@Pierre

Yes, but only format 4. See also this G2G thread.

I think you cannot include a template in a comment. And if you cannot edit the biography, you cannot use this. It was intended to be put on your own profile, to easily check for errors in your tree or in trees of interest.
Looks like I am wrong.
I checked on my own profile and conclude that versions 3 and 4 can be used in the public comment.
I see no difference between 3 and 4.

{{db_errors|10|Sälgö-2|Y}}

There is now a new, more verbose version if you add a third parameter; it should work in a comment.

Please let me know if we should have some other text...

Comments, I think, are a WikiTree-specific function that doesn't implement support for

That shows:
Database error check see Database errors project for more info

This profile has been identified to have problem. For more information please see Project:Database_Errors or ask a question at G2G with tag db_errors

I like it, but I suggest the following sequence:

This profile has been identified to have a problem. 
If not clear, put your question at G2G with tag db_errors
Please do a Database error check and see Database errors project for more information.

@Pierre

Discussion of the template and its texts is also going on in this thread. Please have a look there.


Added 109 Profile should be open (birth date), 110 Profile should be open (death date)

These are profiles that should be open, since their birth or death date is more than 200 years ago, or the date is wrong.

Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now | Now-9999
109 Profile should be open (birth date) | 11667 | 7 | 411 | 199 | 3611 | 7439 | - | - | -
110 Profile should be open (death date) | 1516 | 713 | 11 | 72 | 154 | 317 | 247 | 2 | -
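As a rough illustration of the 200-year rule for 109 and 110, here is a minimal Python sketch. The function name and the use of plain years are assumptions; the real check runs on the database dump, and the "or the date is wrong" case is not modelled here.

```python
from typing import Optional, Tuple

OPEN_AFTER_YEARS = 200   # threshold given in the post for 109 and 110


def should_be_open(birth_year: Optional[int], death_year: Optional[int],
                   current_year: int) -> Tuple[bool, bool]:
    """Return (error_109, error_110) for a profile that is not open yet.

    None means the year is unknown. Wrong dates are not handled here.
    """
    cutoff = current_year - OPEN_AFTER_YEARS
    error_109 = birth_year is not None and birth_year < cutoff
    error_110 = death_year is not None and death_year < cutoff
    return error_109, error_110


print(should_be_open(1750, 1820, 2016))   # (True, False)
```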

Added 107 Full name in UPPERCASE, 108 Full name in lowercase

These are profiles whose whole full name is in uppercase or in lowercase.

Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now | Now-9999
107 Full name in UPPERCASE | 3153 | 1987 | 4 | 66 | 79 | 390 | 610 | 17 | -
108 Full name in lowercase | 3207 | 2900 | 3 | 3 | 14 | 49 | 231 | 7 | -
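A check like 107/108 maps naturally onto Python's str.isupper() and str.islower(), which return True only when the string has at least one cased letter and all cased letters are upper (or lower) case, so spaces and punctuation do not interfere. A minimal sketch; the helper name is hypothetical and this is not the project's actual code:

```python
from typing import Optional


def name_case_error(full_name: str) -> Optional[int]:
    """Return 107 (all uppercase), 108 (all lowercase) or None."""
    if full_name.isupper():
        return 107
    if full_name.islower():
        return 108
    return None


for name in ["JOHN SMITH", "john smith", "John McDonald"]:
    print(name, name_case_error(name))
# JOHN SMITH 107
# john smith 108
# John McDonald None
```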
by Aleš Trtnik G2G6 Pilot (561k points)

Temporary hidden errors

I added a system, similar to false errors, that hides an error for a month.

If you encounter an error that you cannot fix, and you have posted a message to the profile manager or proposed a merge, you can click a link on the right to tell the system to ignore this error for a month. If the profile manager corrects the error, it will no longer exist; otherwise the error will reappear after 31 days, so that other actions can be taken. The error will be hidden by the next recalculation (on Monday) at the latest.
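The hiding rules can be summarised as a small decision function. This is a sketch of the behaviour described in this answer and the follow-up below (False Error hides permanently, Temp hide for 31 days), not the project's code; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

TEMP_HIDE_DAYS = 31


def is_listed(still_present: bool, false_error: bool,
              temp_hidden_on: Optional[date], recalculated_on: date) -> bool:
    """Decide whether an error shows up in the list after a weekly recalculation."""
    if not still_present:
        return False   # the profile manager corrected it, so the error no longer exists
    if false_error:
        return False   # marked as False Error: hidden for good
    if temp_hidden_on is not None:
        if recalculated_on < temp_hidden_on + timedelta(days=TEMP_HIDE_DAYS):
            return False   # Temp hide still in effect
    return True            # never hidden, or the 31 days are over


hidden = date(2016, 5, 16)
print(is_listed(True, False, hidden, date(2016, 5, 23)))   # False
print(is_listed(True, False, hidden, date(2016, 6, 20)))   # True
```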

by Aleš Trtnik G2G6 Pilot (561k points)
Will stillborn unknown twins then keep appearing on the 105 Duplicate sibling list every month, even if I mark them as false errors and as a rejected match?

There are many such instances, for example: http://www.wikitree.com/wiki/Willey-1226 & http://www.wikitree.com/wiki/Willey-1227

Is there a guideline for differentiating between unnamed twins, and if not, should one be implemented?

No. Now you have two links on the right side: one to mark a False Error, which will always remain hidden, and a new Temp hide link, which hides the error for one month; after that it reappears if it has not been corrected.

But error 105 appears twice for each pair, and you have to mark both as False error.
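Why 105 shows up twice is easiest to see in a sketch. Assuming, purely for illustration, that siblings are compared by name alone and that each profile gets its own error entry, a pair of identically named siblings produces one entry from each side. The matching rule and the names used here are placeholders, not the real check:

```python
from collections import defaultdict
from typing import List, Tuple


def duplicate_sibling_errors(siblings: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """siblings: (WikiTree ID, name) for children of the same parents.

    Returns one (profile, duplicate_of) entry per direction.
    """
    by_name = defaultdict(list)
    for wikitree_id, name in siblings:
        by_name[name.strip().lower()].append(wikitree_id)
    errors = []
    for ids in by_name.values():
        for a in ids:
            for b in ids:
                if a != b:
                    errors.append((a, b))
    return errors


children = [("Willey-1226", "Unknown Willey"), ("Willey-1227", "Unknown Willey")]
print(duplicate_sibling_errors(children))
# [('Willey-1226', 'Willey-1227'), ('Willey-1227', 'Willey-1226')]
```

Because each direction is a separate entry, marking only one of them as a False Error leaves the other visible.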
Thanks Aleš, now I understand. You are doing an amazing job.
> Error will be hidden at latest on next recalculation (on monday).

Monday in which time zone?

I corrected a number of errors late on Sunday night, but they didn't make the cut, because I didn't consider the time zone (and weekends are the best time for me to make corrections).

Chris wrote: "As I mentioned, it's now set to run every Sunday night (US Central time)." But I think he set it up for Sunday noon. The last profile created was http://www.wikitree.com/index.php?title=Special:NetworkFeed&who=Crouch-1691 and the time on the files is 15. 5. 2016 12:18, so it looks like Sunday noon US Central time is the dump time. That is 18:00 in the UK and 19:00 in Germany.

Then I need a few hours to import and recalculate. I will always post in G2G when new errors have been recalculated.
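The time-zone arithmetic can be checked with Python's datetime module. This sketch takes the file time as 12:18 on Sunday 15 May 2016 (the Sunday of this analysis) and uses fixed mid-May daylight-saving offsets (CDT, BST, CEST) instead of a time-zone database; both are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Fixed mid-May offsets (daylight saving time in effect); assumed rather than looked up.
US_CENTRAL = timezone(timedelta(hours=-5), "CDT")
UK = timezone(timedelta(hours=1), "BST")
GERMANY = timezone(timedelta(hours=2), "CEST")

dump_time = datetime(2016, 5, 15, 12, 18, tzinfo=US_CENTRAL)
for label, tz in [("UK", UK), ("Germany", GERMANY)]:
    print(label, dump_time.astimezone(tz).strftime("%A %H:%M"))
# UK Sunday 18:18
# Germany Sunday 19:18
```

Anything edited after that moment, such as corrections made late on Sunday evening in Europe, only shows up in the following week's dump.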


Description of errors

I have finished short descriptions for all errors. You can see them on the project page.

by Aleš Trtnik G2G6 Pilot (561k points)

Added 604, 634 & 664 Too short location

Short locations are not allowed, since they can be ambiguous, and people from other parts of the world don't understand them. For now MinLength is 4, with exceptions such as USA and UK. American states should be written at least in the form PA, USA, which is longer than 4 characters.

Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999
604 Birth location too short | 16791 | 1974 | 131 | 1006 | 2646 | 9284 | 1750
634 Death location too short | 18242 | 1109 | 313 | 1121 | 2762 | 11516 | 1421
664 Marriage location too short | 3246 | 228 | 30 | 379 | 680 | 1653 | 276
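The length rule for 604, 634 and 664 fits in a few lines. A minimal Python sketch, assuming the comparison is case-sensitive and that an empty location is not reported by this check; only USA and UK are named as exceptions in the post, so the real list may be longer.

```python
MIN_LENGTH = 4                  # "For now MinLength is 4"
ALLOWED_SHORT = {"USA", "UK"}   # exceptions named in the post; the real list may be longer


def location_too_short(location: str) -> bool:
    """Flag a location shorter than MIN_LENGTH unless it is an allowed exception."""
    value = location.strip()
    if not value:
        return False            # assumption: an empty location is not this error
    return len(value) < MIN_LENGTH and value not in ALLOWED_SHORT


for loc in ["PA", "USA", "PA, USA", "Ohio"]:
    print(repr(loc), location_too_short(loc))
# 'PA' True
# 'USA' False
# 'PA, USA' False
# 'Ohio' False
```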

Updated 602, 632 & 662

Added checks for "yes *" and "y *".

Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999
602 Y birth location | 3 | 3 | - | - | - | - | -
632 Y death location | 6736 | 80 | 48 | 140 | 663 | 5715 | 90
662 Y marriage location | 7 | 1 | - | - | 2 | 4 | -
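The post does not give the exact matching rule for the Y location errors. One plausible reading, sketched below under that assumption, treats "Y", "y *" and "yes *" as case-insensitive shell-style wildcards and uses Python's fnmatch.fnmatchcase on the lowercased value. The pattern list and helper name are illustrative only.

```python
from fnmatch import fnmatchcase

# "Y" was the original check; "y *" and "yes *" were added in this update.
# Treating them as case-insensitive wildcards is an assumption.
Y_PATTERNS = ("y", "y *", "yes *")


def is_y_location(location: str) -> bool:
    value = location.strip().lower()
    return any(fnmatchcase(value, pattern) for pattern in Y_PATTERNS)


for loc in ["Y", "yes Boston", "York, England"]:
    print(loc, is_y_location(loc))
# Y True
# yes Boston True
# York, England False
```

Note that "y *" needs a space after the "y", so ordinary place names starting with Y, like York, are not caught.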

 

by Aleš Trtnik G2G6 Pilot (561k points)
This is wonderful work, BUT... can people please use the comment field to explain what they are changing in my profiles? I had over 50 profiles that were changed just to remove USA or the like, and looking through all of them to make sure the person was not changing my data was a nuisance.
by Robin Lee G2G6 Pilot (707k points)

I agree they should enter an edit comment.

But you can also do it yourself. Here you can see possible errors in your tree and correct them before others do. You must start with your grandparents (Lee-5964), since your parents are private. http://www.sdms.si:92/wikitree/ShowErrors.htm

Robin,

Did they remove USA and leave it blank? Argh! In the past I have added USA to profiles where it was missing. I'm probably going to hold off on location name changes, unless one is really a mess, while waiting for the FamilySearch integration.

Marty

Added 408 Multiple marriages on same day, 409 Marriage to duplicate person

  • 408 Multiple marriages on same day: This person married two partners on the same day.
  • 409 Marriage to duplicate person: This person is married twice to a person with the same name.
Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now
408 Multiple marriages on same day | 10234 | 293 | 40 | 1686 | 3043 | 4789 | 383 | -
409 Marriage to duplicate person | 31870 | 4119 | 411 | 4269 | 8046 | 13421 | 1602 | 2
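Both new checks look at one person's list of marriages. A minimal Python sketch, assuming each marriage is reduced to the spouse's name and an ISO date string (or None when unknown), and that names are compared case-insensitively; these representations are hypothetical, not the project's data model.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One person's marriages as (spouse name, marriage date "YYYY-MM-DD" or None).
Marriage = Tuple[str, Optional[str]]


def marriage_errors(marriages: List[Marriage]) -> List[int]:
    """Return the codes 408 and/or 409 raised by one person's marriages (sketch)."""
    errors = []
    dates = Counter(day for _, day in marriages if day)   # unknown dates are skipped
    if any(count > 1 for count in dates.values()):
        errors.append(408)   # two partners married on the same day
    names = Counter(name.strip().lower() for name, _ in marriages)
    if any(count > 1 for count in names.values()):
        errors.append(409)   # two spouses with the same name: likely a duplicate profile
    return errors


print(marriage_errors([("Mary Smith", "1850-06-01"), ("Mary Smith", "1850-06-01")]))
# [408, 409]
```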
by Aleš Trtnik G2G6 Pilot (561k points)
