News on Database errors project (15 May 2016)

+33 votes
1.0k views

Analysis was done on data from May 15th 2016.

By my estimate, 6153 errors were corrected in 4 days. Great job.

Have fun correcting errors.

You can also join the project here: http://www.wikitree.com/wiki/Project:Database_Errors

Error | May 1 | May 11 | Projected | May 15 | Reduction | Delta%
Profiles |  | 11184648 |  | 11206469 |  | 100,20%
Locations |  | 10707022 |  | 10732026 |  | 100,23%
101 Birth in future | 343 | 312 | 313 | 308 | 5 | 1,47%
102 Death in future | 370 | 343 | 344 | 331 | 13 | 3,69%
103 Death before birth | 13139 | 13111 | 13137 | 13080 | 57 | 0,43%
104 Too old | 7021 | 7036 | 7050 | 7001 | 49 | 0,69%
105 Duplicate sibling | 4711 | 3892 | 3900 | 3607 | 293 | 7,50%
106 Duplicates between bigtree and unconnected | 3253 | 3293 | 3299 | 3245 | 54 | 1,65%
201 Father is self | 251 | 240 | 240 | 121 | 119 | 49,68%
202 Parents are same | 224 | 221 | 221 | 193 | 28 | 12,84%
203 Father is Female | 6167 | 6244 | 6256 | 6257 | -1 | -0,01%
204 Father has no Gender | 2159 | 1689 | 1692 | 1175 | 517 | 30,57%
205 Father is too young or not born | 48551 | 48867 | 48962 | 48694 | 268 | 0,55%
206 Father is too old | 6952 | 6955 | 6969 | 6928 | 41 | 0,58%
207 Father is also a child | 510 | 502 | 503 | 393 | 110 | 21,87%
208 Father is also a spouse | 241 | 234 | 234 | 232 | 2 | 1,05%
209 Father is also a sibling | 3527 | 3512 | 3519 | 3236 | 283 | 8,04%
210 Father was dead before birth | 32482 | 32559 | 32623 | 32505 | 118 | 0,36%
301 Mother is self | 10 | 6 | 6 | 5 | 1 | 16,83%
303 Mother is Male | 8321 | 7931 | 7946 | 7880 | 66 | 0,84%
304 Mother has no Gender | 2101 | 1856 | 1860 | 1715 | 145 | 7,78%
305 Mother too young or not born | 65178 | 65596 | 65724 | 65535 | 189 | 0,29%
306 Mother is too old | 5822 | 5817 | 5828 | 5783 | 45 | 0,78%
307 Mother is also a child | 35 | 34 | 34 | 13 | 21 | 61,84%
308 Mother is also a spouse | 1566 | 1578 | 1581 | 1516 | 65 | 4,12%
309 Mother is also a sibling | 373 | 364 | 365 | 362 | 3 | 0,74%
310 Mother was dead before birth | 31202 | 31224 | 31285 | 31153 | 132 | 0,42%
401 Spouse is self | 4 | 3 | 3 | 3 | 0 | 0,19%
402 Unknown gender of spouse | 2990 | 2538 | 2543 | 2386 | 157 | 6,17%
403 Single sex marriage | 4671 | 4001 | 4009 | 3998 | 11 | 0,27%
404 Marriage before birth | 10937 | 10704 | 10725 | 10634 | 91 | 0,85%
405 Married too old | 2857 | 2871 | 2877 | 2847 | 30 | 1,03%
406 Marriage after death | 12580 | 12602 | 12627 | 12556 | 71 | 0,56%
407 Death too old after Marriage | 2027 | 1818 | 1822 | 1769 | 53 | 2,88%
501 Wrong male gender | 7130 | 7012 | 7026 | 9064 | -2038 | -29,01%
502 Missing male gender | 53397 | 53276 | 53380 | 75402 | -22022 | -41,26%
503 Probably wrong male gender | 8380 | 8349 | 8365 | 6237 | 2128 | 25,44%
504 Probably missing male gender | 56357 | 56486 | 56596 | 36129 | 20467 | 36,16%
505 Wrong female gender | 9072 | 8717 | 8734 | 10479 | -1745 | -19,98%
506 Missing female gender | 51058 | 51119 | 51219 | 60682 | -9463 | -18,48%
507 Probably wrong female gender | 7027 | 6946 | 6960 | 5389 | 1571 | 22,57%
508 Probably missing female gender | 37889 | 37983 | 38057 | 29462 | 8595 | 22,58%
509 Missing gender | 97415 | 97714 | 97905 | 95747 | 2158 | 2,20%
510 Unique name without gender | 24792 | 24854 | 24902 | 23654 | 1248 | 5,01%
511 Unique name (spelling) |  | 476223 | 477152 | 476571 | 581 | 0,12%
512 Separators in first name |  | 68680 | 68814 | 68716 | 98 | 0,14%
601 Unknown birth location | 9291 | 9343 | 9365 | 9366 | -1 | -0,01%
603 USA too early in birth location | 217129 | 217281 | 217788 | 216951 | 837 | 0,38%
631 Unknown death location | 16230 | 16454 | 16492 | 16505 | -13 | -0,08%
632 Y death location | 6542 | 6534 | 6549 | 6478 | 71 | 1,09%
633 USA too early in death location | 80738 | 80672 | 80860 | 80529 | 331 | 0,41%
661 Unknown marriage location | 1328 | 1350 | 1353 | 1346 | 7 | 0,53%
662 Y marriage location | 6 | 6 | 6 | 0 | 6 | 100,00%
663 USA too early in marriage location | 26103 | 26113 | 26174 | 26039 | 135 | 0,52%
901 Unconnected empty public profiles | 35473 | 35433 | 35502 | 35418 | 84 | 0,24%
902 Unconnected empty open profiles | 17242 | 17221 | 17255 | 17172 | 83 | 0,48%
Total | 1043174 | 1585719 | 1588950 | 1582797 | 6153 | 0,39%
 
in The Tree House by Aleš Trtnik G2G6 Pilot (808k points)
retagged by Maggie N.
The row for 308 "Mother is also a spouse" is missing on the page with links, that is, http://www.wikitree.com/wiki/Space:Database_Errors_Project_2016-05-15
Recalculated and added. Thanks for notifying me.
Is there any way to separate out protected profiles? Maybe list it as (all)/(non-protected), e.g. 423/302. It’s much easier to not have to open all the errors on pages I can’t do anything on.

Not sure if this is anything that anyone else would find useful.
The error reports have a Privacy column, so you can tell whether you can update the profile. I will also add that column to the user and country reports.

One can of course post a comment on non-open profiles, listing the error(s) and requesting correction, with reference to the project. See the comment by Magnus in this thread.

 

What do the column headers mean please?

I think:

1.5 May 1 version

11.5 May 11 version

Projected ?????

15.5 May 15 version

Reduction Difference between 15.5 and Projected

Delta% Percentage of change

The column headers mean:

May 1 - May 11 - Projected - May 15

Projected means: the total number of profiles changed between May 11 and May 15. That change, as a percentage, is added to the May 11 values. The difference between Projected and May 15 gives the reduction in errors between May 11 and May 15.
Awesome! Thank you. Us 'mericans think dates backwards from the rest of y'all...

(AND: I was hoping there was a comparison thing so we could see what, if any progress we're making.)

Is anyone else working on 902? (I know Ales is...)

In 4 days there was a 0.20% increase in the number of profiles and a 0.23% increase in entered locations. Accordingly, the number of errors should have increased by 0.20%, and the 600-series (location) errors by 0.23%. So, as I wrote, by my estimate 6153 errors were corrected in 4 days.

@ Jilliane
For Error #902 you see:

May 11: 17,221 error records
May 15: 17,172 error records

So the number has decreased by 49 errors.
But statistically the number should have grown by 0.2% = 34 errors due to new profiles in those 4 days.

That is why the total solved for error #902 is estimated at 49 + 34 = 83 errors
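
The same arithmetic can be written out in a few lines of Python. This is only a sketch of the calculation as described in this thread, not Aleš's actual code; the function name and the rounding step are assumptions. The profile totals are the May 11 and May 15 values from the table in the original post.

```python
def estimate_corrected(prev_errors, curr_errors, prev_total, curr_total):
    """Estimate how many errors were corrected between two dumps.

    The previous error count is scaled by the growth of the database
    (this is the "Projected" column); the reduction is the projected
    count minus the current count.
    """
    growth = curr_total / prev_total           # about 1.0020 for May 11 -> May 15
    projected = round(prev_errors * growth)    # rounding is an assumption
    return projected, projected - curr_errors


# Error 902: 17221 errors on May 11, 17172 on May 15
projected, reduction = estimate_corrected(17221, 17172, 11184648, 11206469)
print(projected, reduction)  # 17255 83
```

The result matches the Projected and Reduction values shown for 902 in the table.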
So I'm trying to empty a sinking boat with a teaspoon?

@Jillaine

No, I'll throw in some Dutch:

Je bent aan het dweilen met de kraan open

Well, these profiles are the result of 8 years without a delete button. I remember that when I started, it took me a lot of time to delete (merge) my first profile. And even now it is not easy to do. Maybe the process to delete a profile can be simplified so that empty profiles can be deleted faster.
Jan, Google translated that to "you are mopping with a tap". :-)

Aleš, yes, perhaps your work will encourage Chris et al to rethink the delete policy. That said, I'm not finding it takes long to merge these profiles away. For those that are empty, disconnected and orphaned, I've found it easy to find another orphaned profile to merge it into.

Jillaine, you are doing this every day and it is easy for you. For an occasional user it is not. How can someone find another orphaned profile to merge it into? Today that is easy for me, since I understand the data structure and philosophy of WikiTree, but two months ago I had no idea how to delete a profile. And when you come to the merge, the uncertainty increases even more.

Jilliane,

So much for Google...

It should translate to "you are mopping with the tap running" :-)

I wonder whether merging these #900 profiles is a useful solution, for it doesn't shrink the database. So what do we gain by merging them?
They do stay in history, but are no longer counted as profiles, don't appear in unconnected, are not part of database dump and so on.

I only merge them away if all of the following conditions are met:

  • There is NO data in any of the fields-- no birth date, no death date, no places, etc. I also check the narrative because sometimes there is a clue there. But if no clue...
  • The profile is DISCONNECTED from any other profile; i.e., the person is just hanging out there all by their lonesome, floating in the aether.
  • The profile is ORPHANED.
Then I click on "find duplicates for ..." at the bottom of the page. If I'm lucky, this brings up a list of people with the same names. I look for a profile that has dates AND is also orphaned. You can see this on the list of names when it says:
 
managed by    . 
I do a COMPARE to make sure there isn't something I missed, then I select MERGE and complete the merge, ensuring that the good data is retained. Then I clean up the narrative.
 
It's not that difficult.
 
On the Unnamed and Unknown gender infant profiles, should we mark them False Error, so they will go away, as they cannot be corrected?

Also, I did a report of my ancestors (18 generations) and Generations 14 on are much larger than 1 - 13, far right columns are off the page.  Makes marking the False or Temp error a pain, having to slide view back and forth.
Of course. That is one of the reasons for False Errors.

Your generation 14 has a few profiles without spaces after the commas in locations, so they cannot be shown on smaller monitors.

You can try a different approach to correct these errors. Take one profile from level 14 or 10, like Veiset-6, and do 5 or 10 generations around this profile. Then select some other profile. On the next update the number of errors will drop, and you continue this way. This way you will work on related profiles at once. At generation 14, people can be 28 generations apart.
I'm sure this answer is somewhere, I just can't seem to find it--how often is the error database updated?
Usually on Monday, with a new post in G2G.
@Patricia Roche

I do not know if you have EXCEL on your computer, but if you have, you can copy the whole list to an Excel sheet, and there the columns will be much smaller.
Select everything with Ctrl+A, copy with Ctrl+C and paste with Ctrl+V.
On http://www.wikitree.com/wiki/Space:DBE_109  These errors can only be fixed by a Sysops...unless, you are a profile manager or on the trusted list.   Otherwise, you can use the OPEN Profile request on the profile and a Sysop will perform the magic.   Has anyone asked Paul Bech to look at the list?

@Robin

These error description pages are written from the perspective that the profile in question is open or that the reader is (one of) the profile manager(s), to keep these descriptions straightforward without diverting to all kinds of exceptional situations.

In case the reader cannot perform the action described, she/he will (have to) contact the profile manager anyway.

"OPEN Profile request": Where do I find that function?

"Asking Paul Bech": Sorry, how would I know and why would I ask him?

"List": What list?

 

 

Somewhere on the profile or one of its drop down menus one can find the open profile request.

Paul Bech is the wikitree team member who handles / responds to open profile requests.

The list is the list of errors

I will add the open profile request to the help page.

But actually Paul Bech should check Error 109. Here he can get the errors:

http://www.wikitree.com/wiki/Space:Database_Errors_Project_2016-05-15#Added_109_Profile_should_be_open_.28birth_date.29.2C_110_Profile_should_be_open_.28death_date.29
Hi Alecs

I applaud the "False Error" option button as I have a lot of "unique names" in my ancestry (they were very original people!) and so lots of "false errors" due to these unique names.

One annoying thing about the "False Error" button is that when you use it, the computer does NOT reboot you to the original place where you were but you must do that manually and refind your place.  With 15 generations this can be tiresome to do over and over.

Can you reprogram so that when you use the "It's a False Error" option, you automatically are taken back to where you were ?  OR alternately, can you add a "multiple false errors" button so one can tag ALL the false errors one sees at one time and then tell the program this in one stroke ?  In fact, a "multiple same fix" or "fix same error" option would be great - I notice a LOT of "use of USA, United States too early" errors - mostly on profiles I adopted as Ancestry.com automatically put United States on ALL American profiles, even those back in 1620 or so....

Thanks for considering these changes.   Chet

PS - Chris Whitten our Leader says we are considering adopting specific Geographic Naming conventions in the Birth, marriage and Death location fields so we should not construct new guidelines until the new suggestions come out.
@ Chet Snow,
I don't have your problem:
When I click on a name and return to the page, the color of the link has changed, so I see where I was on that page.
When I click on "temp hide" then a new page opens. I do not close that page with the "X", but go back one page and return exactly where I was on that page.

@ Chet Snow,

I use Google Chrome Browser. When I click on the "temp hide" link, I hold down the Ctrl key. That makes the link open in a new tab and leaves me where I was.

It might also work this way in other browsers.

 

CTRL-click works on any browser.

The Back button also returns you to the same position.

If you do reload, to actually see the error disappear, you have to manually find the location.

For now it is a simple link, so making a multiple select is not possible. Also, the lists are huge and making 5000 checkboxes would have a big performance impact.

As for the Location field, I know and am waiting to see the outcome of it. I am not doing anything new about that. I was even considering taking errors 603, 633 and 663 offline. But there was no interest in that.

@Aleš

I would certainly vote in favor of taking errors 6n3 offline, pending precise instructions on what to use instead of USA. Just erasing USA from locations does not make much sense to me.

I kind of agree with you. I checked some changes and they are not all just erasing USA; some are putting it into () and adding some colony names. But it would be best to wait for the outcome of the Location changes.

Today is a new database dump and I will remove this error for a while.
Hi Alecs

Can I vote YES to taking those geographic "error" categories off line until we have the new guidelines ?  Many people see these and want to eliminate "errors" when in fact they may be creating new ones.

I will try using the Ctrl button for the False Error reports; I understand about the technical drawbacks on multiple changes - just thought I'd ask.  Your work is fascinating and has revealed so much but let's "hold" on the geographical names for a while until Chris & Sysops gets us good data to work from.  Thanks.

Maybe add a listbox where the user can select strict/less strict error checking ==> if you select strict error checking you also get some rules that are maybe too strict and give more errors that are not errors..... or errors that we need to agree on before saying this is wrong inside WikiTree.....
 

 

Hi

Although this may sound good, I don't think it is a good principle to start thinking about 2 "levels" or "2 types" of WikiTree - we have standards and most of these "error" categories are pretty basic.  But, right now, the geographic names standards should wait until the Leadership - Chris etc. - decides which system we want to follow.
6x3 Errors are offline until Location field is changed and new guidelines are prepared.
Thank you, Alecs.  I am sure we will be told when and what those guidelines will be when the time is right.  For now, there are PLENTY of other Errors to correct!!  Best always,  Chet Snow

7 Answers

+9 votes

News

Template to put link to errors on profile pages

There is also a template {{db_errors}} that you can put on a profile to get a link to the errors for that profile and the connected ones. Check the documentation for this template.

  • {{db_errors}} ==> Generates a link that generates a report of current Wikiprofile 5 generations. This form can be used only in biography, not in comments.
  • {{db_errors|10}} ==> Same as 1 but 10 generations. This form can be used only in biography, not in comments.
  • {{db_errors|10|Sälgö-2}} ==> Same as 2 but starts with Wikiprofile Sälgö-2. This form can be used in comments, freespace pages and everywhere else on WikiTree.
  • {{db_errors|Generations=10|WikiTreeID=Sälgö-3}} ==> This form can be used in comments, freespace pages and everywhere else on WikiTree.
  • {{Db_errors|10|Sälgö-1|Y}} ==> Third parameter adds more help text. This form can be used in comments, freespace pages and everywhere else on WikiTree.
by Aleš Trtnik G2G6 Pilot (808k points)
edited by Aleš Trtnik
How to use this on a protected profile, where I cannot edit the biography?
Can this be put in the public comment?

@Pierre

Yes, but only format 4. See also this G2G thread

I think you cannot include a template in a comment. And if you cannot edit the Bio, you cannot use this. It was intended to be put on your profile, to easily check for errors in your tree or in trees of interest.
Looks like I am wrong.
I checked on my own profile, and conclude that Version 3 and 4 can be used in the Public Comment.
I see no difference between 3 and 4.

{{db_errors|10|Sälgö-2|Y}}

You have a new more verbose version if you add a parameter 3 that should work on a comment

Please let me know if we should have some other text...

Comments, I think, are a WikiTree-specific function that doesn't implement support for 

That shows:
Database error check see Database errors project for more info

This profile has been identified to have problem. For more information please see Project:Database_Errors or ask a question at G2G with tag db_errors

I like it, but I suggest  the following sequence:

This profile has been identified to have a problem. 
If not clear, put your question at G2G with tag db_errors
Please do a Database error check and see Database errors project for more information.

@Pierre

Discussion of template and  texts is also going on in this thread. Please have a look there.

+8 votes

Added 109 Profile should be open (birth date), 110 Profile should be open (death date)

Here are profiles that should be open, since the birth / death date is older than 200 years or the date is wrong.

Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now | Now-9999
109 Profile should be open (birth date) | 11667 | 7 | 411 | 199 | 3611 | 7439 |  |  | 
110 Profile should be open (death date) | 1516 | 713 | 11 | 72 | 154 | 317 | 247 | 2 | 

Added 107 Full name in UPPERCASE, 108 Full name in lowercase

Here are profiles that have the whole full name in uppercase or lowercase.

Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now | Now-9999
107 Full name in UPPERCASE | 3153 | 1987 | 4 | 66 | 79 | 390 | 610 | 17 | 
108 Full name in lowercase | 3207 | 2900 | 3 | 3 | 14 | 49 | 231 | 7 | 
by Aleš Trtnik G2G6 Pilot (808k points)
+9 votes

Temporary hidden errors

I added a system, similar to false errors, that hides an error for a month.

If you encounter an error that you cannot fix, and you have posted a message to the profile manager or proposed a merge, you can click a link on the right to tell the system to ignore this error for a month. If the profile manager corrects the error, it will no longer exist; otherwise the error will reappear after 31 days so other actions can be taken. The error will be hidden at the latest on the next recalculation (on Monday).
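
The hiding rule can be sketched in Python as follows. This is an illustration only, not the project's actual code: the function name, the arguments, and the exact boundary (visible again on day 31) are assumptions; only the 31 days and the permanent hiding of false errors come from this thread.

```python
from datetime import date, timedelta

HIDE_DAYS = 31  # from the post: the error reappears after 31 days


def is_hidden(hidden_on, today, false_error=False):
    """Return True while an error should stay out of the reports."""
    if false_error:
        return True  # false errors stay hidden permanently
    # Temp hide: hidden for HIDE_DAYS days, then shown again if still present
    return today < hidden_on + timedelta(days=HIDE_DAYS)


print(is_hidden(date(2016, 5, 16), date(2016, 6, 15)))  # True (30 days later)
print(is_hidden(date(2016, 5, 16), date(2016, 6, 16)))  # False (31 days later)
```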

by Aleš Trtnik G2G6 Pilot (808k points)
Will stillborn Unknown twins then keep appearing on the 105 Duplicate sibling list every month, even if I mark them as false errors and I mark them as a rejected match?

There are many such instances for example: http://www.wikitree.com/wiki/Willey-1226 & http://www.wikitree.com/wiki/Willey-1227

Is there a guideline for differentiating between Unnamed twins, and if not should one be implemented?

No. Now you have 2 links on the right side. One to identify False Error that will always remain hidden and new link Temp hide, that will hide the error for 1 month and then it will reappear if it was not corrected.

But error 105 appears twice for each pair and you have to mark both as False error.
Thanks Aleš, now I understand. You are doing an amazing job.
> Error will be hidden at latest on next recalculation (on monday).

Monday in which timezone?

I corrected a number of errors late on Sunday night, but they didn't make the cut, because I didn't consider the timezone (and weekends are best for me to make corrections)

Chris wrote: "As I mentioned, it's now set to run every Sunday night (US central time)." But I think he set it up for Sunday noon. The last profile created was http://www.wikitree.com/index.php?title=Special:NetworkFeed&who=Crouch-1691 The time on the files is 15. 5. 2015 12:18, so it looks like Sunday noon US central time is the dump time. That is UK 18:00, Germany 19:00.

And then I need a few hours to import and recalculate. I will always post in G2G, when new errors are recalculated.
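
To check the dump time for another time zone, a small Python sketch like the one below can help. It assumes the dump runs at 12:00 US Central on Sunday 15 May 2016 (the date of this analysis); the constant and function names are made up for the example.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumed dump time: Sunday 15 May 2016, 12:00 US Central
DUMP = datetime(2016, 5, 15, 12, 0, tzinfo=ZoneInfo("America/Chicago"))


def local_time(zone_name):
    """Return the dump time as weekday and HH:MM in the given time zone."""
    return DUMP.astimezone(ZoneInfo(zone_name)).strftime("%a %H:%M")


for zone in ("Europe/London", "Europe/Berlin"):
    print(zone, local_time(zone))
# Europe/London Sun 18:00
# Europe/Berlin Sun 19:00
```

Corrections made after that local time would only show up in the following week's recalculation.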

+11 votes

Description of errors

I finished short descriptions for all errors. You can see them on the project page.

by Aleš Trtnik G2G6 Pilot (808k points)
+6 votes

Added 604, 634 & 664 Too short location

Short locations are not allowed, since they can be ambiguous. Also, people from other parts of the world don't understand them. For now MinLength is 4, with exceptions like USA and UK. American states should be at least in the form PA, USA, which is longer than 4 letters.

Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999
604 Birth location too short | 16791 | 1974 | 131 | 1006 | 2646 | 9284 | 1750
634 Death location too short | 18242 | 1109 | 313 | 1121 | 2762 | 11516 | 1421
664 Marriage location too short | 3246 | 228 | 30 | 379 | 680 | 1653 | 276
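
The length rule for 604, 634 and 664 could look roughly like this in Python. It is a sketch under assumptions, not the real check: the function name is hypothetical, the exception list holds only the two examples named above, and treating empty locations as "not too short" (they are covered by other errors) is a guess.

```python
MIN_LENGTH = 4              # "MinLength is 4" per the post
EXCEPTIONS = {"USA", "UK"}  # only the two exceptions named in the post


def is_too_short(location):
    """Flag a non-empty location shorter than MIN_LENGTH, unless excepted."""
    text = location.strip()
    if not text or text.upper() in EXCEPTIONS:
        return False
    return len(text) < MIN_LENGTH


print(is_too_short("PA"))       # True
print(is_too_short("PA, USA"))  # False
print(is_too_short("USA"))      # False
```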

Updated 602, 632 & 662

Added checking of yes * and y *.

Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999
602 Y birth location | 3 | 3 |  |  |  |  | 
632 Y death location | 6736 | 80 | 48 | 140 | 663 | 5715 | 90
662 Y marriage location | 7 | 1 |  |  | 2 | 4 | 

 

by Aleš Trtnik G2G6 Pilot (808k points)
+6 votes
This is wonderful work, BUT....can people please use the comment field to explain what they are changing in my profiles.....I had over 50 profiles that were changed just to remove USA, or the like....looking through all those profiles to assure that the person was not changing my data was a nuisance.
by Robin Lee G2G6 Pilot (862k points)

I agree they should enter an edit comment.

But you can do that on your own. Here you can see possible errors in your tree and correct them before others do. You must start with your grandparents (Lee-5964), since your parents are private. http://www.sdms.si:92/wikitree/ShowErrors.htm

Robin,

Did they remove USA and leave it blank? Argh. I have added USA if not present on profiles in the past. Probably going to hold off on location name changes unless really a mess, waiting for the FamilySearch integration.

Marty
+8 votes

Added 408 Multiple marriages on same day, 409 Marriage to duplicate person

  • 408 Multiple marriages on same day: This person married two partners on the same day.
  • 409 Marriage to duplicate person: This person is married twice to a person with the same name.
Error | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now
408 Multiple marriages on same day | 10234 | 293 | 40 | 1686 | 3043 | 4789 | 383 | 
409 Marriage to duplicate person | 31870 | 4119 | 411 | 4269 | 8046 | 13421 | 1602 | 2
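
As an illustration of the two definitions above, here is a minimal Python sketch. It is not the project's implementation: the function name, the input format (a list of spouse name and marriage date pairs for one person) and the case-insensitive name comparison are all assumptions.

```python
from collections import Counter


def marriage_errors(marriages):
    """marriages: list of (spouse_name, marriage_date) pairs for one person.

    Returns the set of error codes (408, 409) that apply.
    """
    errors = set()
    # 408: two or more marriages share the same (non-empty) date
    dates = Counter(day for _, day in marriages if day)
    if any(count > 1 for count in dates.values()):
        errors.add(408)
    # 409: two or more spouses share the same name
    names = Counter(name.strip().lower() for name, _ in marriages)
    if any(count > 1 for count in names.values()):
        errors.add(409)
    return errors


print(marriage_errors([("Anna Smith", "1850-06-01"), ("Mary Jones", "1850-06-01")]))  # {408}
print(marriage_errors([("Anna Smith", "1850-06-01"), ("Anna Smith", "1860-01-01")]))  # {409}
```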
by Aleš Trtnik G2G6 Pilot (808k points)


...