
Database errors project (1 May 2016)

Privacy Level: Open (White)
Date: 1 May 2016 to 11 May 2016
Location: Worldwide
Surname/tag: data_doctors
Profile manager: Aleš Trtnik

Categories: DD Suggestions.

This page is part of the Data Doctors Project.
Latest report: February 10th 2019 and the Spreadsheet.
Custom reports by: Suggestion lists, Unsourced lists, Unconnected lists.
See WikiTree+ for custom reports and statistics.
Data Doctors Challenge: Dates_VIII.

Analysis was done on data from May 1st 2016.

Below are pages of error lists with basic person data and links to WikiTree.


News

Errors connected to you and errors by location

Here you can get all errors for profiles connected to you or to any other profile. You can also get errors for any word that appears in a birth or death location.

Weekly update

Great news: Chris just notified me that the database dump is now on a weekly schedule, on Sunday night (US Central Time), so the errors will be updated during Monday.

210, 310 Errors - Father/Mother was dead before birth

63684 Errors Total 0000-0000 0001-1499 1500-1699 1700-1799 1800-1899 1900-1999 2000-Now Now-9999
210 Father was dead before birth 32482 1360 2021 6219 9517 12550 815
310 Mother was dead before birth 31202 1534 1219 6121 9412 12129 783 3 1
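A minimal Python sketch of how the 210 and 310 checks might be computed for one child, assuming each profile exposes optional birth and death dates. The Person class, the parent_death_errors helper and the 280-day tolerance for fathers (to allow for posthumous births) are hypothetical choices for illustration, not the project's documented rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical tolerance: a father may die up to about nine months before the
# child is born (posthumous birth), so only earlier deaths are flagged.
FATHER_TOLERANCE = timedelta(days=280)


@dataclass
class Person:
    birth: Optional[date] = None
    death: Optional[date] = None


def parent_death_errors(child: Person,
                        father: Optional[Person],
                        mother: Optional[Person]) -> List[int]:
    """Return error codes 210 (father) and 310 (mother) for one child."""
    errors: List[int] = []
    if child.birth is None:
        return errors  # nothing to compare against
    if father is not None and father.death is not None:
        if father.death < child.birth - FATHER_TOLERANCE:
            errors.append(210)
    if mother is not None and mother.death is not None:
        if mother.death < child.birth:
            errors.append(310)
    return errors


child = Person(birth=date(1850, 6, 1))
print(parent_death_errors(child, Person(death=date(1848, 3, 1)),
                          Person(death=date(1850, 5, 1))))  # [210, 310]
```

A mother's death before the birth is always flagged here, while the father gets the assumed tolerance; a real check may use different thresholds.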

106 Errors - Duplicates between big tree and unconnected profiles

Paula Round asked for this list to help connect unconnected profiles to the global tree.

3253 Errors Total 0000-0000 0001-1499 1500-1699 1700-1799 1800-1899 1900-1999 2000-Now Now-9999
106 Duplicates between bigtree and unconnected 3253 1 24 403 955 1577 293

600 Errors - Location errors

  • 601, 631 and 661 "Unknown" location - Unknown is not a location. If the location is not known, the field should be left empty.
  • 602, 632 and 662 "Y" location - Y is not a location. I think these locations came from GEDCOM imports (maybe an error in the GEDCOM format) and were never corrected.
  • 603, 633 and 663 USA used too early - USA is used for a date before the country existed. The name in use at the time should be used instead.
357367 Errors Total 0000-0000 0001-1499 1500-1699 1700-1799 1800-1899 1900-1999 2000-Now Now-9999
601 Unknown birth location 9291 1734 24 362 1368 4342 1457 3 1
603 USA too early in birth location 217129 13 82 50652 166382
631 Unknown death location 16230 1470 60 783 2639 8962 2315 1
632 Y death location 6542 87 48 138 633 5551 85
633 USA too early in death location 80738 1067 46 55602 23890 92 41
661 Unknown marriage location 1328 71 3 73 230 726 225
662 Y marriage location 6 1 1 3 1
663 USA too early in marriage location 26103 407 5 10097 15544 44 6
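The three location rules can be sketched in Python for one location field. This is a minimal sketch under stated assumptions: the location_errors helper and the base codes per field are illustrative, and the 1776 cutoff year for "USA" is a hypothetical choice, since the page does not say which date the project uses.

```python
from typing import List, Optional

# Hypothetical cutoff year: "USA" is treated as anachronistic for events before
# 1776. The project does not state the exact rule it uses.
USA_CUTOFF_YEAR = 1776
USA_NAMES = {"USA", "UNITED STATES"}

# Base codes per field: +1 "Unknown", +2 "Y", +3 USA too early.
BASES = {"birth": 600, "death": 630, "marriage": 660}


def location_errors(kind: str, location: Optional[str], year: Optional[int]) -> List[int]:
    """Return 6xx codes for one location field of one profile."""
    base = BASES[kind]
    if not location or not location.strip():
        return []  # an empty field is the correct way to record "not known"
    loc = location.strip()
    errors: List[int] = []
    if loc.lower() == "unknown":
        errors.append(base + 1)
    elif loc == "Y":
        errors.append(base + 2)
    # Compare each comma-separated part, e.g. "Boston, Massachusetts, USA".
    parts = {part.strip().upper() for part in loc.split(",")}
    if year is not None and year < USA_CUTOFF_YEAR and parts & USA_NAMES:
        errors.append(base + 3)
    return errors


print(location_errors("birth", "Boston, Massachusetts, USA", 1702))  # [603]
print(location_errors("death", "Unknown", 1901))                    # [631]
```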

900 Errors - Empty data

I added these errors to find empty profiles: profiles that have no relations, no birth or death data, and are Open or Public. This was done at Jillaine Smith's request. For now there are two errors: 901, unconnected empty Public profiles, and 902, unconnected empty Open profiles.

52715 Errors Total 0000-0000 0001-1499 1500-1699 1700-1799 1800-1899 1900-1999 2000-Now Now-9999
901 Unconnected empty public profiles 35473 35473
902 Unconnected empty open profiles 17242 17242
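The definition of an empty profile above can be written as a small Python predicate. This sketch assumes that "birth and death data" means the two dates and two locations, and the Profile fields and privacy labels are hypothetical stand-ins for the real dump columns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Profile:
    privacy: str  # hypothetical labels: "Public", "Open", ...
    relatives: List[str] = field(default_factory=list)  # parents, siblings, spouses, children
    birth_date: Optional[str] = None
    death_date: Optional[str] = None
    birth_location: Optional[str] = None
    death_location: Optional[str] = None


def empty_profile_error(p: Profile) -> Optional[int]:
    """Return 901/902 for an empty Public/Open profile, else None."""
    if p.relatives:
        return None  # connected to at least one relative
    if any([p.birth_date, p.death_date, p.birth_location, p.death_location]):
        return None  # has some birth or death data
    return {"Public": 901, "Open": 902}.get(p.privacy)


print(empty_profile_error(Profile("Open")))                       # 902
print(empty_profile_error(Profile("Public", birth_date="1850")))  # None
```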

500 Errors - Gender

The 500 errors are derived from a database sample of names. Each name is classified by how often it appears with each gender:

  • If a name appears more than 50 times:
    • Frequency 97-100%: gender is definitely male/female.
    • Frequency 90-97%, other gender less than 2%: gender is definitely male/female.
    • Frequency 90-97%, other gender more than 2%: gender is probably male/female.
    • Frequency 70-90%, other gender less than 2%: gender is probably male/female.
    • Frequency 70-90%, other gender more than 2%: name is unisex or gender is unsure.
    • Frequency 30-70%: name is unisex or gender is unsure.
  • If a name appears fewer than 50 times:
    • Frequency 90-100%: gender is probably male/female.
    • Frequency 70-90%, other gender less than 10%: gender is probably male/female.
    • Frequency 30-70%: name is unisex or gender is unsure.
349886 Errors Total 0000-0000 0001-1499 1500-1699 1700-1799 1800-1899 1900-1999 2000-Now Now-9999
501 Wrong male gender 7130 1195 94 529 1119 3152 1039 2
502 Missing male gender 53397 25338 116 1026 4002 15603 7286 26
503 Probably wrong male gender 8380 1806 139 717 1186 3175 1348 7 2
504 Probably missing male gender 56357 34800 110 817 3112 11031 6438 49
505 Wrong female gender 9072 1455 40 454 1489 4430 1200 4
506 Missing female gender 51058 24435 56 930 3968 15834 5820 14 1
507 Probably wrong female gender 7027 1184 27 346 1054 3234 1180 2
508 Probably missing female gender 37889 23876 33 381 1582 7536 4453 26 2
509 Missing gender 97415 79807 52 497 1615 8444 6939 59 2
510 Unique name without gender 24792 11141 61 208 811 7523 4976 71 1
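The frequency rules above can be sketched as a Python classifier for one name. Assumptions, all hypothetical: "frequency" is the share of the dominant gender among all occurrences of the name, "other gender" is the share of the opposite gender, a name seen exactly 50 times uses the small-sample rules, and the small-sample case of 70-90% with 10% or more of the other gender (not spelled out on this page) counts as unsure. How the project maps these classes to codes 501-510 is not shown here.

```python
from typing import Optional, Tuple


def classify_name(male: int, female: int, unknown: int) -> Tuple[str, Optional[str]]:
    """Classify a name from counts of profiles by recorded gender.

    Returns (confidence, gender), where confidence is "definite",
    "probable", "unsure" or "unknown".
    """
    total = male + female + unknown
    if total == 0:
        return ("unknown", None)
    if male >= female:
        gender, top, other = "male", male, female
    else:
        gender, top, other = "female", female, male
    freq = top / total          # share of the dominant gender
    other_freq = other / total  # share of the opposite gender
    if total > 50:
        if freq >= 0.97 or (freq >= 0.90 and other_freq < 0.02):
            return ("definite", gender)
        if freq >= 0.90 or (freq >= 0.70 and other_freq < 0.02):
            return ("probable", gender)
    else:
        # Assumption: 70-90% with 10% or more of the other gender is "unsure".
        if freq >= 0.90 or (freq >= 0.70 and other_freq < 0.10):
            return ("probable", gender)
    if freq >= 0.30:
        return ("unsure", None)
    return ("unknown", None)  # most occurrences have no gender recorded


print(classify_name(98, 1, 1))   # ('definite', 'male')
print(classify_name(5, 92, 3))   # ('probable', 'female')
print(classify_name(50, 45, 5))  # ('unsure', None)
```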

Errors in May 2016

Analysis was done on data from May 1st 2016.

Explanation of error changes:

  • There is an increase of approximately 10% in 400 errors because the April import of marriages was incomplete. That is about 3200 errors.
  • There is an increase of approximately 8% in gender errors (203, 204, 303, 304) because the April import had no gender for privacy levels 35 and 40. That is about 1500 errors.
  • The number of person profiles grew by 2%, so all errors should grow by about the same proportion. That is about 4000 errors.

Together that gives an expected total of about 215000 errors, so by my estimate about 1500 errors were corrected in April.
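The estimate can be checked with a few lines of Python. The figures are the approximate ones quoted above; the previous month's total is not given on this page, so implied_previous_total is only what those figures imply, and the exact difference of 1362 is what the text rounds to about 1500.

```python
# Approximate figures from the explanation of error changes.
marriage_import_gap = 3200   # ~10% more 400 errors (incomplete April marriage import)
gender_import_gap = 1500     # ~8% more 203/204/303/304 errors (no gender for levels 35, 40)
profile_growth = 4000        # ~2% more person profiles

expected_increase = marriage_import_gap + gender_import_gap + profile_growth
expected_total = 215000      # the author's estimate
actual_total = 213638        # total in the May 2016 table

implied_previous_total = expected_total - expected_increase
corrected_estimate = expected_total - actual_total
print(expected_increase, implied_previous_total, corrected_estimate)  # 8700 206300 1362
```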

Note: one correction often fixes multiple errors, because the same problem can appear in several error groups.

213638 Errors Total 0000-0000 0001-1499 1500-1699 1700-1799 1800-1899 1900-1999 2000-Now Now-9999
101 Birth in future 343 343
102 Death in future 370 22 1 85 224 5 33
103 Death before birth 13139 137 482 1272 6010 5003 137 98
104 Too old 7021 474 1268 2128 2982 161 4 4
105 Duplicate sibling 4711 14 361 994 2539 803
201 Father is self 251 88 10 2 47 89 15
202 Parents are same 224 22 25 17 20 110 30
203 Father is Female 6167 830 43 280 1172 3419 423
204 Father has no Gender 2159 908 12 39 199 928 73
205 Father is too young or not born 48551 2366 5387 10076 21026 9560 84 52
206 Father is too old 6952 619 1569 2470 2261 33
207 Father is also a child 510 102 28 66 125 166 22 1
208 Father is also a spouse 241 22 6 14 45 140 14
209 Father is also a sibling 3527 442 115 325 852 1496 297
301 Mother is self 10 4 5 1
303 Mother is Male 8321 807 76 400 1667 4817 554
304 Mother has no Gender 2101 877 3 45 203 889 84
305 Mother is too young or not born 65178 2916 7187 14226 27415 13290 95 49
306 Mother is too old 5822 439 1236 1988 2126 33
307 Mother is also a child 35 5 1 8 8 10 3
308 Mother is also a spouse 1566 133 64 236 452 620 61
309 Mother is also a sibling 373 59 7 15 67 182 41 2
401 Spouse is self 4 2 1 1
402 Unknown gender of spouse 2990 792 15 80 311 1447 345
403 Single sex marriage 4671 401 44 278 893 2492 563
404 Marriage before birth 10937 213 956 2334 5268 2108 27 31
405 Married too old 2857 143 458 776 1473 7
406 Marriage after death 12580 534 366 1951 3007 6080 640 1 1
407 Death too old after Marriage 2027 61 17 226 478 1006 228 3 8
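Several of the date checks in this table need only a profile's own dates. Here is a minimal Python sketch of 101, 102, 103, 404 and 406, assuming full dates and a fixed analysis date. The Profile class and date_errors helper are hypothetical, and the "too old" checks (104, 206, 306, 405) are left out because their age thresholds are not stated on this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Profile:
    birth: Optional[date] = None
    death: Optional[date] = None
    marriages: Tuple[date, ...] = ()


def date_errors(p: Profile, as_of: date) -> List[int]:
    """Return sorted error codes 101-103, 404 and 406 for one profile."""
    errors = set()
    if p.birth is not None and p.birth > as_of:
        errors.add(101)  # birth in future
    if p.death is not None and p.death > as_of:
        errors.add(102)  # death in future
    if p.birth is not None and p.death is not None and p.death < p.birth:
        errors.add(103)  # death before birth
    for married in p.marriages:
        if p.birth is not None and married < p.birth:
            errors.add(404)  # marriage before birth
        if p.death is not None and married > p.death:
            errors.add(406)  # marriage after death
    return sorted(errors)


analysis_date = date(2016, 5, 1)  # data snapshot used for this report
odd = Profile(birth=date(1900, 1, 1), death=date(1890, 1, 1),
              marriages=(date(1880, 6, 1), date(1895, 6, 1)))
print(date_errors(odd, analysis_date))  # [103, 404, 406]
```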




Collaboration

On 26 May 2016 at 15:24 GMT Nanette (Gahn) Pezzutti wrote:

I have started putting a public comment for managers of profiles when the profile has a future birth date. Started at the bottom of the latest list, working up the stack. -NGP

On 11 May 2016 at 21:10 GMT Jillaine Smith wrote:

I'm going through this one:

http://www.softdata.si/osebe_staro/ales/wikitree/Err_20160501/902_0000-0000.htm

And working on finding profiles to merge dead profiles with, or communicating with the profile managers (if there are any) requesting more details be added.

On 11 May 2016 at 20:37 GMT Jan Terink wrote:

Members of the Dutch Roots project are busy trying to fix all errors (3000+) by province location (11 relevant provinces).

On 11 May 2016 at 19:22 GMT Carol (Winton) Keeling wrote:

Section 106 looks very interesting for us Connectors. Will start to look at the most recent slot of dates 1990-1999, if that's OK.

106: 1990-1999 In progress

On 11 May 2016 at 06:37 GMT Aleš Trtnik wrote:

I will post in G2G each month when the data is updated. G2G also has discussions about errors with links to this page. Today I noticed a news item about this on the Blog. For automated messages you would need to ask the Leaders, but I think such actions are not preferred.

On 11 May 2016 at 02:22 GMT Paul Gierszewski wrote:

Given the number of possible issues identified (about 500,000), making real progress will depend on getting a lot of wikitreers engaged. Advertising the personal-tree-checker-tool on G2G is one way as was recently done, and it probably should be repeated periodically. Another is to send an automated message to all the profiles identified here with a description of the potential issue. Is that a possibility?

On 10 May 2016 at 12:17 GMT Aleš Trtnik wrote:

Transfer of work messages:

Esmé van der Westhuizen:

105 - 1700-1799 Completed merge proposals that had not been done by other members

105 - 1800-1899 Working

Nan Lambert:

105 - 1500-1699.

Paula Dea:

102 - 1700-1799 done (1),

102 - 2000+ message sent (5),

102 - 0000-0000 corrected, removed or messaged (22),

104 - 2000-now - message sent (4)

406 - both the 1's corrected

301 - Richard McClure has done the first 3/4 of these

509 - 1800-1899 - working on first check through for obvious genders

Me

203: 1700-1799 Errors with male name corrected

204: 1700-1799 Most of errors corrected

303: 1700-1799 Errors with female name corrected

304: 1700-1799 Most of errors corrected

402: 1700-1799 Most of errors corrected

403: 1700-1799 Most of errors corrected