News on Database errors project (23 October 2016)

+9 votes
338 views

Analysis was done on data from October 23rd 2016.

in The Tree House by Aleš Trtnik G2G6 Pilot (809k points)
retagged by Maggie N.

Yahoo! 1783 profiles on my privilege to serve watch list and ZERO db errors. Finally!

C'est Bon Magnifique!

P.S. The GEDCOM import nightmares are over! .. lol

Congratulations. I will have to make new errors, to keep you busy :-).

Of Course! ..

Any other effort would not be the WikiTree sterling Monsieur

Aleš Trtnik's solar system's d_b physician gravity wave progress .. lol

P.S. I'm Busy I'm Busy I'm Busy I'm Busy

C'est Bon

5 Answers

+5 votes

Updated 6x1 Wrong word in x location

I added Unicode errors for a few Norwegian places (H2land, S2r, S2rum, Str2m, Tr2gstad), and also added Age, Aged, Alive, HTTP and HTTPS as forbidden words. Unknown is also still checked, including its spelling variations. If you notice any other wrong words in the location field, let me know.

| Errors | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | Open | New |
|---|---|---|---|---|---|---|---|---|---|
| 601 Wrong word in birth location | 8011 | 923 | 3 | 171 | 1248 | 4470 | 1196 | 6083 | 809 |
| 631 Wrong word in death location | 20617 | 807 | 2 | 211 | 3356 | 13217 | 3024 | 17621 | 7367 |
| 661 Wrong word in marriage location | 1433 | 45 | | 45 | 309 | 770 | 264 | 1134 | 273 |
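
A forbidden-word check of this kind could look roughly like the sketch below. It is not the checker's actual code: the word list holds only the words named in this post, and whole-word, case-insensitive matching is an assumption. The function name `wrong_words` is hypothetical.

```python
import re
from typing import List

# Only the words named in the post; the real list is longer and not shown here.
FORBIDDEN_WORDS = ["Age", "Aged", "Alive", "HTTP", "HTTPS", "Unknown"]

# Assumption: whole-word, case-insensitive matching, so that "Age" does not
# fire on a place name such as "Village".
_PATTERN = re.compile(r"\b(" + "|".join(FORBIDDEN_WORDS) + r")\b", re.IGNORECASE)


def wrong_words(location: str) -> List[str]:
    """Return the forbidden words found in a location string, in order."""
    return [match.group(1) for match in _PATTERN.finditer(location)]
```

For example, `wrong_words("Aged 45, Oslo, Norway")` reports `Aged`, while `wrong_words("Village, England")` reports nothing.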
by Aleš Trtnik G2G6 Pilot (809k points)
+5 votes

Updated 6x5 Number in x location

I corrected the algorithm to ignore some separators, so dates (01/02/2000, 01-10-2011) are now also reported under this error.

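| Errors | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | Open | New |
|---|---|---|---|---|---|---|---|---|---|
| 605 Number in birth location | 598 | 364 | | 2 | 11 | 196 | 25 | 61 | 68 |
| 635 Number in death location | 361 | 75 | | 4 | 6 | 235 | 41 | 32 | 30 |
| 665 Number in marriage location | 110 | 52 | | 1 | 4 | 41 | 12 | 14 | 98 |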
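
One way to picture the separator handling is the sketch below. It assumes that slashes, hyphens and dots are the ignored separators and that a token counts as a number when only digits remain after removing them. Neither detail is confirmed by the post, and `has_number` is a hypothetical name.

```python
import re

# Assumption: these are the separators that get ignored, so that
# "01/02/2000" collapses to "01022000" before the digit test.
_SEPARATORS = re.compile(r"[/\-.]")


def has_number(location: str) -> bool:
    """True if any whitespace- or comma-delimited token is purely numeric
    once the assumed separators are removed."""
    for token in re.split(r"[\s,]+", location):
        if _SEPARATORS.sub("", token).isdigit():
            return True
    return False
```

With this rule both `01/02/2000` and `01-10-2011` are caught, while a plain place name such as `Oslo, Norway` is not.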
by Aleš Trtnik G2G6 Pilot (809k points)
+5 votes

Updated 104 Too old

I changed the maximum allowed age from 115 to 110 years. I also automatically exclude profiles that are in the Centenarians or Supercentenarians category, so don't use False error; add the appropriate category instead. I will probably remove False error in the future. I will also lower the age to 100 in the future.

| Errors | Total | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now | Open | New |
|---|---|---|---|---|---|---|---|---|---|
| 104 Too old | 8043 | 294 | 1417 | 2640 | 3540 | 148 | 4 | 7384 | 2547 |
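
The rule above can be sketched as follows. This is only an illustration: it compares years rather than full dates, treats "maximum allowed age" as "older than 110 is an error", and uses a hypothetical function name `too_old`.

```python
from typing import Iterable

MAX_AGE_YEARS = 110  # lowered from 115 in this update
EXEMPT_CATEGORIES = {"Centenarians", "Supercentenarians"}


def too_old(birth_year: int, death_year: int, categories: Iterable[str] = ()) -> bool:
    """True if the profile would be reported as 104 Too old.

    Simplification: only years are compared, not full dates.
    """
    # Profiles in one of the two categories are excluded automatically.
    if EXEMPT_CATEGORIES & set(categories):
        return False
    return death_year - birth_year > MAX_AGE_YEARS
```

So `too_old(1800, 1915)` is reported, but the same dates with the `Supercentenarians` category are not.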
by Aleš Trtnik G2G6 Pilot (809k points)
+4 votes

New 6x8 Misspelled country

I am checking the spelling against a list of known countries. The country is expected as the last word or words in the location field. I prepared a list back in April to map locations to countries. It holds local and English versions of each country name. I also added some common variations like USA, United States, United States of America, and some cities or regions that often appear without a country, like Amsterdam, Derbyshire, Connecticut. When correcting those, you can of course also add the country. Some ambiguous misspellings will produce two errors; for example, austraia could be Australia or Austria. The list contains about 450 names. If you find any errors in the report, let me know. I expect there are some. I didn't expect this many errors, but here they are.

| Errors | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now | Open | New |
|---|---|---|---|---|---|---|---|---|---|---|
| 608 Misspelled country in birth location | 28543 | 603 | 184 | 1703 | 5426 | 17245 | 3381 | 1 | 24220 | 28543 |
| 638 Misspelled country in death location | 17746 | 524 | 118 | 1015 | 3189 | 10049 | 2851 | | 14949 | 17746 |
| 668 Misspelled country in marriage location | 5474 | 177 | 20 | 345 | 1364 | 3123 | 445 | | 4749 | 5474 |
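
To illustrate how one misspelling can yield two suggestions, here is a minimal sketch built on Python's standard `difflib`. The matching method, the `0.85` cutoff, the use of the last comma-separated part as "the country", and the tiny `KNOWN_NAMES` list are all assumptions for illustration; the post does not say how the real check compares names. The 4-letter minimum comes from a reply further down in this thread.

```python
import difflib
from typing import List

# Tiny stand-in for the list of about 450 names mentioned in the post.
KNOWN_NAMES = ["Australia", "Austria", "India", "Indiana", "Netherlands", "USA"]
_BY_LOWER = {name.lower(): name for name in KNOWN_NAMES}


def country_suggestions(location: str, cutoff: float = 0.85) -> List[str]:
    """Suggest known names for the last comma-separated part of a location.

    Returns an empty list when that part is spelled correctly, is shorter
    than 4 letters, or resembles nothing in the list.
    """
    last = location.split(",")[-1].strip().lower()
    if len(last) < 4 or last in _BY_LOWER:
        return []
    close = difflib.get_close_matches(last, list(_BY_LOWER), n=5, cutoff=cutoff)
    return sorted(_BY_LOWER[name] for name in close)
```

With this stand-in list, `country_suggestions("Sydney, austraia")` returns both `Australia` and `Austria`, which corresponds to the two errors described above.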
by Aleš Trtnik G2G6 Pilot (809k points)

When I went back far enough in time, I started seeing many "Misspelled country in birth location" that suggest that "New Netherland" is a misspelling that needs to be changed to "New Netherlands." Some of these are profiles that used to say "New Netherlands," but got corrected several weeks ago! Please fix it so that "New Netherland" will be  accepted as a spelling again.

True. I added New Netherland to the country list. The list will be updated in a few hours.
Thanks.

I found another issue. When the error checker found the location "Ma, usa", it recommended changing it to "GA, USA." That's terribly wrong. "Ma." (MA in the modern postal system) is an abbreviation for Massachusetts and Ga. (GA in the modern postal system) is an abbreviation for Georgia. Very different places!

Furthermore, these two-letter codes shouldn't be recommended for continued use. "Ma, USA" should become "Massachusetts, USA" and "Ga, USA" should become "Georgia, USA" (assuming that the dates are after the USA existed -- the profile where I saw the "Ma, usa" was for a date around 1700, before there was a USA).

PS - Are any other two-letter codes being "corrected" in this fashion? I hope we're not also encouraging people to change Pa (Pennsylvania) or Va (for Virginia) to Ga for Georgia, etc., or we'll be creating a new mess...

And yet another issue. At least this one is very funny.

The death data in the profile is "1755-12-00 Upper Smithfield, Northampton Co., PA, Killed by Indian." Obviously, "Killed by Indian" shouldn't be in the death location, but the error checker has other ideas. It reports two instances of error 638 Misspelled country in death location. The first report says "indiaN -> India" and the second one says "indian -> Indiana". Can the location "Indian" be flagged as an error without telling people what place name to use instead?

Only countries with 4 or more letters are checked. The country here is "Ma usa". I added that as a country to increase country recognition. I will exclude such cases from the spelling check.

Well, if someone follows that suggestion, we will have the location "Killed by India" or "Killed by Indiana". No real damage will be done here.

I fixed that profile with "Killed by Indian" in the location field.

I'm thinking that "Indian" probably occurs as a typo in additional location fields, where people will be presented with a dual error. Hopefully, they will laugh and make the appropriate correction.
Error 6x8 checks for country misspellings. I assume the country word or phrase is at the end of the location field, so not every "Indian" is checked, only the ones at the end of the location field.
This is going back to what you said earlier, that you would probably eliminate False error. My ancestors tend to have some unusual names. I get the possible misspelled name message a lot. In the South we also have people who have the first name Doctor or Major, and then you get the error of having a prefix in the name field. I assume you would have an alternative to the False error option.
I think he's saying he'd like to eliminate it for place names. We'll always need it for people's names.
Thank you!! I see what you mean!

I actually said that about error 104 Too old, since there is an alternative way to mark a profile as OK.

+3 votes

New Error 831 Multiple duplicated lines

This error is added if a whole line is repeated in the biography. A line must be at least 40 letters long to be checked, and the number of all repeated lines must be more than 10 to count as an error. I also excluded some texts from checking (lines beginning with :: Place:, :: Relationship to, Detail:(No detail, and Citation provides evidence for). I can add more if you think it is needed. In the future we might make these rules stricter.

| Errors | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | 2000-Now | Open | New |
|---|---|---|---|---|---|---|---|---|---|---|
| 831 Multiple duplicated lines | 96210 | 2839 | 2191 | 6384 | 10773 | 61499 | 12516 | 8 | 63748 | 96210 |
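
The rule can be sketched roughly as below. How "number of all repeated lines" is counted is not spelled out in the post; this sketch assumes every occurrence of a line beyond its first counts as one repeat, and it measures length in characters after trimming whitespace. The function names are hypothetical.

```python
from collections import Counter

MIN_LINE_LENGTH = 40  # shorter lines are not checked
MAX_REPEATS_ALLOWED = 10  # more than 10 repeated lines is an error
EXCLUDED_PREFIXES = (
    ":: Place:",
    ":: Relationship to",
    "Detail:(No detail",
    "Citation provides evidence for",
)


def repeated_line_count(biography: str) -> int:
    """Count repeats: each occurrence of a checked line beyond its first."""
    counts = Counter(
        line.strip()
        for line in biography.splitlines()
        if len(line.strip()) >= MIN_LINE_LENGTH
        and not line.strip().startswith(EXCLUDED_PREFIXES)
    )
    return sum(n - 1 for n in counts.values() if n > 1)


def has_error_831(biography: str) -> bool:
    """True if the biography would be reported under the assumed counting rule."""
    return repeated_line_count(biography) > MAX_REPEATS_ALLOWED
```

Under this counting, a 40-character line that appears 12 times gives 11 repeats and is reported, while the same line appearing 11 times is not.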
by Aleš Trtnik G2G6 Pilot (809k points)
Gee, 96,210 new errors... It feels like it will soon be easier to start from scratch or to start cleaning with software bots... ;-)

nah... these aren't new errors... they mostly all overlap with "error 811 (Uncleaned profile after merge)"

 

I was thinking of excluding this error if 811 exists for the profile. But it is nice to see which lines are repeated. I will think about this.

I excluded profiles with error 811.

| Errors | Total | 0000-0000 | 0001-1499 | 1500-1699 | 1700-1799 | 1800-1899 | 1900-1999 | Open | New |
|---|---|---|---|---|---|---|---|---|---|
| 831 Multiple duplicated lines | 72235 | 1354 | 559 | 2969 | 7000 | 50707 | 9646 | 47561 | 72235 |


...