Database Error Project: Is size an error?

+11 votes

A candidate for a new error in the Database Error Project

If we look at profiles sorted by size, I feel we find profiles that need a caring hand...

Do you agree?


in WikiTree Tech by Living Sälgö G2G6 Pilot (290k points)
retagged by Dorothy Barry
Would it be possible to add the privacy to the list?
I meant the list of profiles sorted by size.  I don't fuss about the private ones.

Very short profiles would also be an indication of work to be done.

Unfortunately this results in numerous zero byte pages...

Is there an argument to control minimum size in the output of the Special:Shortpages function?

Good point, Jan Terink. There is a standard Wikipedia command for this. I don't know how well it works in WikiTree.

Offset: 10 000 Special:Shortpages&limit=500&offset=10000
Offset: 200 000 Special:Shortpages&limit=500&offset=200000
Offset: 300 000 Special:Shortpages&limit=500&offset=300000
Offset: 400 000 Special:Shortpages&limit=500&offset=400000
Offset: 500 000 Special:Shortpages&limit=500&offset=500000
Offset: 600 000 Special:Shortpages&limit=500&offset=600000
Offset: 1 000 000 Special:Shortpages&limit=500&offset=1000000
Offset: 1 500 000 Special:Shortpages&limit=500&offset=1500000
Offset: 2 000 000 Special:Shortpages&limit=500&offset=2000000
Offset: 5 000 000 Special:Shortpages&limit=500&offset=5000000
Offset: 10 000 000 Special:Shortpages&limit=500&offset=10000000
Offset: 20 000 000 Special:Shortpages&limit=500&offset=20000000
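
For anyone who wants to step through the list at other offsets, here is a minimal sketch that builds the same kind of link. It assumes the usual MediaWiki query form (`index.php?title=Special:Shortpages&limit=...&offset=...`); the base URL is a placeholder, not WikiTree's real address, and the function name is made up for illustration.

```python
from urllib.parse import urlencode

# Placeholder base URL (an assumption); substitute the wiki's real index.php address.
BASE_URL = "https://wiki.example/index.php"


def shortpages_url(offset, limit=500, base=BASE_URL):
    """Build a Special:Shortpages link for the given offset and page size."""
    query = urlencode(
        {"title": "Special:Shortpages", "limit": limit, "offset": offset},
        safe=":",  # keep the colon in "Special:Shortpages" readable
    )
    return f"{base}?{query}"


# The same offsets as in the list above.
for offset in (10_000, 200_000, 1_000_000, 20_000_000):
    print(shortpages_url(offset))
```

This only produces the links. It does not add a minimum-size filter, which is what the question above was asking for.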

I have a feeling that not many of the 336 720 registered users have created profiles....

Maybe define errors like

  1. Less than 500 bytes
  2. Last action was a GEDCOM import ==> The edit history, or at least the last change with its comment, needs to be in the dump

A profile like Barrett-2779 is only 368 bytes and needs more sources and care, plus a profile manager.
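
A rough sketch of how the two suggested checks could look, just to make the idea concrete. The `Profile` class and its field names are hypothetical, the last-edit comment is not in the dump today, and matching on the word "gedcom" in that comment is only an assumption about how an import might be recognised.

```python
from dataclasses import dataclass

MIN_SIZE_BYTES = 500  # threshold from suggestion 1


@dataclass
class Profile:
    wikitree_id: str
    size_bytes: int
    last_edit_comment: str = ""  # hypothetical field, not in the dump today


def size_warnings(profile):
    """Return the suggested warnings that apply to one profile."""
    warnings = []
    if profile.size_bytes < MIN_SIZE_BYTES:
        warnings.append("less than 500 bytes")
    # Assumption: a GEDCOM import mentions "gedcom" in the edit comment.
    if "gedcom" in profile.last_edit_comment.lower():
        warnings.append("last action was a GEDCOM import")
    return warnings


# Size taken from the example in this thread; its edit comment is unknown here.
print(size_warnings(Profile("Barrett-2779", 368)))
```

A profile that is fine as it is could then be flagged as a false error by hand, like other suggestions.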

PS. I would also like to see the DNA info in the dump ==> then a GIS specialist like Aleš Trtnik or someone else could start creating a map interface showing, for example, that in location xxx we have 10 people who lived between 1700 and 1750 and who have a connection to a registered WikiTree user who has taken a Family Finder DNA test and is on GedMatch.....

Just a couple of results:

Over 5,000,000 profiles with size < 498

Over 5,625,000 profiles with size < 566

Over 6,250,000 profiles with size < 631

Over 7,500,000 profiles with size < 817

Over 10,000,000 profiles with size < 1743

Over 11,000,000 profiles with size < 4040

I think it can be concluded that over 50% of the profiles are unsourced or insufficiently sourced.

3 Answers

+6 votes
I don't think I could see size as an error - especially if contributors don't know what size is acceptable.

Is there a case for there being a maximum size? I don't know.

Just to add - I've looked at one of the examples and it seems to suffer from numerous irrelevant repetitions. Doesn't that need to be solved in another way?
by anonymous G2G6 Pilot (270k points)


Please check the profile in the link. It's an indication that there is some WikiTree work to do, is my firm belief.... The ones I have checked are mostly GEDCOM imports that have not been cleaned...

In the same way, it's no error to get married twice on the same day to a wife with the same name as the first wife ;-)

What they say in Wikipedia:

Readable prose size    What to do
> 100 kB               Almost certainly should be divided
> 60 kB                Probably should be divided (although the scope of a topic can sometimes justify the added reading material)
> 50 kB                May need to be divided (likelihood goes up with size)
< 40 kB                Length alone does not justify division
< 1 kB                 If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded, see Wikipedia:Stub.
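
To show how size bands like these could drive a warning, here is a small sketch that maps a size in kB to the quoted advice. It only encodes the Wikipedia rows quoted here; the function name is made up, and whether the same numbers suit WikiTree profiles is an open question, since Wikipedia measures readable prose rather than raw page size.

```python
def size_advice(size_kb):
    """Return the quoted Wikipedia advice for a size in kB, or None."""
    if size_kb > 100:
        return "Almost certainly should be divided"
    if size_kb > 60:
        return "Probably should be divided"
    if size_kb > 50:
        return "May need to be divided"
    if size_kb < 1:
        return "Consider combining with a related page, or expand it"
    if size_kb < 40:
        return "Length alone does not justify division"
    # The quoted rows say nothing about sizes from 40 kB to 50 kB.
    return None


for kb in (0.4, 10, 45, 55, 120):
    print(kb, "kB:", size_advice(kb))
```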
I do see what you mean, but do we need to differentiate between articles and profiles?

Taking Selman-116, frankly I think much of the content is irrelevant in that it is only meaningful to the owner. I could expand a lot of my profiles in the same way, but I can't see the point.

Could you clarify what you mean by divided?

A) do we need to differentiate between articles and profiles?

Answer: No, but I was comparing WikiTree profiles with Wikimedia articles...

B) Taking Selman-116, frankly I think much of the content is irrelevant in that it is only meaningful to the owner. 

Answer: Agreed, but we are all in genealogy for different reasons...

C) Could you clarify what you mean by divided?
Answer: It was just a quote from Wikipedia....

My point is just to have a warning when a profile is big.... The one I looked at had repeated citations, which are errors, etc...

If the profile is Ok ==> just flag it as a false error  

OK. Fair enough.
+3 votes
The same surnames repeat a lot.

The public ones I've looked at contain very large dumps of hundreds or thousands of people from My Heritage.  Possibly people downloaded a lot more people than they intended.
by Living Horace G2G6 Pilot (612k points)
+8 votes
Sometimes GEDCOM imports go bad, resulting in huge amounts of repetitious information, like 11,000 links to the same Ancestry tree. There is a list of these that leaders have a link to. I clean a few up occasionally.
by Anne B G2G Astronaut (1.3m points)
