Can we automate the removal of manually created "Unconnected" categorizations? [closed]

+29 votes
429 views

Hi All,

We have automated processes identifying profiles that are unconnected.  I think this means that it's no longer necessary to manually categorize profiles as "Unconnected".  Indeed this is currently leading to double counting in some of the reports we use for maintaining profiles.  So, for example, Downer-73 appears twice in the report for the county of Sussex in England (https://www.wikitree.com/wiki/Automated:DD_Unconnected_List_ENG_SSX ), once under "Category:Sussex, Unconnected Profiles" and then again under "Location: Sussex, Unconnected tree".  What's more, the manual categorization can easily be out of date, so Penfold-122 shows as unconnected when in fact it is now part of the world tree.

I would like to propose that we stop using the manually maintained "Unconnected" categorizations and allow the Editbot to remove those currently in place.  The ever helpful Aleš Trtnik informs me that this is possible and is something he would advocate.
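The proposal boils down to one rule: if a profile is connected to the main tree, any manually added "Unconnected" category on it is stale. Below is a minimal Python sketch of that rule. It is not EditBot's actual code. The `connected` flag and the substring match on "Unconnected" are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    wikitree_id: str
    connected: bool  # assumption: True if the profile is connected to the main tree
    categories: set[str] = field(default_factory=set)


def is_unconnected_category(name: str) -> bool:
    # Assumption: manual categories carry "Unconnected" in their names,
    # e.g. "Sussex, Unconnected Profiles" or "Unconnected Notables of England".
    return "Unconnected" in name


def stale_unconnected_categories(profiles: list[Profile]) -> dict[str, set[str]]:
    """Map each connected profile to the Unconnected categories it should lose."""
    removals: dict[str, set[str]] = {}
    for p in profiles:
        stale = {c for c in p.categories if is_unconnected_category(c)}
        if p.connected and stale:
            removals[p.wikitree_id] = stale
    return removals
```

Unconnected profiles keep their categories. Only connected ones show up in the result.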

Regards,

Chris.

asked in Policy and Style by Chris Weston G2G6 (8.5k points)
closed by Aleš Trtnik
Could you please add the 'connectors' tag to this post so it comes to the attention of the various connector teams?
I strongly agree, let's get rid of them all.
Ales makes a good point.

We have to keep the Unconnected Notables or we won't have anyone to connect!!!!

Robynne, I don't get your comment. I am for removing all unconnected categories. There are many ways to get candidates for connection.

For instance, https://www.wikitree.com/wiki/Category:Unconnected_Notables_of_England can be generated automatically.

All Unconnected notables in England

https://wikitree.sdms.si/default.htm?report=srch1&Query=unconnected+england+notables&MaxProfiles=5000&SortOrder=Default&PageSize=10

All Unconnected persons linked from wikidata in England

Or both in one list. Here you have 1300 unconnected notables vs. 30 in the category.

https://wikitree.sdms.si/default.htm?report=srch1&Query=unconnected+england+isinwikidata+or+unconnected+england+notables&MaxProfiles=5000&SortOrder=Default&PageSize=10
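All of these WikiTree+ search links share one URL shape, so they can be built from a query string. Below is a minimal Python sketch. The parameter names and values are copied from the links in this thread. Their meanings are inferred from their names, not from any WikiTree+ documentation.

```python
from urllib.parse import urlencode

WT_PLUS_SEARCH = "https://wikitree.sdms.si/default.htm"


def wt_plus_search_url(query: str, max_profiles: int = 5000, page_size: int = 10) -> str:
    """Build a WikiTree+ search URL in the same shape as the links above."""
    params = {
        "report": "srch1",
        "Query": query,  # space-separated terms; urlencode turns spaces into '+'
        "MaxProfiles": max_profiles,
        "SortOrder": "Default",
        "PageSize": page_size,
    }
    return f"{WT_PLUS_SEARCH}?{urlencode(params)}"
```

For example, `wt_plus_search_url("unconnected england isinwikidata or unconnected england notables")` reproduces the combined link above. Replacing `unconnected` with `unlinked` in the query gives the unlinked variant.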

Aleš, how do we find unlinked profiles in the reports?
Use "unlinked" instead of "unconnected" in the search pattern.

9 Answers

+12 votes

I support this proposal. This would also help in the Category tree, as 266 categories could be removed.

We also have Special:Unconnected available that can help to identify all unconnected profiles, as well as those that we manage that are unconnected.

answered by Steven Harris G2G6 Pilot (170k points)
When I want to look for unconnected profiles, I still use Special:Unconnected instead of either the categories or Aleš' database. Special:Unconnected gets updated every 24 hours without any special intervention, so I don't see a need for the other features.
+10 votes
I think this is a great idea. When automation is possible it can be quite helpful.
answered by Doug McCallum G2G6 Pilot (261k points)
+3 votes
This is an excellent proposal.
answered by Deb Durham G2G6 Pilot (725k points)
+7 votes
I absolutely agree with using EditBot to remove the Unconnected categories once a profile is connected.

I am much less happy with the idea of doing away with the unconnected categories entirely. I use them constantly in the course of my work, and, while the reports are great, they aren't quite as easy to work through.
answered by Greg Slade G2G6 Pilot (194k points)

To unpack my objections to removing the unconnected categories a little bit, there are several factors which go into my thinking:

First, the unconnected categories were created by connectors, for use by connectors, to make it easier to identify individuals who still need to be connected to the main tree. They're broken down geographically because some people tend to work on connecting (or sourcing, or improving profiles in other ways) in one specific location, frequently because they have expertise in that location and access to sources there that they don't have elsewhere.

As is the case with many categories, people who don't actually use those categories don't see any value in them. And, unfortunately, some people seem to have the attitude that anything they personally don't see any value in must be removed from WikiTree. (Take, for example, the links to Queen Victoria, the profile of the week, and the WikiTreer of the week. I've heard from any number of people who don't like them and think that they should be removed, and never mind all the other people who think they're fun and like seeing them. [Or the much smaller number of people who take the opportunity to test and improve all the links in the chain between their own profiles and the weekly profiles that come up. Personally, my hats are off to them for committing themselves to taking up such a huge and ever-changing task.])

So my plea to people who aren't connectors is: "If you don't use those categories, don't worry about them. Just ignore them. They're not hurting you, and other people need them."

Second, some people seem to take the presence of an unconnected category on a profile they manage as some kind of black mark against them. (Granted, many more people get upset about the presence of an unsourced category, but apparently, some people take it as some kind of insult.) I'm afraid I just don't understand the thinking behind that. 

The reason I joined WikiTree in the first place was because I saw A.J.'s TED Talk, and the thought of being able to find and follow links between me and the rest of the world was tremendously exciting. The months that it took before I actually managed to make a connection between my branch and the main tree were intensely frustrating for me, and I very nearly gave up. If the unconnected categories had existed back then, and somebody had come along and put them on profiles I manage, I would have been thrilled. To me, those categories are a flag saying, "Hey everybody, come and help these people get connected to the main tree." I would have loved it if a bunch of people had come along and helped to get my branch connected. Once I finally did find a connection to the main tree, I was thrilled, and every time I find a new patch connecting my branch to the main tree, I still get a charge out of it. I can't understand why anybody would not want help connecting to the main tree. (Or at least, anybody on WikiTree. There are plenty of sites, not to mention software to run on their own computers, for people who want to keep their sandboxes all to themselves.)

Third, while Chris pointed to the unconnected lists (like the Unconnected profiles for Sussex list), which Aleš was kind enough to create at the request of the Connectors Project, those aren't exactly the same thing. The unconnected categories should be applied to unconnected profiles where the person was born, lived, or died in that location. The unconnected lists record branches rather than individual profiles, and the linking profile is whichever profile in that branch was added to WikiTree first. So while an entry in that list means (or at least probably means) that somebody in that branch has a connection with that location, it doesn't usually mean that the linking profile has a connection with that location. For that reason, while I use Aleš's lists to populate the "Let others know what locations you are working on" page, when it comes to working through the profiles in a particular location where I have access to sources so I can try to connect a branch, I use the categories: if a profile has an unconnected category for a particular location on it, then I know that person spent time in that location and might show up in the sources. (Unless somebody applied the category by mistake.)

Granted, sometimes I find that a profile is already connected, in which case I remove the category and move on. It can be tedious going through a whole branch and removing the categories from every profile once you connect it. So I am fully supportive of Chris' idea of automatically removing those categories from connected profiles. I'm just not keen on having the categories themselves taken away from me.

I expected you would object to complete removal. And as an active member of the Connectors Project, we will respect your wishes, although I don't understand why.

Can you ask others in the Connectors Project to participate in this thread? It mostly concerns the Connectors Project.

What about these two groups? Can they be merged with the others for specific locations?

That would also include Unlinked Profiles

https://wikitree.sdms.si/function/WTCatNavigate/Category.htm?Category=Unlinked_Profiles&Levels=2

and Unconnected Notables

https://wikitree.sdms.si/function/WTCatNavigate/Category.htm?Category=Unconnected_Notables&Levels=2

I wish I could give your post an up vote!

Aleš,

Part of the reason that I like working in categories is that, if a profile has the wrong category applied, I can remove it (or replace it with a more appropriate category) as long as the profile is set to Open. But if a profile gets into a report, I don't know how to remove it from that report. And then, as I manage to connect (or source, or whatever that report is about) profiles from that report, the remaining profiles in the report tend to be profiles that either don't belong in that report in the first place, or else that I can't fix.

But it may be that there are ways to deal with that, too:

  1. In the Connectors Project we don't want to connect profiles to the main tree if they are fraudulent. There are people who are working to identify and disconnect fraudulent genealogies, and we don't want to be undoing their work. So is it possible to keep profiles from showing up in the unconnected reports if those profiles have either {{Uncertain Existence}} or {{Disproven Existence Project}} on them? Better yet, is it possible to keep any branches which have any profiles with either of those two templates on them from showing up in the unconnected reports?
  2. Sometimes, when I look at entries in a geographic report, like, say, unconnected profiles for British Columbia, I don't see anything in the profile that says that that person ever lived in that place. (Presumably, there's something in the profile somewhere which is triggering the script in error.) Normally, I just close that profile and move on to the next one, but it would be nice to be able to flag that profile not to show up in that particular report anymore. Is that doable?
  3. For those parts of the reports which point to individual profiles rather than branches (Unconnected Profiles and Unconnected Orphans), is it possible to exclude profiles which are not set to Open?
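Requests 2 and 3 amount to a post-processing filter on each report. Below is a minimal Python sketch. The per-report exclusion mapping is hypothetical, since no such "do not list" flag exists in the reports described here. It also assumes each entry knows its privacy level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportEntry:
    wikitree_id: str
    privacy: str  # e.g. "Open"; anything else is treated as not editable by others


def filter_report(
    entries: list[ReportEntry],
    report_name: str,
    exclusions: dict[str, set[str]],  # hypothetical: {report name: flagged WikiTree IDs}
) -> list[ReportEntry]:
    """Keep only Open profiles that nobody has flagged out of this particular report."""
    flagged = exclusions.get(report_name, set())
    return [e for e in entries if e.privacy == "Open" and e.wikitree_id not in flagged]
```

Keying the exclusions by report name means a profile flagged out of one geographic report still shows up in the others.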

I was reviewing unconnected reports and got the same idea. 

I just can't decide on the exact rule for excluding a tree from the report. If one profile in a tree has uncertain existence, that doesn't necessarily mean the whole tree should stay unconnected. I am more inclined toward a rule that if all profiles are marked uncertain, we should ignore that tree. I think all profiles in such a branch should be marked as uncertain, so that anyone who tries to connect to any of them knows it is a possible fabrication.

While writing this, I decided that all profiles must be uncertain.

Now, how is a profile marked as uncertain?

Looking at one Goodman tree:

https://wikitree.sdms.si/function/WTWebProfileSearch/Profiles.htm?Query=tree44&MaxProfiles=5000&pagesize=500

  • 474 profiles are connected to each other.
  • 450 have C:Goodman_Genealogy_Fabrications
  • 15 have C:Goodman_Genealogy_Fabrication
  • 411 have C:Disproven_Existence_Adjunct
  • 340 have T:QUESTIONABLE
  • 47 have T:DISPROVEN EXISTENCE
  • 77 have T:UNCERTAIN EXISTENCE

Looking at this, there is no single category that is on all profiles. I suggest that any category in Frauds and Fabrications marks a profile as not needing connection: http://wikitree.sdms.si/function/WTWebCategoryNavigate/Category.htm?category=Frauds%20and%20Fabrications

Or should we go one level higher? https://www.wikitree.com/wiki/Category:Fictitious_and_Legendary_Genealogy

https://wikitree.sdms.si/function/WTCatNavigate/Category.htm?Category=Fictitious%20and%20Legendary%20Genealogy&Levels=5

I think the templates add one of these categories.

This would remove those profiles from the reports.
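Checking "any category in Frauds and Fabrications" means walking up the category tree from each of a profile's categories. Below is a minimal Python sketch using breadth-first search. The `parents` mapping stands in for the real category tree and is hypothetical.

```python
from collections import deque


def has_ancestor(category: str, root: str, parents: dict[str, list[str]]) -> bool:
    """Breadth-first walk up the category tree; True if `root` is reached."""
    seen = {category}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        if current == root:
            return True
        for parent in parents.get(current, []):
            if parent not in seen:  # guard against cycles in the category graph
                seen.add(parent)
                queue.append(parent)
    return False


def needs_no_connection(categories: set[str], root: str, parents: dict[str, list[str]]) -> bool:
    """True if any of the profile's categories sits under `root`."""
    return any(has_ancestor(c, root, parents) for c in categories)
```

Going "one level higher" only changes `root`, for example from "Frauds and Fabrications" to "Fictitious and Legendary Genealogy". The traversal stays the same.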

Now let me explain what the items in the report mean.

Tree size   Trees
971         Cornish-1847  T S
474         Goodman-1515  T S

Take Goodman as an example. There are 474 profiles connected in this tree. Goodman-1515 is listed as its name, but it is just one profile in that tree: the one with the lowest ID (the one created first). That doesn't mean that this profile is linked to the region. Clicking it just opens it on WikiTree.
The letter T links to WikiTree+ and displays the first 20 generations from Goodman-1515. It doesn't necessarily display all profiles, but they are listed by generation.
The letter S displays all 474 profiles connected in this tree on WT+. They are grouped under tree44, and that number can change each week. There you will also see the profiles connected to the region. You can also use that ID in specific searches on WT+, for example https://wikitree.sdms.si/default.htm?report=srch1&Query=tree44+Vosges&MaxProfiles=500&SortOrder=Default&PageSize=10
Aleš
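The report rows explained above can be sketched like this: group profiles by tree, count them, and label each tree with its lowest-ID profile. This is a minimal Python sketch, not the report's actual code. The numeric `user_id` (lower means created earlier) and the `tree` field are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    user_id: int      # assumption: internal numeric ID, lower = created earlier
    wikitree_id: str  # e.g. "Goodman-1515"
    tree: str         # weekly tree number such as "tree44"; not stable between runs


def report_rows(profiles: list[Profile]) -> list[tuple[int, str]]:
    """Return (tree size, linking WikiTree ID) rows, largest tree first."""
    trees: dict[str, list[Profile]] = defaultdict(list)
    for p in profiles:
        trees[p.tree].append(p)
    rows = []
    for members in trees.values():
        first = min(members, key=lambda p: p.user_id)  # the profile created first
        rows.append((len(members), first.wikitree_id))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows
```

Nothing in this rule looks at locations. That is why the linking profile need not be connected to the region the list is about.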

Of the Goodman profiles, those that are deemed to be bogus are being project boxed and will receive C:Disproven_Existence_Adjunct  

The profiles that were affected by the fraud but are legitimate or questionable will retain the Goodman category.

EVERY profile that is boxed and goes into the Adjunct category has been deemed to be bogus and can be removed from all reporting.

The same should be true for the parent project: Disproven Existence.  Profiles boxed {{Uncertain Existence}} will get the Category:Disproven_Existence and should also be excluded from all reporting.

No point correcting impossible dates of birth for a non-existent person or trying to connect fantasy folks to the real tree ;-)

I added exclusion of trees where all profiles have any of these templates:

  • Disproven Existence Adjunct
  • Uncertain Existence
  • QUESTIONABLE
If, after the update, you notice any tree that should be excluded, let me know.
We should also have the Disproven Existence template.

The adjunct is a sub project of Disproven Existence - each with their own template.
Added.
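The rule as finally agreed says to exclude a tree only when every profile in it carries at least one of the listed templates. Below is a minimal Python sketch of that rule. The template strings are taken from this thread. Their exact spelling on WikiTree is an assumption, so the comparison ignores case.

```python
# Template names as listed in this thread; exact spelling on WikiTree is an
# assumption, so comparisons are case-insensitive.
EXCLUDING_TEMPLATES = {
    name.casefold()
    for name in (
        "Disproven Existence Adjunct",
        "Uncertain Existence",
        "QUESTIONABLE",
        "Disproven Existence",
    )
}


def profile_is_flagged(templates: set[str]) -> bool:
    """True if the profile carries at least one excluding template."""
    return any(t.casefold() in EXCLUDING_TEMPLATES for t in templates)


def tree_is_excluded(tree: list[set[str]]) -> bool:
    """Exclude a tree only if it is non-empty and every profile is flagged.

    A single flagged profile is not enough: the rest of the tree may be real.
    """
    return bool(tree) and all(profile_is_flagged(t) for t in tree)
```

A Goodman-style tree, where each profile has a different one of the templates, is still excluded. A tree with even one unflagged profile stays in the report.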

Aleš, I have another question about the reports:

Taking the unconnected profiles report for Canada from 1800-1899 for an example, would it be possible to make those reports sortable on the column headers? (WikiTree ID, name, birth date, death date, etc.) When I do a WikiTree search for a particular name, I usually sort by birthdate, and then often happen to notice two people (or, as happened yesterday, three people) with the same name, birthdate, and place of birth. In cases like that, I dig a little deeper, and if they really do look like duplicates, I suggest a merger and then go back to looking for whomever I had set out to find. I'm always finding things that aren't obvious at first glance by sorting in various ways, and I'm thinking that that kind of functionality could make those reports even more useful.

No.

These reports are huge. This one has 40,000 items. Making it sortable on the client side is problematic, since the complete report would have to be on one page.

Preparing the pages in several different orders would multiply the preparation time: instead of several hours, it would take over a day.

You can sort on WikiTree+.
+2 votes
As a connector, I like it being done automatically ... and the elimination of double counting will make for more accurate reports.
answered by N. Gauthier G2G6 Mach 3 (37.4k points)
+3 votes

What about these two groups? Can they be merged with the others for specific locations?

That would also include Unlinked Profiles

https://wikitree.sdms.si/function/WTCatNavigate/Category.htm?Category=Unlinked_Profiles&Levels=2 

and Unconnected Notables 

https://wikitree.sdms.si/function/WTCatNavigate/Category.htm?Category=Unconnected_Notables&Levels=2

I'm spinning out my answer to Aleš's question as a separate answer. It's a big enough topic by itself that it could probably even be a separate thread.

In the locations where I have been editing the Unconnected Notables categories, I link up from them to the Unconnected Profiles category for the same location, and then up from the Unconnected Profiles category for that place to two places:

  1. the Unconnected Profiles category for the next-higher geographic category (from county/province/state/territory to country, from country to continent, and from continent to the main Unconnected Profiles category)
  2. the Maintenance Categories category for that location (which connects, in turn, to the location category itself, and also to the Maintenance Categories category for the next-higher geographic category)
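The linking scheme described above can be written as a small function that, given a chain of places, lists each category's parents. Below is a minimal Python sketch. The category-name formats ("Sussex, Unconnected Profiles", "Unconnected Notables of England", "Sussex, Maintenance Categories") follow names seen in this thread but are assumptions for any given location.

```python
def unconnected_parent_links(places: list[str]) -> dict[str, list[str]]:
    """Parent categories for the Unconnected scheme described above.

    `places` runs from most to least specific, e.g. ["Sussex", "England", "Europe"].
    The category-name formats are assumptions for illustration.
    """
    links: dict[str, list[str]] = {}
    for i, place in enumerate(places):
        unconnected = f"{place}, Unconnected Profiles"
        if i + 1 < len(places):
            higher = f"{places[i + 1]}, Unconnected Profiles"
        else:
            higher = "Unconnected Profiles"  # top of the geographic chain
        # Notables link to the Unconnected Profiles category for the same place.
        links[f"Unconnected Notables of {place}"] = [unconnected]
        # Unconnected Profiles link up one level and to the place's maintenance category.
        links[unconnected] = [higher, f"{place}, Maintenance Categories"]
    return links
```

For a county in England, `unconnected_parent_links(["Sussex", "England", "Europe"])` yields the county, country, and continent levels, ending at the main Unconnected Profiles category.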

Of course, I've only touched a small minority of location categories, so probably what I've been doing isn't true everywhere.

I don't recall ever seeing an Unlinked Profiles category in any of the locations where I've been working recently, but if I ever did, I figure I'd treat it like the Unconnected Notables categories, since they're both special cases of unconnected profiles.

Unlinked profiles are kind of the elephant in the room as far as unconnected profiles go. They already constitute the vast majority of unconnected profiles, and as we connect more and more larger branches, that situation is only going to get more extreme. I haven't seen anybody talking about the fact that we somehow have to identify, source, and connect all the millions of unlinked profiles out there. (Probably because it would be such tedious work, and, unlike connecting a branch of hundreds or thousands, you wouldn't be able to point to a big drop in the unconnected numbers after connecting one profile.)

I've thought of a few possible ways (besides the Lost and Found Project) to encourage people to tackle such a huge and thankless job:

  • A Link-A-Thon, in which teams compete to identify, source, and link unlinked profiles, even if only to one other profile. (Granted, a twig of two profiles isn't a lot better than a leaf of one profile, but it is better.)
  • A report listing profiles which are both unsourced and unlinked. If the people running the Saturday Sourcing Sprints would encourage people to work from that report, it might lead to profiles getting at least linked to other profiles or branches, or even connected to the main tree.
  • A report listing profiles which are both unlinked and have one of the Needs Profiles Created categories, which would at least make it easy to link some subset of the unlinked profiles.
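Both proposed reports are simple filters that combine "unlinked" with one other condition. Below is a minimal Python sketch. The `relationship_count` and `unsourced` fields are hypothetical stand-ins for whatever data the real reports would use, and the "Needs Profiles Created" match is by substring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    wikitree_id: str
    relationship_count: int  # assumption: parent, spouse and child links combined
    unsourced: bool = False  # assumption: e.g. derived from an Unsourced category
    categories: frozenset[str] = frozenset()


def is_unlinked(p: Profile) -> bool:
    return p.relationship_count == 0


def unsourced_and_unlinked(profiles: list[Profile]) -> list[str]:
    """Candidate worklist for a sourcing sprint."""
    return sorted(p.wikitree_id for p in profiles if is_unlinked(p) and p.unsourced)


def unlinked_needing_profiles(profiles: list[Profile]) -> list[str]:
    """Unlinked profiles in any "... Needs Profiles Created" category (name match assumed)."""
    return sorted(
        p.wikitree_id
        for p in profiles
        if is_unlinked(p) and any("Needs Profiles Created" in c for c in p.categories)
    )
```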
answered by Greg Slade G2G6 Pilot (194k points)
Actually, Aleš, that brings up yet another question: can EditBot remove the  Unlinked categories from profiles which are, in fact, linked to at least one other profile?
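The check behind this question is that a profile counts as linked if it appears in at least one relationship. Below is a minimal Python sketch. Relationships are modelled as hypothetical `(id_a, id_b)` pairs, and the "Unlinked" substring match on category names is an assumption. This is not EditBot's actual code.

```python
def linked_ids(relationships: list[tuple[str, str]]) -> set[str]:
    """Every profile that appears in at least one (id_a, id_b) relationship pair."""
    linked: set[str] = set()
    for a, b in relationships:
        linked.update((a, b))
    return linked


def unlinked_categories_to_remove(
    categories_by_id: dict[str, set[str]],
    relationships: list[tuple[str, str]],
) -> dict[str, set[str]]:
    """Map each linked profile to the "Unlinked" categories it still carries."""
    linked = linked_ids(relationships)
    removals: dict[str, set[str]] = {}
    for pid, cats in categories_by_id.items():
        stale = {c for c in cats if "Unlinked" in c}  # name match is an assumption
        if pid in linked and stale:
            removals[pid] = stale
    return removals
```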
+3 votes

Agreed. I use the Unconnected list NLD a lot, in combination with the WT+ option to search for the tree number. The land/location is tricky: many of the top 50 unconnected trees have only one profile with a link to the Netherlands. I would still like to celebrate the connections made and see some progress. I personally note the unconnected count reported every week and tally the trees connected during that week.
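Tallying trees connected each week is complicated by the fact that tree numbers such as tree44 can change between runs. Comparing by profile ID avoids that. Below is a minimal Python sketch. It assumes you have last week's linking profiles and the full membership of this week's unconnected trees (for example from the S links). A profile that was merged away or deleted would also be counted, so the result is only an upper bound.

```python
def trees_connected_since(
    last_week_linking_ids: list[str],
    this_week_unconnected_ids: set[str],
) -> list[str]:
    """Last week's linking profiles that are no longer in any unconnected tree.

    `this_week_unconnected_ids` must hold every profile in this week's
    unconnected trees, not just the linking ones: when two unconnected trees
    join, the linking (lowest-ID) profile of one of them changes even though
    it is still unconnected.
    """
    still_unconnected = set(this_week_unconnected_ids)
    return sorted(pid for pid in last_week_linking_ids if pid not in still_unconnected)
```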

The Unconnected report is published late on Tuesday or on Wednesday. Could it be published earlier, or at a fixed time?

answered by B. W. J. Molier G2G6 Mach 3 (36.9k points)
This list is among the last reports to be created. It could be done sooner, but I try to create all the suggestions reports first, since they are used by most users. I will check whether I can optimize something, but it will not make much difference.

As for the publishing time: if anything goes wrong in the process, everything gets shifted, and I can't do much about it.
Hi Aleš,

I understand. I hope nothing goes wrong :D

Thanks for all the work!
+1 vote
Excellent idea! Just do it.
answered by Robert Hvitfeldt G2G6 Pilot (101k points)
+1 vote
I ran EditBot to remove connected profiles from the Unconnected categories. There were 400 of them. The last time the bot was run was 4.5 months ago.

http://www.softdata.si/wt/EditBot/AutomatedUnconnectedLog%2020181219%20231322.htm
answered by Aleš Trtnik G2G6 Pilot (357k points)
I must have missed the deadline. I just connected a branch last night, and I was hoping that EditBot would remove the categories, but I must have done it after you ran the script.
It ran on Sunday's data, so you missed it by a few days.


...