There's always a bit of guesswork involved in genealogy. Say my ancestor is named Mary Quigglesworth. I can find a record for a John H. Quigglesworth showing that he had a daughter, Mary Quigglesworth. How do I know for certain that it's the same Mary in both cases? At best, we're forming a hypothesis. And, as with all hypotheses, it's the best theory available until we find a fact that disproves it.
For a real example, I have a line on WikiTree that supposedly goes back to Charlemagne. While I know mathematically that I'm very likely descended from Charlemagne (as virtually everyone in Western Europe would be!), genealogically establishing an actual chain of connections across the roughly 1,200 years since he lived is a stretch, for 4 reasons that I know of:
1. Lack of Records
The further back you go, the fewer records per generation are available, which makes genealogy very spotty and largely limited to nobles and those in power. Some countries do have extensive records going back into the 1500s; however, records are easily lost, damaged, or destroyed, so often very little survives. With fewer records that far back, we cannot establish, with the certainty we might have for people living today, that there was only one John H. Quigglesworth and only one Mary Quigglesworth living at a particular time and place, which is what allows us to reasonably assign records to profiles.
2. WikiTree (and most genealogy) includes uncertain data
In many lineages on WikiTree, there is usually at least one questionable, or perhaps "speculative," connection along the way. In my own case, a woman in Quebec is linked as the daughter of a minor noble in France. It's really tenuous. Everything on either side of that link is good: Quebec itself has excellent records going back to the 1600s, and the French side is also very good. But that one tenuous connection probably undoes the whole thing.
I would add that it's perfectly fine to keep such tenuous connections, provided that we mark them as such. This helps organize information, guide research, and formulate hypotheses, even if later research disproves the connection.
There's some info on a help page about this: https://www.wikitree.com/wiki/Help:Uncertain
(I also came across an interesting example of how someone, back in 1997, was able to support a speculative connection using data about the couple's relatives. Marking connections as speculative is a good way to enable that kind of checking. See Peloquin-204 if you're curious.)
3. Probability has a cumulative effect
Even if we're right, we're probably (eventually) wrong. At some point, the cumulative probability that a person documented as our ancestor really is our ancestor drops below 50%. Let's say that our parent-child connections are, on average, 95% correct. When does this happen? At about 13.5 generations:
$$0.95^{N} = 0.5$$
$$N = \frac{\ln 0.5}{\ln 0.95} \approx 13.5 \text{ generations}$$
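The same calculation as a small Python sketch. `generations_to_half` and `chain_probability` are hypothetical helper names, and the sketch assumes, as the text does, that every parent-child link is independently correct with the same average probability:

```python
import math


def generations_to_half(p_link: float) -> float:
    """Generations N at which p_link ** N falls to 0.5.

    p_link is the assumed average probability that a single
    parent-child connection is correct.
    """
    return math.log(0.5) / math.log(p_link)


def chain_probability(p_link: float, n: int) -> float:
    """Probability that all n links in a single ancestral line are correct."""
    return p_link ** n


print(f"{generations_to_half(0.95):.1f} generations")  # 13.5 generations
# Whole generations on either side of the threshold
print(f"{chain_probability(0.95, 13):.3f}")  # 0.513
print(f"{chain_probability(0.95, 14):.3f}")  # 0.488
```

Since generations come in whole numbers, the 13th generation is still just above 50% and the 14th is the first one below it.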
However, that's for a single line of ancestors. At generation $N$, we have up to $2^N$ ancestors (although that far back we usually don't have all of them documented, and some pedigrees collapse). If we are 95% right on every parent-child link, then at $N = 3$ each of our 8 great-grandparents is correctly identified with probability $0.95^3 \approx 86\%$, leaving about a 14% chance that any given one is incorrect.
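$$A_{\text{false}} = \left(1 - P_{\text{true}}^{N}\right) 2^{N}$$
where $P_{\text{true}}$ is the average probability that a single parent-child link is correct and $A_{\text{false}}$ is the expected number of incorrect ancestors in generation $N$. For $N = 3$:
$$A_{\text{false}} = \left(1 - 0.95^{3}\right) \times 8 = 0.142625 \times 8 \approx 1.141$$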
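A minimal Python sketch of the expected-error formula, assuming a full pedigree of $2^N$ ancestors (no pedigree collapse) and independent links; `expected_false_ancestors` is a hypothetical helper name:

```python
def expected_false_ancestors(p_link: float, n: int) -> float:
    """Expected number of incorrect ancestors in generation n.

    Assumes 2 ** n ancestor slots (no pedigree collapse) and that every
    parent-child link is independently correct with probability p_link.
    """
    p_line_wrong = 1 - p_link ** n  # chance a given line has at least one bad link
    return p_line_wrong * 2 ** n    # summed over all 2 ** n ancestors in that generation


for n in range(1, 6):
    print(n, 2 ** n, round(expected_false_ancestors(0.95, n), 3))
```

The row for $n = 3$ reproduces the 1.141 above; by $n = 5$ the expected count is already above 7 of 32.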
which means that, on average, slightly more than 1 of our 8 great-grandparents is incorrect, assuming that 95% of the parent-child relationships we have on WikiTree are correct. Setting $A_{\text{false}} = 1$ and solving for $P_{\text{true}}$ gives the minimum average probability, across all parent-child relationships, needed for only 1 ancestor in a given generation to be expected to be incorrect or non-biological (the actual number could be more or fewer).
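Worked out explicitly, under the same assumptions (independent links and a full pedigree of $2^N$ ancestors), setting the expected number of incorrect ancestors to 1 in the formula above gives the threshold in closed form:

$$1 = \left(1 - P_{\min}^{N}\right) 2^{N} \;\Longrightarrow\; P_{\min}^{N} = 1 - 2^{-N} \;\Longrightarrow\; P_{\min} = \left(1 - 2^{-N}\right)^{1/N}$$

For $N = 3$ this is $(7/8)^{1/3} \approx 0.957$, just above 95%, which is why 95% accuracy yields slightly more than one error among the great-grandparents. For $N = 10$ it is $(1023/1024)^{1/10} \approx 0.99990$.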
By generation 10, we need an average probability of about 0.9999: that is, 99.99% of all parent-child relationships must be true, biological parent-child relationships for only 1 ancestor in generation 10 to be incorrect, on average.
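A short Python sketch that tabulates this threshold for the first ten generations, using the closed form $P_{\min} = (1 - 2^{-N})^{1/N}$ obtained by setting $A_{\text{false}} = 1$ and solving for $P_{\text{true}}$. `min_link_probability` is a hypothetical helper name, and the same independence and full-pedigree assumptions apply:

```python
def min_link_probability(n: int) -> float:
    """Minimum average parent-child accuracy so that, on average, only one
    of the 2 ** n ancestors in generation n is incorrect.

    Solves (1 - p ** n) * 2 ** n = 1 for p, i.e. p = (1 - 2 ** -n) ** (1 / n).
    """
    return (1 - 2.0 ** -n) ** (1 / n)


print("gen  ancestors  min p")
for n in range(1, 11):
    print(f"{n:>3}  {2 ** n:>9}  {min_link_probability(n):.5f}")
```

The last row comes out at about 0.9999, and each additional generation pushes the required accuracy closer to 1.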
And that only accounts for errors in our assignments based on the paper trail and human testimony. That brings us to the 4th reason.
4. Paper records don't always tell the biological truth
Even if our assignments were perfect with respect to established records, there are still cuckoldry, or extra-pair paternity, events, where the father isn't who the documents say he is. Luckily, the rate is only about 1% (though estimates range from 0.4% to 5.9%).
If 1% of father-child relationships are incorrect, and we assume mother-child relationships are always biologically correct, the average accuracy across all parent-child relationships drops to 99.5%, even with a perfect paper trail. According to the math we just examined, a family tree going back 5 generations would then be expected to contain close to one (about 0.8) non-biological ancestor among its 32 3rd great-grandparents.
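A minimal Python sketch of the paternity effect on its own (perfect paper trail), assuming, as the 99.5% figure implies, that only father-child links can be non-biological. The helper names are hypothetical, and the three rates are the low, typical, and high figures quoted in the text:

```python
def average_link_accuracy(paternity_error: float) -> float:
    """Average accuracy over mother-child and father-child links, assuming
    mother-child links are always correct and only father-child links
    are affected by extra-pair paternity."""
    return (1.0 + (1.0 - paternity_error)) / 2


def expected_nonbiological(p_link: float, n: int) -> float:
    """Expected number of non-biological ancestors among the 2 ** n in generation n."""
    return (1 - p_link ** n) * 2 ** n


for rate in (0.004, 0.01, 0.059):  # low, typical and high rates quoted in the text
    p = average_link_accuracy(rate)
    counts = [round(expected_nonbiological(p, n), 2) for n in (5, 10)]
    print(f"paternity error {rate:.1%}: p = {p:.4f}, expected at gen 5 and 10: {counts}")
```

With the typical 1% rate, generation 5 comes out at roughly 0.8 and generation 6 at roughly 1.9, so the one-error mark is crossed between the 3rd and 4th great-grandparents.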
Conclusion
So when talking about ancestry based on WikiTree, I tend to be cautious in how I discuss it, and honestly I should be more cautious than I am. I might say, "my family tree has a line that goes back to _____," rather than the more definitive "I am descended from ______." It's very reasonable to say, "Hey, I have some evidence," rather than making a straight-up claim of being someone's descendant, especially when getting to around 10 generations (or more) back.
Especially when we go that far back, we need to treat the whole ensemble of genealogical information as speculative, even if we are extremely confident about each individual relationship in the chain. Even with high levels of accuracy, family trees grow exponentially in size, so there will be many errors. And the further back we go, the more room there is for error.
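To put a rough number on "many errors," here is a Python sketch that sums the per-generation formula over every generation up to 10. It assumes independent links, no pedigree collapse, and a completely filled tree (which real trees rarely are); `expected_errors_in_tree` is a hypothetical helper name:

```python
def expected_errors_in_tree(p_link: float, max_gen: int) -> tuple[int, float]:
    """Return (total ancestor slots, expected incorrect ancestors) for
    generations 1..max_gen of a full pedigree with independent links."""
    slots = 0
    expected = 0.0
    for n in range(1, max_gen + 1):
        slots += 2 ** n                        # ancestors in generation n
        expected += (1 - p_link ** n) * 2 ** n  # expected incorrect ones there
    return slots, expected


for p in (0.95, 0.995, 0.9999):
    slots, errors = expected_errors_in_tree(p, 10)
    print(f"p = {p}: {errors:.1f} expected incorrect out of {slots} ancestors")
```

Even at 99.5% accuracy per link, roughly 90 of the 2,046 ancestor slots in a complete 10-generation tree are expected to be wrong, most of them in the last two generations.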