Yeah, without a doubt there are limitations with this analysis. That said, I do believe it provides strong evidence in an area that hasn't really been studied -- at least not that I've seen.
> assuming that all pages are perfectly optimized and equally linked
That's the biggest limitation of the analysis conducted.
1. We used the semantic similarity between the page title and the target query to infer whether the page was optimized.
2. We also assumed that every blog post from a given company is equal (consistent) in terms of internal linking and post quality. Not perfect, but it's evidence to support the claim, and from what I've seen it's not too far off from reality.
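To make assumption 1 concrete, here's a minimal sketch of what "semantic similarity of the page title" could look like in code. The embedding vectors and the 0.8 threshold are made up for illustration -- in practice the vectors would come from a sentence-embedding model, and the actual model and cutoff used in the analysis aren't stated here.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for a page title and a target query --
# in a real pipeline these would come from an embedding model.
title_vec = [0.12, 0.85, 0.31, 0.44]
query_vec = [0.10, 0.80, 0.35, 0.40]

score = cosine_similarity(title_vec, query_vec)
is_optimized = score >= 0.8  # assumed threshold, not from the analysis
```

The point is just that "optimized" gets reduced to a single continuous score, which is what makes assumption 1 both tractable and imperfect.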
The lack of any coherent pattern tells me that either those assumptions are absolutely terrible (which you could argue) or, more likely, KW Difficulty scores are inherently bad metrics. I say more likely because if you look at the signals that go into KW Difficulty scores, they don't really pass any reasonable sniff test in terms of what you'd expect from a predictive, forward-looking metric.
> Lots of other factors IMO, really hard concept to prove/disprove with just 2 axes.
Doesn't take away from your point, but it's actually 3 axes -- we're evaluating the page title's relevance to the query as the 3rd axis. I thought an X/Y graph with a color axis was an easier visualization of the data than a 3-axis chart (see image below!)