Algorithm 2.0 is now live!

This is the biggest change to the algo yet!

Nov 19, 2020

Today we launched a new version of our algorithm. It is the biggest change to the algorithm yet. Many scores and ranks have changed. We believe that the new scores better reflect reality.

You can see whose ranks changed in this document or just visit hive.one.

We also removed two clusters from the website (Crypto and Ripple). More on this later.

How does the algorithm work

We calculate influence scores by looking at the Twitter following graph. Every time a member of a given cluster follows you, it adds to your score. The more influential your new follower is, and the fewer other accounts they follow, the bigger the boost to your score.

It’s the same core idea that Google used to rank websites. You can read more here.
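To make the idea concrete, here is a minimal sketch of a PageRank-style calculation over a follow graph. It is not our production code: the function name, the damping factor of 0.85 and the fixed number of iterations are assumptions picked for illustration. Each account splits its score evenly among the accounts it follows, so a follow from an influential account that follows few others is worth more.

```python
def influence_scores(follows, damping=0.85, iterations=50):
    """PageRank-style scores for a follow graph (illustrative sketch).

    follows maps each account to the set of accounts it follows.
    damping and iterations are assumed values, not tuned ones.
    """
    # Collect every account that appears as a follower or as followed.
    accounts = set(follows)
    for followed in follows.values():
        accounts.update(followed)
    n = len(accounts)

    # Start with the same score for everyone.
    scores = {account: 1.0 / n for account in accounts}

    for _ in range(iterations):
        new_scores = {account: (1.0 - damping) / n for account in accounts}
        for follower in accounts:
            followed = follows.get(follower, ())
            if followed:
                # A follower splits its score among the accounts it follows:
                # the fewer it follows, the bigger each share.
                share = damping * scores[follower] / len(followed)
                for target in followed:
                    new_scores[target] += share
            else:
                # An account that follows nobody spreads its score evenly,
                # so the total always stays at 1.
                share = damping * scores[follower] / n
                for account in accounts:
                    new_scores[account] += share
        scores = new_scores
    return scores


# Hypothetical toy cluster: carol is followed by alice and bob.
toy_follows = {
    "alice": {"carol"},
    "bob": {"carol", "dave"},
    "carol": {"alice"},
    "dave": set(),
}
toy_scores = influence_scores(toy_follows)
```

In this toy graph carol ends up with the highest score, and dave, who only gets half of bob's attention, scores lower than carol.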

Why did we change the algorithm

The previous version had many shortcomings:

It couldn’t pick up “cluster splits”, for example when BCH broke off from BTC and formed a new cluster. We had to correct these by hand.
It took a long time to detect that an account had been “kicked out” of a cluster or that a new account had joined. For example, it took weeks to add Michael Saylor.
It required manual tuning. For example, we had to design a "Celebrity Filter". Otherwise @elonmusk, @realdonaldtrump and @barackobama would show up in almost every cluster. This filter removed accounts that had a huge number of followers but a low influence score. It relied on an arbitrarily picked value and sometimes made mistakes.
Mapping a cluster was not automated. The process also required minor human choices.
The accuracy deteriorated over time. Because the mapping was a semi-manual process, the algorithm could not self-correct as the underlying structure of the cluster changed. We either had to “re-map” a cluster on a regular basis or accept the loss of accuracy.
It could not scale. The previous version worked well as a prototype and enabled us to prove the hypothesis, but it could only map specific clusters that we already knew existed.
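As an illustration of why the old "Celebrity Filter" was fragile, here is a rough sketch of that kind of rule. The function name, the field names and both cut-off values are hypothetical and chosen only for this example. The point is that any fixed cut-off is arbitrary, and accounts close to it can be misclassified.

```python
def celebrity_filter(accounts, min_followers=1_000_000, min_score=0.01):
    """Drop accounts with a huge follower count but a low influence score.

    Both thresholds are hypothetical, hand-picked values.
    """
    return [
        account
        for account in accounts
        if not (
            account["followers"] >= min_followers
            and account["score"] < min_score
        )
    ]


# Hypothetical sample data.
sample_accounts = [
    {"handle": "global_celebrity", "followers": 50_000_000, "score": 0.002},
    {"handle": "cluster_insider", "followers": 40_000, "score": 0.030},
    {"handle": "famous_insider", "followers": 2_000_000, "score": 0.050},
]
kept = [account["handle"] for account in celebrity_filter(sample_accounts)]
```

Here only global_celebrity is removed. An insider with many followers whose score falls just under min_score would be removed as well, which is the kind of mistake a hand-picked value produces.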

How is the new version different

It can map clusters without human oversight and pick up splits and mergers of clusters. We can now point the algorithm at a small group of accounts, and from there it will find the cluster they belong to.
It is much faster at identifying changes in a cluster. For example, it added Michael Saylor right after his first podcast appearance.
We designed it in a more principled way. We managed to replace the “Celebrity Filter” with a tuned thresholding function. The model still requires a small number of external parameters, but we are confident that we will be able to further reduce the need for manual tuning in future releases.
Self-correcting mechanisms. Because we were able to automate the mapping process, the algorithm can now identify changes in the underlying structure of a cluster and adjust automatically. This means that the scores should maintain a stable level of accuracy over time.
Scalability. The algorithm can now scale “up and down”. We can map the sub-clusters within each cluster as well as the super-cluster it belongs to. This means that, given access and resources, we could index the whole of Twitter.
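To give a feel for what "pointing the algorithm at a small group of accounts" could look like, here is a deliberately simple sketch of seed expansion. This is not how our algorithm works internally. The function name, the min_share rule and the round limit are assumptions made up for the example. Starting from the seeds, it repeatedly adds any account that is followed by at least a given share of the current members.

```python
def expand_cluster(follows, seeds, min_share=0.5, max_rounds=10):
    """Grow a cluster from seed accounts (illustrative sketch only).

    follows maps each account to the set of accounts it follows.
    An outside account joins when at least min_share of the current
    members follow it. min_share and max_rounds are assumed values.
    """
    members = set(seeds)
    for _ in range(max_rounds):
        # Count how many current members follow each outside account.
        follow_counts = {}
        for member in members:
            for target in follows.get(member, ()):
                if target not in members:
                    follow_counts[target] = follow_counts.get(target, 0) + 1

        needed = min_share * len(members)
        added = {a for a, count in follow_counts.items() if count >= needed}
        if not added:
            break  # Nothing new qualifies, so the cluster is stable.
        members |= added
    return members


# Hypothetical toy graph: a and b are the seeds.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "d"},
    "x": {"y"},
}
cluster = expand_cluster(toy_graph, seeds={"a", "b"})
```

In the toy graph, c joins because both seeds follow it, while d stays out because only one of the three members follows it. The unrelated accounts x and y are never reached.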

All these updates of course affect the scores our algorithm produces. The pivotal change is the underlying definition of clusters. The previous version of the algorithm relied on a semi-manual selection of accounts, while the updated algorithm maps out clusters automatically. As a consequence, the clusters and the influence scores derived from them have changed, in some cases by a lot. We believe that the new process is more principled and more accurate. The landscape of Crypto Twitter has changed since we last mapped it, so the previous selection of accounts must have been somewhat outdated. The current definition of clusters paints a more accurate picture.

What’s coming next

This is by no means the end of the improvements. This new version lays a foundation for future releases. In the coming months we are planning to:

Further reduce the need for arbitrary choices in the algorithm.
Open up the algorithm to our users. You will be able to choose which clusters you want mapped. Until then, we decided to remove the Ripple cluster, since it wasn't used much.
Enable mixing clusters together. For example, the Crypto cluster was such a mix (we mashed together Bitcoin, Ethereum, Bitcoin Cash and Ripple). We removed it for now, but we will bring it back once we add this functionality. You will also be able to create your own mashups of clusters.
Go beyond the “following graph”. We will add more dimensions to our algorithm. This means that we will calculate your score based not only on who follows you.

And much more. If you want to hear about the next updates, follow us on Twitter and Substack.
