Numerai and Data DAOs

2023-06-06

web3 has pushed forward primitives for a more decentralized internet, especially when it comes to how we work with and trust data.

But the traditional monetization strategies for data-centric consumer services tend to be at odds with the ideals of decentralization, interoperability, and transparency that web3 holds dear. Data that is neither gated nor censored (decentralized) and is transferable across ecosystems (interoperable) doesn’t enable the aggressive platform lock-in and highly profitable ad targeting that big tech consumer services have profited from for more than a decade.

Several web2 social companies (most notably Twitter) started with open data ecosystems that, if you squint, weren’t far off from the decentralized social protocols pitched today, but eventually fell back to restricting access to that data to maximize near-term revenue. Is this cycle of new consumer products turning into monolithic data silos with problematic incentives going to repeat itself indefinitely? web3’s new data primitives may unlock an ecosystem that breaks this pattern.

## Data DAOs

Eric Schmidt once called the internet “the largest experiment in anarchy that we have ever had.” The truth in that statement is what let Google build an empire on organizing the web through search, and it also helps explain why DAOs are so powerful as a coordination tool.

A growing number of DAOs are iterating toward better methods and tools for online coordination, but DAOs building data assets (prediction models, data lakes, and everything in between) are showing perhaps the most progress of all and are poised to accomplish goals that would be nearly impossible without the scaled coordination DAOs enable. These organizations, which I believe warrant their own moniker of ‘data DAOs’, are already playing a role in growing the ecosystem of sustainable, permissionless data stores.

I broadly define data DAOs as DAOs that create and maintain a unique repository of data assets for the direct benefit of the DAO. Those assets can take many forms, but tend to be data, machine learning techniques, and tools that enable novel utility on top of the DAO’s data. These ecosystems incentivize contributions of data assets (e.g., rewards for downstream consumption, grants for building tools the DAO’s members need) and use permissionless payout mechanics to scale past the point where open source communities typically struggle. That scale lets them build ecosystems that are greater than the sum of their parts.

## Numerai and the Value of Data

Numerai is one example of a group contributing to the growing ecosystem of data DAOs despite (or perhaps because of?) the regulatory hurdles of financial services. It neither functions as a full-fledged DAO nor is truly open, but it nonetheless shows the potential for new business models built around data with decentralized primitives.

The group is focused on creating a decentralized hedge fund by incentivizing a community of data scientists to contribute financial datasets, create prediction models, add curation signals, and build tools (ranging from backtesting to cheaply deploying one’s model in the cloud). Permissionless payout functions reward members in proportion to the value their strategy adds to Numerai’s portfolio. As the community has grown, Numerai has combined incentivized external data contributions with its own tooling to build a gigantic financial dataset with 1,586 features. That dataset is obfuscated (securities and features are given arbitrary names, to start), which lets Numerai give this licensed data away for free while ensuring that only Numerai can decode and make use of the models trained on it.
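The renaming step can be sketched in Python to show why obfuscation keeps the data useful to modelers but the results useful only to the data owner. This is a minimal illustration under stated assumptions, not Numerai’s actual pipeline: the function names `obfuscate` and `decode_predictions`, the alias format, and the example column and security names are all hypothetical, and only the renaming the text mentions is covered.

```python
import secrets

# Hypothetical sketch: not Numerai's real obfuscation scheme, only the renaming idea.


def obfuscate(names: list[str], prefix: str) -> tuple[list[str], dict[str, str]]:
    """Give each real name a random alias.

    Returns the aliases (safe to publish) and a private alias -> real-name
    mapping that only the data owner keeps.
    """
    private_map: dict[str, str] = {}
    aliases: list[str] = []
    for real in names:
        alias = f"{prefix}_{secrets.token_hex(4)}"
        while alias in private_map:  # regenerate on the rare collision
            alias = f"{prefix}_{secrets.token_hex(4)}"
        private_map[alias] = real
        aliases.append(alias)
    return aliases, private_map


def decode_predictions(preds: dict[str, float], id_map: dict[str, str]) -> dict[str, float]:
    """Map predictions keyed by public security aliases back to real identifiers.

    Only whoever holds id_map can do this, so submitted predictions are
    actionable for the data owner alone.
    """
    return {id_map[alias]: score for alias, score in preds.items()}


# Usage with made-up column and security names
features, feature_map = obfuscate(["pe_ratio", "momentum_20d"], "feature")
securities, security_map = obfuscate(["TICKER_A", "TICKER_B"], "id")
submitted = {securities[0]: 0.7, securities[1]: 0.2}  # what a modeler would upload
print(decode_predictions(submitted, security_map))
# -> {'TICKER_A': 0.7, 'TICKER_B': 0.2}
```

Renaming alone hides what each column and security is, but not the values themselves, which is consistent with the text describing it as only where the obfuscation starts.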

To understand why Numerai’s aggregation play is interesting, it helps to consider why data has ‘value’ at all. The value of data (as proxied by a consumer’s willingness to pay) can often be broken down into just three features: utility, correctness, and completeness.

  • data’s value can change based on utility - consider a complete and correct dataset of winning lottery ticket numbers for the next two weeks vs one for the past two weeks
  • data’s value can change based on correctness - an accurate dataset of consumer credit card purchases can power a forecasting model, while an inaccurate dataset will create a less accurate model
  • data’s value can change based on completeness - complete datasets enable uses that incomplete ones don’t, which is how Crunchbase became the de facto source for many business data questions

Numerai has added utility by providing tools to extract more insights, improved completeness by aggregating its community’s contributions, and improved correctness with a curation market on financial data streams. Data scientists working with financial data want the most complete, correct, and useful data and tools to maximize their edge, and thus their potential returns, and Numerai is establishing itself as the de facto platform for them.

This flywheel has enabled Numerai to truly scale the wisdom of the crowd of ‘citizen financial data scientists’. With 5,494 staked models at the time of writing, the fund is starting to massively outperform traditional hedge funds.

## Data DAOs and Moats

Will a different group (like a traditional hedge fund) offer data scientists the opportunity to contribute strategies in a closed but more valuable data ecosystem? And will truly open DAOs like SingularityDAO steal Numerai’s members and foster an even bigger community?

The success of every group in this space of ‘decentralized investment funds’ has been predicated on maximizing the number of active users building successful strategies. Every player has two primary levers:

  • how users are rewarded for contributions (i.e., successful strategies);
  • the data ecosystem users have for building successful strategies.

If the data ecosystem isn’t good enough, rewards for contributions won’t be significant enough to retain users. If rewards aren’t significant enough, users are unlikely to contribute their data assets and keep growing the ecosystem. Because each lever depends on the other, bootstrapping the data ecosystem and parameterizing the reward function appropriately are critical.

On the first lever, Numerai’s permissionless and transparent payout mechanics instill a confidence in contributors that centralized payout practices generally can’t. It helps that Numerai doesn’t own, or even have access to, the code of a user’s model; only the model’s predictions are submitted. The fair (admittedly a loaded word) distribution of industry-leading fund returns makes Numerai a compelling place to spend time developing such models, and forces more open players like SingularityDAO to make a serious case for how they can reach similar or better returns.
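To make the payout idea concrete, here is a minimal Python sketch of a stake-weighted, capped payout computed from nothing but a contributor’s stake and how their submitted predictions scored. Everything in it is an assumption for illustration: the `Submission` and `payout` names, the single per-round score, and the `multiplier` and `cap` defaults are hypothetical and are not Numerai’s published formula.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One contributor's staked predictions for a single round (hypothetical structure)."""

    user: str
    stake: float  # tokens the contributor has put at risk
    score: float  # performance of the submitted predictions, e.g. correlation with realized returns


def payout(sub: Submission, multiplier: float = 1.0, cap: float = 0.25) -> float:
    """Return the token change for one round (hypothetical formula, not Numerai's).

    Positive scores earn tokens in proportion to stake; negative scores burn
    part of the stake. The rate is clipped to [-cap, cap] so a single round
    can never move a stake by more than that fraction.
    """
    rate = max(-cap, min(cap, multiplier * sub.score))
    return sub.stake * rate


round_results = [
    Submission("alice", stake=100.0, score=0.03),
    Submission("bob", stake=50.0, score=-0.02),
    Submission("carol", stake=200.0, score=0.90),  # large score gets clipped to +cap
]
for sub in round_results:
    print(sub.user, round(payout(sub), 2))
```

Tying the payout to stake means contributors only earn by putting capital behind their own predictions, and the cap bounds how much any single round can reward or penalize them. Because the function depends only on the stake and the score, a contributor can verify their own payout without the operator ever seeing their model’s code.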

On the second lever, Numerai’s data ecosystem is purportedly larger and better than any other dataset accessible to someone not employed at a hedge fund. It also carries social proof: other contributors’ models have seen strong returns and large payouts for competitive strategies. Giving users interoperability for their models (by not obfuscating the dataset they train on) could be a key point of differentiation for a more open player like SingularityDAO, but this requires using open data (which should inherently offer limited alpha) and facing a litany of legal issues for whatever licensed data is included.

Numerai provides not only the rewards and data ecosystem its contributors need to thrive and stay, but also incentives for those contributors to publish data assets that improve the ecosystem and propel the community growth flywheel. It’s Metcalfe’s Law applied to a graph of data scientists building a data ecosystem, rather than to the usual social graph.
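As a rough illustration of the analogy, Metcalfe’s Law is usually stated as a network’s value growing with the square of its number of participants, because $n$ participants can form $\binom{n}{2}$ pairwise connections:

$$
C(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad V(n) \propto C(n) \sim \frac{n^2}{2}
$$

For example, growing from 100 to 200 contributors raises the number of possible connections from 4,950 to 19,900, roughly four times as many from twice the people. Treating every pair of data scientists as a potential source of value is an assumption of the analogy, not something measured on Numerai.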

Numerai’s model is not free of flaws. Users have raised complaints about the currency risk of NMR, the token used for staking and payouts. The platform lock-in strategy of obfuscating the data users train models on still feels like the same dirty trick big tech has played many times, even if it is warranted by legal constraints and by the hit to fund returns that opening the data would cause. But web2 consumer data companies leveraged bundles and walled gardens to build attractive unit economics, reduce churn, and increase platform lock-in, in a model constantly at odds with the best interests of users. Groups like Numerai, by contrast, have found an incentive alignment with their users and contributors that lets them pursue an exceptional common goal and share the value fairly. Other opportunities will follow, and it remains to be seen just how far these data layers and goals can be stretched toward openness and interoperability.