Origin
Datamule grew out of an open-source document parsing project that John Friedman started during a medical leave of absence from his PhD at UCLA. The document parser required a large supply of diverse data, and the SEC corpus was chosen to provide it. Datamule was then released as an open-source package for working with SEC data.
The SEC's rate limits made downloads too slow, so John set up his own SEC Archive. The archive was released to the public, with a Stripe paywall of $1 per 100,000 filings downloaded acting as a resource control mechanism.
In April 2025, Datamule was incorporated as an LLC. That summer, Datamule became part of AWS Activate and Cloudflare for Startups. The credits and compute were used to set up cloud infrastructure to process and distribute SEC data at scale.
Mission
At its core, Datamule is about efficient processing of information. This is an interdisciplinary mission that blends concepts from Information Theory, Machine Learning, Compression, and Distributed Computing.
The SEC corpus is a large, rich, and diverse body of data to work on, and it is also extremely valuable.
Goals
- Make SEC data cheap and easy to use, especially for AI.
- Advance information processing.
- Profit.
Team