Datamule

Making it cheap and easy to clean and enrich data at scale.

We write custom algorithms and run classical machine learning on surplus GPUs to complete tasks typically done with LLMs, at a fraction of the cost. LLMs don't scale well over billions of pages. We do.

Much of our code is open source. Click here to learn more about our team.

September Update

I'm now working on Datamule Cloud V2.

Last Month

  • Onboarded startups, researchers and individuals into the open-source ecosystem that I've created.
  • Set up Structured Output, an open-source repository for creating alternatives to expensive, strictly licensed datasets using LLMs' structured outputs.
  • Figured out how to cheaply scale certain NLP offerings, using multi-stage pipelines running classical ML on surplus GPUs.
  • Met with companies to discuss enterprise contracts.
  • Started writing a lot more articles.
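
As a rough illustration of the multi-stage pipeline idea mentioned above, here is a minimal sketch. It is not Datamule's actual pipeline: the function names, the keyword list, and the threshold are all hypothetical, and the second stage is a toy stand-in for a classical ML model. The point is only the shape: a near-free first stage discards most pages, so the costlier stage runs on a small remainder.

```python
from typing import Iterable, List, Tuple

# Hypothetical stage 1: a near-free keyword filter that discards most pages.
def cheap_filter(text: str) -> bool:
    keywords = ("revenue", "net income", "risk factors")
    lowered = text.lower()
    return any(k in lowered for k in keywords)

# Hypothetical stage 2: a toy stand-in for a classical ML model.
# It scores a page by the fraction of words that are finance terms.
def expensive_score(text: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    finance_terms = {"revenue", "income", "risk"}
    hits = sum(1 for w in words if w in finance_terms)
    return hits / len(words)

def run_pipeline(pages: Iterable[str], threshold: float = 0.1) -> List[Tuple[str, float]]:
    """Run stage 2 only on pages that survive stage 1."""
    results = []
    for page in pages:
        if not cheap_filter(page):
            continue  # dropped cheaply, stage 2 never runs
        score = expensive_score(page)
        if score >= threshold:
            results.append((page, score))
    return results

if __name__ == "__main__":
    sample = ["Revenue grew and risk fell", "Cat pictures"]
    for page, score in run_pipeline(sample):
        print(f"{score:.2f}  {page}")
```

In a real deployment the second stage would presumably be a trained model batched on a GPU; the ordering of cheap-then-expensive stages is what keeps the cost per page low.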

Note: I've priced the cloud offerings to be fairly cheap. For example, downloading every 10-K since 1995 should cost about $2. If cost is an issue for you, or you want to chat, my email is [email protected].