Cloudflare for Startups Graduation
First off, I want to thank Melissa Kargiannakis and Christopher Rotas for helping me get into the program.
When I left my PhD to putz around open source stuff, I found myself writing document parsers for SEC data. SEC rate limits were too low, so I set up my own SEC SGML Archive using Wasabi S3, cached behind Cloudflare.
Wasabi was excellent. For $20/month, I was able to store terabytes of data without worrying about egress costs, which let me share the archive with researchers, startups, etc., charging only a nominal resource fee ($1/100,000 downloads) to prevent accidental cost overruns. I ended up turning a profit, which surprised me.
In June 2025, I transitioned from Wasabi to Cloudflare, thanks to the $5k in free credits. This was good timing, because usage spiked shortly afterward, and I'm pretty sure that I would've exceeded Wasabi's free egress threshold.
I transitioned my Wasabi S3 Archive to Cloudflare using a custom setup with CF queues and workers instead of using Super Slurper. I did this partly to reuse the cached SGML files so as to avoid triggering Wasabi's egress thresholds, but also because it was fun. It cost about $300 to migrate, and about $50/month in storage costs.
I then added a SEC Tar Archive, which was the SEC SGML files (which contain files within them such as xml, pdf, txt, etc.) parsed and tarred together. This added another ~$50/month in storage costs, but let Datamule's users download just the data they needed, e.g. 8-K EX-99.1, which made downloads very fast. (Also, everything is zstd compressed.)
More recently, I redid both archives as part of my move to Datamule Cloud V3. I changed the format from:
- bucket/accession.sgml (or .tar) to bucket/YYYY-MM-DD/accession.sgml (or .tar)
I did this because it allows better parallelization of list operations. I can now list every item in the buckets in a few minutes.
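To make the parallelization point concrete, here is a minimal sketch of how a date-prefixed layout lets listing fan out, one request stream per day. The names `date_prefixes`, `list_all_keys`, and the `list_prefix` callback are hypothetical and only for illustration; this is not the actual Datamule code.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, Iterable, List


def date_prefixes(start: date, end: date) -> List[str]:
    """Return one 'YYYY-MM-DD/' prefix per day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() + "/" for i in range(days + 1)]


def list_all_keys(
    list_prefix: Callable[[str], Iterable[str]],
    start: date,
    end: date,
    max_workers: int = 32,
) -> List[str]:
    """List every key by issuing one listing call per date prefix in parallel.

    `list_prefix` is a hypothetical callback that returns all keys under
    a single prefix (handling pagination itself).
    """
    prefixes = date_prefixes(start, end)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves prefix order, so results come back sorted by date.
        per_prefix = pool.map(lambda p: list(list_prefix(p)), prefixes)
        return [key for keys in per_prefix for key in keys]
```

With a flat `bucket/accession.sgml` layout, a listing has to walk one long paginated sequence, since each page depends on the previous page's continuation token. With date prefixes, each day is an independent listing. In practice `list_prefix` could wrap an S3-compatible paginated call such as boto3's `list_objects_v2` paginator with `Prefix=...`, assuming the bucket is reached through an S3-compatible endpoint.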
My CF credits were expiring, so I also decided to compress the objects more. This reduced the size of the archives by about 10%. This meant:
- My new costs were ~$100/month
- Users could download some files a bit faster, and other files much faster.
While I update the archives in real time using an AWS Lambda worker, for the backfill, I used Cloudflare. It cost ~$500, and tbh it was just awesome spinning up 500 containers to process the data. I had a lot of fun with that.
After all this, I still had about $2.5k credits left, and one week. Cloudflare is efficient.
I have some other responsibilities, so I was not able to spend the remainder. My last attempt was to open source some benchmarks for LLM evaluation. This... was rushed and isn't so good, but more importantly, it also cost little. Cloudflare Workers AI is pretty cheap. See Benchmarks.
Some Criticisms
- Cloudflare R2 latency is not great. My AWS Lambda functions spend much of their time waiting for CF to respond. The CEO of Tigris reached out, and I may try them out in the future. I like the options they provide for object storage.
- I accidentally set the content type to zstd instead of octet-stream. This caused CF to decompress my data before sending it to the user. This was annoying. I couldn't find a way around it, so I spent ~$300 updating the metadata to replace the content type. Could be a skill issue on my part.
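For anyone hitting the same metadata problem from the last bullet: a rough sketch of one way to rewrite an object's content type through an S3-compatible API is to copy the object onto itself with replaced metadata. This assumes a boto3-style S3 client pointed at the bucket's S3-compatible endpoint; `fix_content_type` is a hypothetical helper name, and this is not necessarily how I did it.

```python
def fix_content_type(
    client,
    bucket: str,
    key: str,
    content_type: str = "application/octet-stream",
) -> None:
    """Rewrite one object's Content-Type by copying the object onto itself.

    `client` is assumed to be a boto3-style S3 client. With
    MetadataDirective="REPLACE" the copy takes its metadata from this
    request instead of from the source object.
    """
    client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        ContentType=content_type,
        MetadataDirective="REPLACE",
    )
```

Note that `REPLACE` drops any existing user metadata that isn't passed again in the same call, and each object costs one copy operation, which adds up over millions of objects.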
Thank You
Thank you very much to Cloudflare, and again to Melissa Kargiannakis and Christopher Rotas.