Open Source Projects

datamule
Work with SEC data at scale
doc2dict
Convert documents (HTML, XML, PDF, etc.) into dictionaries
datamule-data
Up-to-date data files for datamule, refreshed using GitHub Actions
datamule-indicators
Automatically updating indicators generated from SEC data
txt2dataset
Convert text into datasets
secsgml
Parse SEC SGML efficiently
secxbrl
Fast, lightweight parser for SEC Inline XBRL
company-fundamentals
Standardize SEC XBRL into fundamentals data, such as EBITDA
SecBrowser
A simple interface to interact with SEC filings

Articles & Research

Datamule: Scaling data processing by starting with the SEC corpus.
Datamule distributed more data than the SEC did this year.
Capturing Human Readable File Conversion Latent Space
Turning documents into data at scale
Reverse Engineering how the SEC's EDGAR Works
Datamule Monetization Policy
Constructing subsidiaries data from SEC EX-21 HTML filing tables using a simple algorithm + minimal generative AI
2025 Year in Filings
EX-10 Material Contracts extracted from every SEC filing from 1993 to Jan 12th 2026
Creating datasets from HTML tables using algorithms instead of Generative AI
EDGAR Broken Link
Extracting Iran Disclosures from SEC filings and vectorizing them for semantic search
Download and apply sentiment analysis to every 10-K MDA in EDGAR
SEC filing wordcounts (1993-2000)
Making a Full Text Search for the SEC Corpus
500 filings downloaded per second using the new Tar Archive.
Programmatically downloading SEC attachments in bulk
[Bug] 40,284 filings labeled by the SEC as containing XBRL when they don't.
Actually deploying enterprise grade pipelines in 5 months and three weeks.
Possible Way to Eliminate R2 Class A Operations Cost
Learning how to deploy enterprise grade pipelines in 5 months.
Datamule Cloud V2
Extracting HTML tables from SEC filings
Prospective NLP API
SecBrowser
NLP on a Budget
How to (probably) get SEC filings seconds to minutes before everyone else
Improving Data Ingest for AI
Datamule Cloud - 7/12/25
Proposed System Architecture for Datamule
High Speed Algorithmic Document Parsing
Putting Institutional Holdings in a Data Warehouse
How to host the SEC Archive for $20/month
Creating Structured Datasets from SEC filings
Deploy a Financial Chatbot in 5 Minutes