Projects
Open Source Organizations
Structured Output - Making shareable datasets using LLM structured outputs. Many financial datasets are:
- Expensive (a $35k price tag is common)
- Hard to share (a professor can share with PhD students, but not with master's students)
- Not commercializable (the license restricts resale of the data or its derivatives)
It would be nice to change this.
Open Source Projects
- datamule - Work with SEC data at scale
- doc2dict - Convert documents (HTML, XML, PDF, etc.) into dictionaries
- datamule-data - Up-to-date data files for datamule, using GitHub Actions
- datamule-indicators - Automatically updating indicators generated from SEC data
- txt2dataset - Convert text into datasets
- secsgml - Parse SEC SGML efficiently
- secxbrl - Fast, lightweight parser designed for SEC Inline XBRL
- company-fundamentals - Standardize SEC XBRL into fundamentals data, such as EBITDA
Papers & Articles
- Managerial Differentiation - Forthcoming
- NLP on a Budget
- How to (probably) get SEC filings seconds to minutes before everyone else
- Improving Data Ingest for AI
- Datamule Cloud - 7/12/25
- Proposed System Architecture for Datamule
- High Speed Algorithmic Document Parsing
- Putting Institutional Holdings in a Data Warehouse
- How to host the SEC Archive for $20/month
- Creating Structured Datasets from SEC filings
- Deploy a Financial Chatbot in 5 Minutes