Projects

Open Source Organizations

Structured Output - Building shareable datasets using LLM structured outputs. Many financial datasets are:

  1. Expensive (a $35k price tag is common)
  2. Hard to share (a professor can share with PhD students, but not master's students)
  3. Not commercializable (the license restricts resale of the data or its derivatives)

It would be nice to change this.

Open Source Projects

  • datamule - Work with SEC data at scale
  • doc2dict - Convert documents (HTML, XML, PDF, etc.) into dictionaries
  • datamule-data - Up-to-date data files for datamule, refreshed using GitHub Actions
  • datamule-indicators - Automatically updating indicators generated from SEC data
  • txt2dataset - Convert text into datasets
  • secsgml - Parse SEC SGML efficiently
  • secxbrl - Fast, lightweight parser designed for SEC Inline XBRL
  • company-fundamentals - Standardize SEC XBRL into fundamentals data, such as EBITDA