Roadmap

- SEC Filings Instant Analysis
  LLM summaries of SEC filings within ~1s of publication. Supports bring-your-own prompt and model.

- Datamule LLM
  Plan to train an LLM.

- SEC Vector Embeddings
  Vector embeddings of every file in the SEC Corpus, stored in S3.

- SEC Filings Text Search Database
  Search text within the entire 16 TB SEC Corpus.

- SEC Filings Inverted Index
  Inverted index of all text in the SEC Corpus, stored in S3. Each entry includes the submission, the document, and the position within the document.

- SEC Full Names
  Full person names, like Peter Jackson, extracted from every SEC filing.

- SEC Graphics Annotations
  Adding tags, descriptions, and classifications to all SEC GRAPHIC documents.

- Filing Items
  Items extracted from SEC filings, stored in columnar format. For example, 10-K Items 1A, 1B, ...

- SEC Business Development Company Investments S3
  Business Development Company Investments tables extracted from 10-K and 10-Q filings.

- SEC Business Classifications
  Open ended. Considering a vectorized system or tagging system.

- SEC Filings Text
  SEC .html, .txt, and most .pdf filings converted to the data tuples format. Data tuples is the native format of doc2dict, an extremely fast document parser. Data tuples can quickly be cast into plain text, markdown, and nested dictionaries.

- SEC Filings Classified Text
  Open ended. SEC .html, .txt, and most .pdf filings converted to the data tuples format, then classified into categories useful for filtering the context passed to LLMs.

- SEC Company Metadata
  Company websites, addresses, and telephone numbers extracted from filings.

- SEC Filings Sentiment
  Open ended. The Loughran-McDonald sentiment dictionary is one approach.

- SEC Filings Complexity
  Open ended.

- SEC Filings Stock Splits
  Stock split details, including the reason for the split, what happens to existing shares, and total shares outstanding.

- Audit Fees
  Audit fees extracted from DEF 14A filings.

- Subsidiaries
  Subsidiaries extracted from EX-21 documents.

- Untagged XBRL
  Untagged XBRL
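
As an illustration of the SEC Filings Inverted Index item, here is a minimal sketch of what a posting with submission, document, and position could look like. The function name `build_inverted_index`, the sample accession numbers, the file names, and the simple alphanumeric tokenizer are all assumptions made for this example; the actual index layout and its storage in S3 are not specified here.

```python
import re
from collections import defaultdict


def build_inverted_index(corpus):
    """Map each token to a list of (submission, document, position) postings.

    `corpus` maps (submission_id, document_name) -> raw text.
    Position is the zero-based token offset within the document.
    """
    index = defaultdict(list)
    for (submission, document), text in corpus.items():
        # Hypothetical tokenizer: lowercase runs of letters and digits.
        tokens = re.finditer(r"[a-z0-9]+", text.lower())
        for position, match in enumerate(tokens):
            index[match.group()].append((submission, document, position))
    return dict(index)


# Made-up sample data, for illustration only.
corpus = {
    ("0000000000-24-000001", "main.htm"): "Risk factors include market risk",
    ("0000000000-24-000002", "ex21.htm"): "Subsidiaries of the registrant",
}

index = build_inverted_index(corpus)
for posting in index["risk"]:
    print(posting)
```

Looking up a token returns every place it occurs, so a phrase search can be built by intersecting postings whose positions are adjacent within the same submission and document.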