Roadmap
- SEC Filings Instant Analysis: LLM summaries of SEC filings within ~1s of a filing being published. Supports bring-your-own prompt and model.
- Datamule LLM: Planning to train an LLM.
- SEC Vector Embeddings: Vector embeddings of every file in the SEC Corpus, stored in S3.
- SEC Filings Text Search Database: Search text within the entire 16 TB SEC Corpus.
- SEC Filings Inverted Index: Inverted index of all text within the SEC Corpus, stored in S3. Each entry includes the submission, document, and position within the document.
- SEC Full Names: Full person names, such as Peter Jackson, extracted from every SEC filing.
- SEC Graphics Annotations: Tags, descriptions, and classifications added to all SEC GRAPHIC documents.
- Filing Items: Items extracted from SEC filings, stored in columnar format. For example, 10-K Item 1A, 1B, ...
- SEC Business Development Company Investments S3: Business Development Company investments tables extracted from 10-K and 10-Q filings.
- SEC Business Classifications: Open ended. Considering a vectorized system or a tagging system.
- SEC Filings Text: SEC .html, .txt, and most .pdf filings converted to data tuples format. Data tuples is the native format of doc2dict, an extremely fast document parser. Data tuples can quickly be cast into plain text, markdown, and nested dictionaries.
- SEC Filings Classified Text: Open ended. SEC .html, .txt, and most .pdf filings converted to data tuples format, then classified into categories useful for filtering context passed to LLMs.
- SEC Company Metadata: Company websites, addresses, and telephone numbers extracted from filings.
- SEC Filings Sentiment: Open ended. Loughran-McDonald is one approach.
- SEC Filings Complexity: Open ended.
- SEC Filings Stock Splits: Stock split details, including the reason for the split, what happens to existing shares, and total shares outstanding.
- Audit Fees: Audit fees extracted from DEF 14A filings.
- Subsidiaries: Subsidiaries extracted from EX-21 documents.
- Untagged XBRL: Untagged XBRL.
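
To make the inverted index item concrete, here is a minimal in-memory sketch of an index whose postings record the submission, document, and position within the document. It is an illustration only, not the planned implementation: the tokenizer, the `(submission_id, document_id, position)` tuple layout, the function names, and the placeholder IDs are all assumptions, and nothing here covers how the real index would be laid out in S3.

```python
import re
from collections import defaultdict

# Hypothetical posting layout: (submission_id, document_id, token_position).
Posting = tuple[str, str, int]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[tuple[str, str], str]) -> dict[str, list[Posting]]:
    """Map each token to every place it occurs across the given documents."""
    index: dict[str, list[Posting]] = defaultdict(list)
    for (submission_id, document_id), text in docs.items():
        for position, token in enumerate(tokenize(text)):
            index[token].append((submission_id, document_id, position))
    return index


def lookup(index: dict[str, list[Posting]], term: str) -> list[Posting]:
    """Return all postings for a single term, or an empty list if unseen."""
    return index.get(term.lower(), [])


# Placeholder IDs and text, made up for the example (not real filings).
docs = {
    ("sub-0001", "doc-1"): "Risk factors include interest rate risk.",
    ("sub-0002", "doc-1"): "No material risk was identified.",
}
index = build_index(docs)
print(lookup(index, "risk"))
# [('sub-0001', 'doc-1', 0), ('sub-0001', 'doc-1', 5), ('sub-0002', 'doc-1', 2)]
```

Positions here are token offsets; character offsets would work equally well if the goal is to jump straight to a span of the original text.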