Commission Products

We want to build a lot of products, but we don't know what's useful. If you know what you want and can commit to a year upfront, we'll add it to our catalog.

Commissions

SEC Vector Embeddings
Vector embeddings of every file within the SEC Corpus, stored in S3.
SEC Filings Text Search Database
Search text within the entire 16 TB SEC Corpus.
SEC Filings Inverted Index
Inverted index of all text within the SEC Corpus, stored in S3. Each entry includes the submission, the document, and the position within the document. Ballpark size: 2 TB in Parquet format.
SEC Full Names
Full person names, like Peter Jackson, extracted from every SEC filing.
SEC Graphics Annotations
Tags, descriptions, and classifications for all SEC GRAPHIC documents.
Filing Items
Items extracted from SEC filings, stored in columnar format. For example, 10-K Item 1A, Item 1B, ...
SEC Business Development Company Investments S3
Business Development Company Investments tables extracted from 10-K and 10-Q filings.
SEC Business Classifications
Open ended. We are considering a vectorized system or a tagging system.
SEC Filings Text
SEC .html, .txt, and most .pdf filings converted to data tuples format. Data tuples is the native format of doc2dict, an extremely fast document parser. Data tuples can quickly be cast into plain text, markdown, and nested dictionaries.
SEC Filings Classified Text
Open ended. SEC .html, .txt, and most .pdf filings converted to data tuples format, then classified into categories useful for filtering the context passed to LLMs.
SEC Company Metadata
Company websites, addresses, and telephone numbers extracted from filings.
SEC Filings Sentiment
Open ended. The Loughran-McDonald dictionary is one approach.
SEC Filings Complexity
Open ended.
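
To make the SEC Filings Inverted Index entry more concrete, here is a minimal sketch of a positional inverted index that maps each term to the submission, document, and position where it occurs. Everything in it is an assumption for illustration: the function names, the sample identifiers, the tokenizer, and the choice to count position as a token offset are all hypothetical, and the actual product may define these differently.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Build a positional inverted index.

    documents: iterable of (submission_id, document_id, text) tuples.
    Returns a dict mapping each term to a list of
    (submission_id, document_id, position) postings, where position is the
    token offset within the document (an assumption made for this sketch).
    """
    index = defaultdict(list)
    for submission_id, document_id, text in documents:
        for position, token in enumerate(tokenize(text)):
            index[token].append((submission_id, document_id, position))
    return dict(index)


def lookup(index, term):
    """Return all postings for a term, or an empty list if the term is absent."""
    return index.get(term.lower(), [])


# Hypothetical sample data; the identifiers are made up for illustration.
sample_documents = [
    ("sub-001", "doc-a", "Risk factors include market risk."),
    ("sub-002", "doc-b", "Market conditions improved."),
]
sample_index = build_inverted_index(sample_documents)
```

In a columnar layout such as Parquet, each posting would naturally become one row with columns for term, submission, document, and position.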

Interested?