May Update

Trying a bunch of new stuff.

Current

  1. Generalized HTML-to-dictionary converter, capable of handling tables. Currently benchmarked at 200 pages per second, single-threaded. A rough sketch of the table-handling idea is below this list.
  2. Automatically updating database of every SEC XML filing, flattened into tables for fast retrieval.
  3. Retrieving SEC data a few seconds faster than current free or commercial methods, and metadata 30-45 seconds faster. Not sure if this is interesting - please let me know if it is.
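
Roughly the idea behind the table handling, as a minimal sketch - a hypothetical helper, not the actual converter:

    # Minimal sketch: turn the first <table> in an HTML string into row dicts.
    # Illustrative only; the real converter is more general than this.
    from bs4 import BeautifulSoup

    def table_to_dicts(html: str) -> list[dict]:
        # Parse the document and grab the first <table>, if any.
        soup = BeautifulSoup(html, "html.parser")
        table = soup.find("table")
        if table is None:
            return []
        rows = table.find_all("tr")
        if not rows:
            return []
        # Treat the first row as headers, the rest as data rows.
        headers = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])]
        return [
            dict(zip(headers, [c.get_text(strip=True) for c in row.find_all(["td", "th"])]))
            for row in rows[1:]
        ]

    print(table_to_dicts(
        "<table><tr><th>cik</th><th>name</th></tr>"
        "<tr><td>320193</td><td>Apple Inc.</td></tr></table>"
    ))
    # [{'cik': '320193', 'name': 'Apple Inc.'}]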

On the backburner

  1. Extracting insights about the economy from SEC data.
  2. Building datasets using LLM structured output - e.g. board of directors data.
  3. NoSQL databases of SEC filings converted from HTML to JSON. For example: 10-Ks, exhibits, and more!
  4. Something cool involving every image file submitted to the SEC. Not sure what - please message me if you have an idea!
  5. Updating my SEC archive with pre-parsed SGML files and HTTP range requests for a 10-100x performance boost. A small example of a range request is below this list.
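
To give a taste of the HTTP range idea, here is a sketch of pulling just the first chunk of a file instead of downloading the whole thing. The URL and byte offsets are placeholders, not paths from my actual archive:

    # Sketch: fetch only the first 64 KB of a file via an HTTP Range request.
    # URL and offsets are placeholders for illustration.
    import requests

    url = "https://example.com/archive/0000320193-24-000001.sgml"
    resp = requests.get(
        url,
        headers={"Range": "bytes=0-65535"},
        timeout=30,
    )
    print(resp.status_code)   # 206 Partial Content if the server supports ranges
    print(len(resp.content))  # bytes actually returned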

For companies

  1. Set up my cloud resources for easy ingest into your data warehouse.

Not a priority. Contact me if you need this bumped up the list.

Resources

  1. $100k in AWS credits
  2. $2k in GCP credits
  3. $5k in MongoDB credits

Thank you to everyone who helped me get these credits. I'm extremely grateful, and will spend them on cool projects.