May Update
Trying a bunch of new stuff.
Current
- Generalized HTML-to-dictionary converter, capable of handling tables. Currently benchmarked at 200 pages/second, single-threaded.
- An automatically updating database of every SEC XML file, flattened into tables for fast retrieval.
- Getting SEC data a few seconds faster than current free or commercial methods, and metadata 30-45 seconds faster. I'm not sure whether this is interesting - please let me know if it is.
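
To give a rough idea of what "HTML to dictionary, with tables" can mean, here is a minimal sketch using only Python's standard library. It is not the converter described above. The names `TableParser` and `html_tables_to_dicts` are made up for illustration, and it assumes non-nested tables with explicitly closed tags, where the first row holds the column headers.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects each <table> as a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._rows = None   # rows of the table currently being read
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None and self._rows is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None
            self._row = None
            self._cell = None


def html_tables_to_dicts(html):
    """Return {"tables": [...]}, one list of row dictionaries per table.

    Assumption for this sketch: the first row of each table is the header row.
    """
    parser = TableParser()
    parser.feed(html)
    parser.close()
    result = []
    for rows in parser.tables:
        if not rows:
            continue
        header, *body = rows
        result.append([dict(zip(header, row)) for row in body])
    return {"tables": result}
```

Real filings have merged cells, nested tables, and missing closing tags, so a production converter needs considerably more handling than this.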
On the backburner
- Extracting insights about the economy from SEC data.
- Building datasets using LLM structured output - e.g. board of directors data.
- NoSQL databases for SEC data in HTML form, converted to JSON. For example: 10-Ks, Exhibits, and more!
- Something cool involving every image file submitted to the SEC. Not sure what - please message me if you have an idea!
- Updating my SEC archive with pre-parsed SGML files and HTTP range requests for a 10-100x performance boost.
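
The HTTP range idea in the last item can be sketched as follows: if the byte offset and length of a document inside a large archive file are known ahead of time, a client can request just that slice instead of downloading the whole file. This is a minimal sketch, not the archive's actual client. The function names, URL, offsets, and user agent string are placeholders, and it assumes the server supports `Range` requests.

```python
import urllib.request


def build_range_request(url, start, length, user_agent):
    """Build a GET request for `length` bytes starting at byte offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # the end position in a Range header is inclusive
    return urllib.request.Request(
        url,
        headers={"Range": f"bytes={start}-{end}", "User-Agent": user_agent},
    )


def fetch_range(url, start, length, user_agent):
    """Fetch only the requested slice; fail if the server ignored the Range header."""
    request = build_range_request(url, start, length, user_agent)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the range was honored; 200 means the
        # server sent the whole file instead.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

For example, `fetch_range("https://example.com/archive.sgml", 1024, 512, "my-app contact@example.com")` would ask for bytes 1024 through 1535 of that hypothetical file.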
For companies
- Set up my cloud resources for easy ingest into your data warehouse.
Not a priority. Contact me if you need this bumped up the list.
Resources
- $100k AWS
- $2k GCP
- $5k MongoDB
Thank you to everyone who helped me get these credits. I'm extremely grateful, and will spend them on cool projects.