SEC Filings Archive V3 Benchmark: faster, much less bandwidth

Archive V3 is out. I've integrated it into my new package, datamule-hub, ahead of integrating it into datamule-python.

Here are the old benchmarks.

Benchmarks

Two sources of slowness:

> Note: I ran the archives multiple times to ensure they were warm before compiling the final benchmarks. I only benchmarked the tar archive, as that is the one users will interact with most.
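For anyone who wants to reproduce this kind of warm-run measurement, here is a minimal sketch. It is not the script used for the numbers in this post. `time_warm_runs` and `summarize` are hypothetical helper names, and the `time.sleep` call is only a placeholder for one of the download calls shown below.

```python
import statistics
import time
from typing import Callable, List


def time_warm_runs(task: Callable[[], object], warmups: int = 1, repeats: int = 3) -> List[float]:
    """Run `task` untimed a few times, then return wall-clock seconds for each timed run."""
    for _ in range(warmups):
        task()  # untimed: gives any caches along the way a chance to warm up
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> dict:
    """Report the median (less sensitive to one slow run than the mean) plus the range."""
    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # Placeholder workload (assumption): swap in a real download call to benchmark it.
    results = time_warm_runs(lambda: time.sleep(0.01), warmups=1, repeats=3)
    print(summarize(results))
```

Wrapping the download in a zero-argument callable keeps the harness independent of either package, so the same function can time the old and the new call.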

Download all Form 4s in January 2020.

About one-third faster, with half the bandwidth.

> Note: Will be optimized further.

# Old: datamule-python
from datamule import Portfolio

portfolio = Portfolio('tar')
portfolio.download_submissions(filing_date=('2020-01-01', '2020-01-31'), submission_type='4', provider='datamule-tar')

# New: datamule-hub (Archive V3)
from datamulehub import sec_filings_archive

sec_filings_archive.download_tar(filing_date=('2020-01-01', '2020-01-31'), submission_type="4", output_dir="tarnew", overwrite=True)

Download all 8-Ks in January 2020.

About 6x faster, with 1/10th the bandwidth.

# Old: datamule-python
from datamule import Portfolio

portfolio = Portfolio('tar')
portfolio.download_submissions(filing_date=('2020-01-01', '2020-01-31'), document_type='8-K', provider='datamule-tar')

# New: datamule-hub (Archive V3)
from datamulehub import sec_filings_archive

sec_filings_archive.download_tar(filing_date=('2020-01-01', '2020-01-31'), document_type='8-K', output_dir="tarnew", overwrite=True)

Download all filings in the first three days of January 2020.

Almost twice as fast, with 10% less bandwidth.

# Old: datamule-python
from datamule import Portfolio

portfolio = Portfolio('tar')
portfolio.download_submissions(filing_date=('2020-01-01', '2020-01-03'), provider='datamule-tar')

# New: datamule-hub (Archive V3)
from datamulehub import sec_filings_archive

sec_filings_archive.download_tar(filing_date=('2020-01-01', '2020-01-03'), output_dir="tarnew", overwrite=True)

Download every 10-K in 2025, just the root form.

Almost twice as fast, with half the bandwidth.

# Old: datamule-python
from datamule import Portfolio

portfolio = Portfolio('tar')
portfolio.download_submissions(filing_date=('2025-01-01', '2025-12-31'), submission_type='10-K', document_type='10-K', provider='datamule-tar')

# New: datamule-hub (Archive V3)
from datamulehub import sec_filings_archive

sec_filings_archive.download_tar(filing_date=('2025-01-01', '2025-12-31'), submission_type='10-K', document_type='10-K', output_dir="tarnew", overwrite=True)
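Phrases like "6x faster" and "1/10th the bandwidth" in this post are ratios between an old and a new measurement. The sketch below shows one way to derive them from raw numbers. The `Run` class and `compare` function are hypothetical names, and the figures in the example are invented for illustration, not measurements from these benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Run:
    seconds: float     # wall-clock time for the whole download
    bytes_moved: int   # bytes transferred over the network


def compare(old: Run, new: Run) -> dict:
    """Express the new run relative to the old one."""
    return {
        # 6.0 here reads as "6x faster"
        "speedup": old.seconds / new.seconds,
        # share of the old wall-clock time that was saved
        "time_saved_fraction": 1 - new.seconds / old.seconds,
        # 0.1 here reads as "1/10th the bandwidth"
        "bandwidth_ratio": new.bytes_moved / old.bytes_moved,
    }


if __name__ == "__main__":
    # Invented numbers, chosen only to make the arithmetic easy to follow.
    old_run = Run(seconds=60.0, bytes_moved=1_000_000_000)
    new_run = Run(seconds=10.0, bytes_moved=100_000_000)
    print(compare(old_run, new_run))
```

Reporting both `speedup` and `time_saved_fraction` avoids ambiguity: a run that takes one-third less time has a speedup of 1.5x, not 1.33x.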