SEC Filings Archive V3 Benchmark: faster, much less bandwidth
Archive V3 is out. I've integrated it into my new package, datamule-hub, first; integration into datamule-python comes after.

Here are the old benchmarks.
Benchmarks
Two sources of slowness:
- querying the database for what to download
- downloading it
> Note: I ran each benchmark multiple times to make sure the archives were warm before recording the final numbers. I only benchmarked the tar archive, as that is the one users will interact with the most.
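To illustrate the warm-up approach described in the note, here is a minimal sketch of a timing helper. The `benchmark` function and its `warmups` and `runs` parameters are hypothetical names for illustration, not part of datamule or datamule-hub, and this is not necessarily how the numbers in this post were collected. It only measures wall-clock time, not bandwidth.

```python
import time
from statistics import median


def benchmark(download, warmups=1, runs=3):
    """Time a zero-argument callable, discarding warm-up calls first.

    `download` is any callable that performs one full download.
    The warm-up calls are not timed; they only make sure the archive
    is warm before the measured runs start.
    """
    for _ in range(warmups):
        download()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        download()
        timings.append(time.perf_counter() - start)

    # The median is less sensitive to a single slow run than the mean.
    return median(timings)
```

Any of the download calls in this post can be wrapped in a `lambda` and passed in as `download`.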
Download all Form 4s filed in January 2020.
About 1.2x faster, with about 44% less bandwidth.
- Old: 13.42s | 202.5 MB downloaded
- New: 10.92s | 113.1 MB downloaded
> Note: This will be optimized further.
# Old (datamule-python)
from datamule import Portfolio
portfolio = Portfolio('tar')
portfolio.download_submissions(filing_date=('2020-01-01', '2020-01-31'), submission_type='4', provider='datamule-tar')

# New (datamule-hub)
from datamulehub import sec_filings_archive
sec_filings_archive.download_tar(filing_date=('2020-01-01', '2020-01-31'), submission_type="4", output_dir="tarnew", overwrite=True)
Download all 8-Ks filed in January 2020.
About 5x faster, with about 1/12th the bandwidth.
- Old: 24.48s | 337.1 MB downloaded
- New: 4.58s | 27.7 MB downloaded
# Old (datamule-python)
from datamule import Portfolio
portfolio = Portfolio('tar')
portfolio.download_submissions(filing_date=('2020-01-01', '2020-01-31'), document_type='8-K', provider='datamule-tar')

# New (datamule-hub)
from datamulehub import sec_filings_archive
sec_filings_archive.download_tar(filing_date=('2020-01-01', '2020-01-31'), document_type='8-K', output_dir="tarnew", overwrite=True)
Download all filings from the first three days of January 2020.
About 1.7x faster, with about 12% less bandwidth.
- Old: 63.31s | 910.9 MB downloaded
- New: 37.69s | 801.2 MB downloaded
# Old (datamule-python)
from datamule import Portfolio
portfolio = Portfolio('tar')
portfolio.download_submissions(filing_date=('2020-01-01', '2020-01-03'), provider='datamule-tar')

# New (datamule-hub)
from datamulehub import sec_filings_archive
sec_filings_archive.download_tar(filing_date=('2020-01-01', '2020-01-03'), output_dir="tarnew", overwrite=True)
Download every 10-K filed in 2025, just the root form.
Almost twice as fast (about 1.8x), with about 43% less bandwidth.
- Old: 82.74s | 1900.6 MB downloaded
- New: 45.20s | 1082.3 MB downloaded
# Old (datamule-python)
from datamule import Portfolio
portfolio = Portfolio('tar')
portfolio.download_submissions(filing_date=('2025-01-01', '2025-12-31'), submission_type='10-K', document_type='10-K', provider='datamule-tar')

# New (datamule-hub)
from datamulehub import sec_filings_archive
sec_filings_archive.download_tar(filing_date=('2025-01-01', '2025-12-31'), submission_type='10-K', document_type='10-K', output_dir="tarnew", overwrite=True)
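The speedup and bandwidth figures for each benchmark follow directly from the timings and download sizes listed above. Here is a small sketch of that arithmetic, so the ratios can be rechecked if the numbers change. The `Result` class and its field names are just for illustration; the measurements are copied from this post.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    old_s: float   # old wall-clock time, seconds
    new_s: float   # new wall-clock time, seconds
    old_mb: float  # old download size, MB
    new_mb: float  # new download size, MB

    @property
    def speedup(self) -> float:
        """How many times faster the new archive is (old time / new time)."""
        return self.old_s / self.new_s

    @property
    def bandwidth_saved(self) -> float:
        """Fraction of bandwidth saved relative to the old archive."""
        return 1 - self.new_mb / self.old_mb


# Measurements copied from the benchmarks in this post.
RESULTS = [
    Result("Form 4s, Jan 2020", 13.42, 10.92, 202.5, 113.1),
    Result("8-Ks, Jan 2020", 24.48, 4.58, 337.1, 27.7),
    Result("All filings, Jan 1-3 2020", 63.31, 37.69, 910.9, 801.2),
    Result("10-K root forms, 2025", 82.74, 45.20, 1900.6, 1082.3),
]

for r in RESULTS:
    print(f"{r.name}: {r.speedup:.2f}x faster, "
          f"{r.bandwidth_saved:.0%} less bandwidth")
```

The gains vary a lot by query: the 8-K case benefits the most, while downloading every filing in a date range saves the least bandwidth.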