Datamule Cloud V3 SEC Filings Archive

The SEC Filings Archive has been reprocessed. Previously, SGML objects under 500 KB were stored uncompressed; now every object is compressed. The standard compression level is now zstd level 9, up from level 3.

Stats (5/28/26):

Downloads of larger filings, such as 10-Ks, should be about 15% faster on average. Requests for parts of a filing, such as the 10-K file within a 10-K, should also be faster, because the information needed for HTTP range requests is now stored in a database rather than requiring a metadata call to the filing's metadata.json at the head of an object.
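To make the range-request path concrete, here is a minimal sketch of how a client could ask for just one document's bytes once its offsets are known. The `DocumentLocation` record, its field names, and the URL are hypothetical stand-ins for whatever the lookup database actually returns; only the `Range` header syntax is standard HTTP.

```python
from dataclasses import dataclass
from urllib.request import Request


@dataclass(frozen=True)
class DocumentLocation:
    """Hypothetical lookup row: where one document sits inside a stored object."""
    object_url: str  # assumed URL of the stored filing object
    start: int       # offset of the document's first byte
    length: int      # number of bytes in the document


def build_range_request(loc: DocumentLocation) -> Request:
    """Build a GET request for exactly the bytes of one document."""
    if loc.length <= 0:
        raise ValueError("length must be positive")
    end = loc.start + loc.length - 1  # HTTP byte ranges are inclusive on both ends
    return Request(loc.object_url, headers={"Range": f"bytes={loc.start}-{end}"})


# Made-up values; pass the request to urllib.request.urlopen to actually send it.
loc = DocumentLocation("https://example.com/filing-object", start=4096, length=1000)
req = build_range_request(loc)
print(req.get_header("Range"))  # bytes=4096-5095
```

With the offsets served from a database, the client needs a single ranged GET per document instead of first reading metadata from the head of the object.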

Technical Setup

The Monitor detects new filings and submits them to SQS. The Archive Updater processes each new filing, uploads it to R2, then updates the lookup database that lets users access the filings.
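The flow can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: a `queue.Queue` plays the role of SQS, plain dicts play the roles of R2 and the lookup database, and the object key scheme is invented. It is not Datamule's actual code.

```python
import queue

# In-memory stand-ins (assumptions): a Queue for SQS, dicts for R2 and the lookup database.
filing_queue: "queue.Queue[dict]" = queue.Queue()
object_store: dict[str, bytes] = {}
lookup_db: dict[str, str] = {}


def monitor(new_filings: list[dict]) -> None:
    """Monitor: submit each newly detected filing to the queue."""
    for filing in new_filings:
        filing_queue.put(filing)


def archive_updater() -> int:
    """Archive Updater: drain the queue, upload first, then update the lookup."""
    processed = 0
    while not filing_queue.empty():
        filing = filing_queue.get()
        key = f"{filing['accession']}.sgml"    # hypothetical object key scheme
        object_store[key] = filing["content"]  # step 1: upload the object
        lookup_db[filing["accession"]] = key   # step 2: make it discoverable
        processed += 1
    return processed


monitor([{"accession": "0000000000-26-000001", "content": b"<SEC-DOCUMENT>..."}])
print(archive_updater(), lookup_db)
```

Writing the lookup entry only after the upload succeeds presumably means users never see an entry that points at a missing object.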

Archive Updater

Our bottlenecks are the SEC and Cloudflare R2's PUT latency. Sometimes the SEC is very slow; this is unavoidable. R2 can also be slow, sometimes reaching 5s for a 20 MB PUT. This is avoidable. The CEO of Tigris Data reached out. They have free egress as well. I think Datamule may move there.

SEC Filings Lookup

We currently use a t4g.medium, which costs about $45/month. This has not been optimized, and I think we may change it.

sec_submission_details_table

Indexes:

sec_accession_cik_table

Indexes:

sec_documents_table

Indexes:

Datamule's Cost

Archive Updater

Extrapolating from year 2025, for SGML.

| Size bucket | n | Config | Lambda memory | Benchmark used | Avg total (warm) | Total seconds | Cost |
| --- | --- | --- | --- | --- | --- | --- | --- |
| < 0.1 MB | 730,670 | 1 vCPU | 1769 MB | 11KB warm (lvl9/1t) | 150ms | 109,600s | $2.67 |
| 0.1–1 MB | 312,477 | 1 vCPU | 1769 MB | interpolated | 215ms | 67,183s | $1.61 |
| 1–10 MB | 149,141 | 1 vCPU | 1769 MB | 11MB warm (lvl9/1t) | 370ms | 55,182s | $1.30 |
| 10–50 MB | 31,100 | 4 vCPU | 7076 MB | 107MB warm (lvl9/4t) | 850ms | 26,435s | $2.44 |
| 50–100 MB | 4,627 | 4 vCPU | 7076 MB | 107MB warm (lvl9/4t) | 850ms | 3,933s | $0.36 |
| 100–250 MB | 5,435 | 4 vCPU | 7076 MB | 107MB warm (lvl9/4t) | 850ms | 4,620s | $0.43 |
| 250–500 MB | 824 | 4 vCPU | 7076 MB | 528MB warm (lvl9/4t) | 2,280ms | 1,879s | $0.17 |
| 500+ MB | 0 | 4 vCPU | 7076 MB | 528MB warm (lvl9/4t) | 2,280ms | 0s | $0.00 |
| Total | 1,234,274 | | | | | 268,832s | $8.99 |

TAR adds a little overhead. Expect ~$12/year.
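The arithmetic behind the table can be sketched as follows. The counts, memory sizes, and warm durations come from the table; the pricing constants are an assumption (AWS Lambda arm64 on-demand rates of $0.0000133334 per GB-second plus $0.20 per million requests), since the source does not state which rates were used. Under that assumption the totals come out at about 268,832 seconds and about $8.99.

```python
# (bucket, filing count, Lambda memory in MB, average warm duration in seconds)
BUCKETS = [
    ("< 0.1 MB", 730_670, 1769, 0.150),
    ("0.1-1 MB", 312_477, 1769, 0.215),
    ("1-10 MB", 149_141, 1769, 0.370),
    ("10-50 MB", 31_100, 7076, 0.850),
    ("50-100 MB", 4_627, 7076, 0.850),
    ("100-250 MB", 5_435, 7076, 0.850),
    ("250-500 MB", 824, 7076, 2.280),
    ("500+ MB", 0, 7076, 2.280),
]

# Assumed pricing (not stated in the source): AWS Lambda arm64, on-demand.
PRICE_PER_GB_SECOND = 0.0000133334
PRICE_PER_REQUEST = 0.20 / 1_000_000


def bucket_cost(count: int, memory_mb: int, avg_seconds: float) -> tuple[float, float]:
    """Return (total compute seconds, estimated cost in USD) for one size bucket."""
    seconds = count * avg_seconds
    gb_seconds = seconds * memory_mb / 1024  # Lambda bills memory in GB-seconds
    cost = gb_seconds * PRICE_PER_GB_SECOND + count * PRICE_PER_REQUEST
    return seconds, cost


total_seconds = 0.0
total_cost = 0.0
for name, count, memory_mb, avg_seconds in BUCKETS:
    seconds, cost = bucket_cost(count, memory_mb, avg_seconds)
    total_seconds += seconds
    total_cost += cost
    print(f"{name:>12}: {seconds:>10,.0f}s  ${cost:.2f}")

print(f"{'Total':>12}: {total_seconds:>10,.0f}s  ${total_cost:.2f}")
```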

Usage

To reduce I/O cost for many small files (e.g. Forms 3, 4, and 5), Datamule writes files into a batched .tar format. This is easy to work with programmatically or manually. For example, on Windows you can:

[screenshot]
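For the programmatic route, Python's standard `tarfile` module is enough. The sketch below builds a tiny archive in memory so it runs without any download; the member name and payload are made up, and the real member layout inside a Datamule batch .tar is not specified here.

```python
import io
import tarfile


def read_tar_members(tar_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: contents} for every regular file in a tar archive."""
    files = {}
    # "r:*" lets tarfile detect whether the archive is plain or compressed.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files


# Build a small in-memory archive to stand in for a downloaded batch .tar.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    payload = b"<ownershipDocument/>"
    info = tarfile.TarInfo(name="form4.xml")  # made-up member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

print(read_tar_members(buffer.getvalue()))
```

For a file on disk, `tarfile.open("batch.tar")` with a real path works the same way.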

TAR Archive (default)

Downloads filings. If specific files within a filing are specified, only downloads those.

from datamulehub import sec_filings_archive

# download tar document ranges to outfolder/filingDate/accession/filename
sec_filings_archive.download_tar(cik=320193, document_type=["10-K", "10-Q"], output_dir="outfolder")

SGML Archive

Downloads filings as raw SGML.

from datamulehub import sec_filings_archive

# download decompressed sgml filings to outfolder/filingDate/accession.sgml
sec_filings_archive.download_sgml(cik=320193, submission_type="10-K", output_dir="outfolder")

Quirks

The archive includes filings that are no longer available from the SEC. See "The SEC publishes a lot of filings then deletes them. Here they are."