Datamule SEC Filings Metadata
- Requires a Metadata subscription.
- Filing metadata, document metadata, accession-to-CIK mappings, and monitor dumps.
- Datasets updated daily.
Usage
The Metadata subscription includes five queryable tables:
- submissions_metadata
- sec_submission_details_table
- sec_accession_cik_table
- sec_documents_table
- monitor_dumps
Query Database
Use datamulehub.databases.query when you want Athena to run SQL and write the result as Parquet files.
from datamulehub import databases
databases.query(
    """
    SELECT
        accessionnumber,
        cik,
        "form",
        filingdate,
        reportdate,
        primarydocument
    FROM submissions_metadata
    WHERE filingdate >= '2024-01-01'
        AND "form" IN ('10-K', '10-Q')
    LIMIT 1000
    """,
    output_dir="metadata_sample",
)
This writes one or more Parquet files to metadata_sample.
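To work with the result in Python, the output directory can be scanned for Parquet files and loaded into a single DataFrame. This is a minimal sketch, assuming pandas with a Parquet engine such as pyarrow is installed and that the result files sit directly in the output directory with a .parquet extension; the source does not confirm either detail. The helper names list_parquet_files and load_query_output are illustrative and not part of datamulehub.

```python
from pathlib import Path


def list_parquet_files(output_dir):
    """Return the Parquet files found directly inside output_dir, sorted by name."""
    # Assumption: result files end in ".parquet" and are not nested in subfolders.
    return sorted(Path(output_dir).glob("*.parquet"))


def load_query_output(output_dir):
    """Read every Parquet file in output_dir into one pandas DataFrame."""
    import pandas as pd  # imported lazily so listing files works without pandas

    files = list_parquet_files(output_dir)
    if not files:
        raise FileNotFoundError(f"no Parquet files found in {output_dir!r}")
    # Read each file separately, then stack the rows with a fresh index.
    return pd.concat((pd.read_parquet(path) for path in files), ignore_index=True)
```

With the query above, load_query_output("metadata_sample") would return the selected columns as a DataFrame, provided the files landed where this sketch expects them.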
You can also join metadata tables. For example, this joins submission details, accession-to-CIK mappings, and document rows:
from datamulehub import databases
databases.query(
    """
    SELECT
        d.accessionnumber,
        c.cik,
        d.submissiontype,
        d.filingdate,
        doc.sequence,
        doc.documenttype,
        doc.filename,
        doc.description
    FROM sec_submission_details_table d
    JOIN sec_accession_cik_table c
        ON d.accessionnumber = c.accessionnumber
    JOIN sec_documents_table doc
        ON d.accessionnumber = doc.accessionnumber
    WHERE d.submissiontype = '10-K'
        AND d.filingdate >= '2024-01-01'
        AND doc.documenttype = '10-K'
    LIMIT 1000
    """,
    output_dir="ten_k_documents",
)
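Because an accession number can map to more than one CIK, a join like the one above returns one row per CIK and document combination, not one row per filing. The sketch below reproduces that fan-out locally with Python's built-in sqlite3 module. The tables are cut down to a few columns, every value is a made-up placeholder, and SQLite stands in for Athena, so it illustrates the shape of the join only.

```python
import sqlite3

# In-memory stand-ins for two of the metadata tables (subset of columns).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sec_accession_cik_table (accessionnumber TEXT, cik TEXT);
    CREATE TABLE sec_documents_table (
        accessionnumber TEXT, sequence INTEGER, documenttype TEXT, filename TEXT
    );
    """
)

# Placeholder rows: one filing shared by two CIKs, containing two documents.
conn.executemany(
    "INSERT INTO sec_accession_cik_table VALUES (?, ?)",
    [("ACC-1", "111"), ("ACC-1", "222")],
)
conn.executemany(
    "INSERT INTO sec_documents_table VALUES (?, ?, ?, ?)",
    [("ACC-1", 1, "10-K", "main.htm"), ("ACC-1", 2, "EX-21", "ex21.htm")],
)

rows = conn.execute(
    """
    SELECT c.accessionnumber, c.cik, doc.sequence, doc.filename
    FROM sec_accession_cik_table c
    JOIN sec_documents_table doc
        ON c.accessionnumber = doc.accessionnumber
    ORDER BY c.cik, doc.sequence
    """
).fetchall()

# Two CIKs times two documents gives four joined rows for a single filing.
for row in rows:
    print(row)
```

When counting filings or documents from a joined result, it helps to keep this multiplication in mind and count distinct accession numbers instead of rows.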
Read Query
Use datamulehub.databases.read_query when you want a small query result back in Python instead of keeping the Parquet files.
from datamulehub import databases
rows = databases.read_query(
    """
    SELECT
        accession,
        source,
        detected_time
    FROM monitor_dumps
    WHERE source = 'anticipate'
    LIMIT 10
    """
)
print(rows)
read_query downloads the Parquet result files to a temporary directory, reads them, and returns the rows.
Download Dataset
Use datamulehub.datasets.download when you want a full metadata dataset.
from datamulehub import datasets
datasets.download(
    "submissions_metadata",
    filename="submissions_metadata.parquet",
)
datasets.download(
    "sec_submission_details_table",
    filename="sec_submission_details_table.parquet",
)
datasets.download(
    "sec_accession_cik_table",
    filename="sec_accession_cik_table.parquet",
)
datasets.download(
    "sec_documents_table",
    filename="sec_documents_table.parquet",
)
monitor_dumps is organized by date in S3, so the easiest way to work with it is usually through databases.query or databases.read_query.
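The four download calls above follow one pattern, so they can be folded into a loop. The sketch below takes the download function as an argument, which keeps it runnable without a subscription; download_all and DATASET_NAMES are illustrative names, and passing a filename that includes a directory is an assumption the source does not confirm.

```python
from pathlib import Path

# Dataset names taken from the download calls above. monitor_dumps is left out
# because it is usually easier to reach through queries.
DATASET_NAMES = [
    "submissions_metadata",
    "sec_submission_details_table",
    "sec_accession_cik_table",
    "sec_documents_table",
]


def download_all(download, dest_dir="."):
    """Call download(name, filename=...) once per dataset and return the target paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    targets = {}
    for name in DATASET_NAMES:
        target = dest / f"{name}.parquet"
        # Assumption: the filename argument may include a directory component.
        download(name, filename=str(target))
        targets[name] = target
    return targets
```

In real use the call would look like download_all(datasets.download, "metadata") after from datamulehub import datasets.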
Tables
Schemas below use the Athena table and column names from athena_tables.json.
submissions_metadata
Comprehensive filing metadata extracted from the SEC bulk submissions data.
Athena table and dataset name: submissions_metadata
| Column |
|---|
| cik |
| accessionnumber |
| filingdate |
| reportdate |
| acceptancedatetime |
| act |
| form |
| filenumber |
| filmnumber |
| items |
| core_type |
| size |
| isxbrl |
| isinlinexbrl |
| isxbrlnumeric |
| primarydocument |
| primarydocdescription |
sec_submission_details_table
Small filing-level lookup table used by the SEC filings archive.
Athena table and dataset name: sec_submission_details_table
| Column |
|---|
| accessionnumber |
| submissiontype |
| filingdate |
| reportdate |
| detectedtime |
| containsxbrl |
sec_accession_cik_table
Maps each accession number to one or more CIKs.
Athena table and dataset name: sec_accession_cik_table
| Column |
|---|
| filingdate |
| accessionnumber |
| cik |
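Since one accession number can appear on several rows, it is often convenient to collapse the table into a lookup from accession number to its list of CIKs. This is a minimal sketch that assumes the rows are available as dictionaries keyed by the column names above; the function name and the sample values are made-up placeholders.

```python
from collections import defaultdict


def ciks_by_accession(rows):
    """Group (accessionnumber, cik) rows into {accessionnumber: [cik, ...]}."""
    grouped = defaultdict(list)
    for row in rows:
        ciks = grouped[row["accessionnumber"]]
        # Skip repeats so each CIK is listed once per accession number.
        if row["cik"] not in ciks:
            ciks.append(row["cik"])
    return dict(grouped)


# Made-up placeholder rows, shaped like sec_accession_cik_table records.
sample = [
    {"filingdate": "2024-01-02", "accessionnumber": "ACC-1", "cik": "111"},
    {"filingdate": "2024-01-02", "accessionnumber": "ACC-1", "cik": "222"},
    {"filingdate": "2024-01-03", "accessionnumber": "ACC-2", "cik": "333"},
]
print(ciks_by_accession(sample))
# {'ACC-1': ['111', '222'], 'ACC-2': ['333']}
```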
sec_documents_table
Document-level metadata for files inside SEC submissions.
Athena table and dataset name: sec_documents_table
| Column |
|---|
| filingdate |
| accessionnumber |
| sequence |
| documenttype |
| filename |
| description |
| tarstartbyte |
| tarendbyte |
monitor_dumps
Detection-time records for Datamule filing monitors. The source column takes values such as Efts, Rss, and anticipate.
Athena table name: monitor_dumps
| Column |
|---|
| accession |
| source |
| detected_time |
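One use of these records is finding which source detected each filing first. The sketch below assumes the rows are dictionaries keyed by the column names above and that detected_time values compare in chronological order (for example, ISO 8601 strings in one consistent format); the actual type of detected_time is not stated in the source. The function name and sample values are made-up placeholders.

```python
def first_detection(rows):
    """Return {accession: (source, detected_time)} for the earliest detection."""
    earliest = {}
    for row in rows:
        current = earliest.get(row["accession"])
        # Keep the row with the smallest detected_time seen so far.
        if current is None or row["detected_time"] < current[1]:
            earliest[row["accession"]] = (row["source"], row["detected_time"])
    return earliest


# Made-up placeholder rows, shaped like monitor_dumps records.
sample_rows = [
    {"accession": "ACC-1", "source": "Rss", "detected_time": "2024-01-02T10:00:05"},
    {"accession": "ACC-1", "source": "anticipate", "detected_time": "2024-01-02T10:00:01"},
    {"accession": "ACC-2", "source": "Efts", "detected_time": "2024-01-03T09:30:00"},
]
print(first_detection(sample_rows))
```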