Anticipating SEC Filings

There are faster ways to get SEC filings. I've previously open sourced code showing how: the fastest way to get SEC Filings. Now, I'm including a basic implementation on Datamule.

alt text

This is accessible via SEC Filing Emails, Webhooks, and will soon be made available on Websocket V3. See products. You can also use Datamule's complimentary SEC filings feed.

Here's how it works:

Most companies and investors get notified of a new filing after it is released on the RSS feed.
This is typically 30 seconds after the filing's metadata page is generated.
This page can be anticipated, by predicting the url, which requires knowing its accession.
accessions mostly follow the format {cik of company filing on behalf}- {year}-{sequential numbering of filings submitted so far this year}.
So, I predict it.

I'm currently using a naive algorithm that should anticipate ~10% of filings. It costs me $20/year to run. For ~$80/year I could get this to ~40%.

> Note: after I published the fastest way to get SEC Filings, a few trading firms reached out. They had been working on the same problem; but had not been able to make it work. Their issue was that they had not realized that the metadata page url could be constructed without knowing the cik of the company being filed for. This made the search space large, and therefore expensive.

> Note 2: metadata is published before filing content. You can typically get filing content a bit faster, especially for larger file like 10-Ks. Generally e.g. 30s faster vs 2s faster, metadata to data.

Preliminary Results

After collecting data for about a month. We get:

period_start_date: 2026/05/27
total_filings_in_period: 71289
filings_with_anticipate: 7440
p25_seconds_faster: 1
median_seconds_faster: 2
p75_seconds_faster: 8
p90_seconds_faster: 20
p99_seconds_faster: 53
max_seconds_faster: 199
avg_seconds_faster: 6.93984962406015

This cost $0.002 to run.

Code

from datamulehub import databases

rows = databases.read_query(
    """
    SELECT
        MIN(start_date) AS period_start_date,
        COUNT(*) AS total_filings_in_period,
        SUM(CASE WHEN anticipate_time IS NOT NULL THEN 1 ELSE 0 END) AS filings_with_anticipate,
        SUM(CASE WHEN first_sec_time IS NOT NULL THEN 1 ELSE 0 END) AS filings_with_rss_or_efts,
        SUM(CASE WHEN anticipate_time IS NOT NULL AND first_sec_time IS NOT NULL THEN 1 ELSE 0 END) AS filings_with_both_sources,
        SUM(CASE WHEN first_sec_time > anticipate_time THEN 1 ELSE 0 END) AS anticipate_faster_filings,
        CAST(SUM(CASE WHEN anticipate_time IS NOT NULL THEN 1 ELSE 0 END) AS double) / COUNT(*) AS anticipate_coverage_rate,
        CAST(SUM(CASE WHEN first_sec_time > anticipate_time THEN 1 ELSE 0 END) AS double)
            / NULLIF(SUM(CASE WHEN anticipate_time IS NOT NULL AND first_sec_time IS NOT NULL THEN 1 ELSE 0 END), 0)
            AS anticipate_win_rate_when_comparable,
        MIN(CASE WHEN first_sec_time > anticipate_time THEN first_sec_time - anticipate_time END) AS min_seconds_faster,
        approx_percentile(CASE WHEN first_sec_time > anticipate_time THEN first_sec_time - anticipate_time END, 0.25) AS p25_seconds_faster,
        approx_percentile(CASE WHEN first_sec_time > anticipate_time THEN first_sec_time - anticipate_time END, 0.50) AS median_seconds_faster,
        approx_percentile(CASE WHEN first_sec_time > anticipate_time THEN first_sec_time - anticipate_time END, 0.75) AS p75_seconds_faster,
        approx_percentile(CASE WHEN first_sec_time > anticipate_time THEN first_sec_time - anticipate_time END, 0.90) AS p90_seconds_faster,
        approx_percentile(CASE WHEN first_sec_time > anticipate_time THEN first_sec_time - anticipate_time END, 0.99) AS p99_seconds_faster,
        MAX(CASE WHEN first_sec_time > anticipate_time THEN first_sec_time - anticipate_time END) AS max_seconds_faster,
        AVG(CASE WHEN first_sec_time > anticipate_time THEN first_sec_time - anticipate_time END) AS avg_seconds_faster
    FROM (
        SELECT
            sd.accessionnumber,
            sd.filingdate,
            start_boundary.start_date,
            m.anticipate_time,
            m.first_sec_time
        FROM sec_submission_details_table sd
        CROSS JOIN (
            SELECT date_add('day', 1, date(from_unixtime(MIN(detected_time)))) AS start_date
            FROM monitor_dumps
            WHERE lower(source) = 'anticipate'
        ) start_boundary
        LEFT JOIN (
            SELECT
                accession,
                MIN(CASE WHEN lower(source) = 'anticipate' THEN detected_time END) AS anticipate_time,
                MIN(CASE WHEN lower(source) IN ('rss', 'efts') THEN detected_time END) AS first_sec_time
            FROM monitor_dumps
            GROUP BY accession
        ) m
          ON CAST(sd.accessionnumber AS varchar) = CAST(m.accession AS varchar)
        WHERE sd.filingdate >= CAST(start_boundary.start_date AS varchar)
    ) joined
    """
)

print(rows)