June Update

Setting up cloud infrastructure.

Datamule Cloud

I'm now working on SEC data infrastructure in the cloud. See: Proposed System Architecture for Datamule.

Last Month

In May I released several high-performance parsers:

  1. doc2dict: parse HTML and PDF files into dictionaries, preserving nesting.
  2. secsgml v0.2.2: parse SEC SGML.

They are fast enough to parse the entire SEC corpus even on small machines.
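
To give a feel for what "preserving nesting" means, here is a minimal sketch using only Python's standard library. It is not doc2dict's API or its algorithm: the class name HeadingNester, the function html_to_nested_dict, and the output keys (title, text, children) are all made up for this illustration, and it assumes nesting is driven purely by h1-h6 heading levels.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingNester(HTMLParser):
    """Toy parser (hypothetical, not doc2dict): nests text under headings."""

    def __init__(self):
        super().__init__()
        self.root = {"title": None, "text": [], "children": []}
        # Stack of (heading level, node); the root sits at level 0.
        self.stack = [(0, self.root)]
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            level = int(tag[1])
            # Pop until the top of the stack is a shallower heading.
            while self.stack[-1][0] >= level:
                self.stack.pop()
            node = {"title": "", "text": [], "children": []}
            self.stack[-1][1]["children"].append(node)
            self.stack.append((level, node))
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        node = self.stack[-1][1]
        if self.in_heading:
            node["title"] += text
        else:
            # Body text belongs to the most recent heading.
            node["text"].append(text)


def html_to_nested_dict(html):
    """Return a nested dict whose shape mirrors the heading hierarchy."""
    parser = HeadingNester()
    parser.feed(html)
    parser.close()
    return parser.root
```

Feeding it a string such as "<h1>Item 1</h1><p>Business</p><h2>Overview</h2><p>Details</p>" gives a dictionary where "Overview" sits inside the children of "Item 1" rather than next to it. A flat list of text blocks would lose that hierarchy.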