June Update
Setting up cloud infrastructure.
Datamule Cloud
I'm now working on SEC data infrastructure in the cloud. See: Proposed System Architecture for Datamule.
Last Month
In May I released several high-performance parsers:
- doc2dict: parse HTML and PDF files into dictionaries, preserving nesting.
- secsgml v0.2.2: parse SEC SGML.
They are fast enough to parse the entire SEC corpus even on small machines.
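
To make "dictionaries, preserving nesting" concrete, here is a minimal sketch of the general idea using only Python's standard library. It is not doc2dict's API or implementation: the names `HeadingTreeParser` and `html_to_dict` are hypothetical, and the assumption that nesting follows heading levels (h1 to h6) is mine, chosen only for illustration.

```python
from html.parser import HTMLParser


class HeadingTreeParser(HTMLParser):
    """Hypothetical helper (not part of doc2dict): nests text under headings."""

    def __init__(self):
        super().__init__()
        self.root = {"title": None, "text": [], "children": []}
        # Stack of (heading level, node); the root sits at level 0.
        self.stack = [(0, self.root)]
        self._heading_level = None  # set while inside an <h1>..<h6> tag
        self._buffer = []           # text collected since the last flush

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.flush_text()  # body text before the heading belongs to the open node
            self._heading_level = int(tag[1])

    def handle_endtag(self, tag):
        if self._heading_level is not None and tag == f"h{self._heading_level}":
            title = " ".join("".join(self._buffer).split())
            level = self._heading_level
            self._heading_level = None
            self._buffer = []
            # Close every open section at the same or a deeper level.
            while self.stack[-1][0] >= level:
                self.stack.pop()
            node = {"title": title, "text": [], "children": []}
            self.stack[-1][1]["children"].append(node)
            self.stack.append((level, node))
        elif tag == "p":
            self.flush_text()

    def handle_data(self, data):
        self._buffer.append(data)

    def flush_text(self):
        """Attach buffered text to the innermost open section."""
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if text:
            self.stack[-1][1]["text"].append(text)


def html_to_dict(html):
    """Return a nested dict whose structure mirrors the heading hierarchy."""
    parser = HeadingTreeParser()
    parser.feed(html)
    parser.close()
    parser.flush_text()  # pick up any trailing text
    return parser.root


sample = (
    "<h1>Item 1</h1><p>Business overview.</p>"
    "<h2>Risk</h2><p>Market risk.</p>"
    "<h1>Item 2</h1><p>Properties.</p>"
)
tree = html_to_dict(sample)
```

In this sketch the "Risk" section ends up as a child of "Item 1", while "Item 2" becomes a sibling of "Item 1", so the hierarchy of the document survives in the dictionary. Real filings rarely use clean heading tags, so an actual parser has to infer structure from other signals; this only illustrates the shape of the output.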