Lander (https://github.com/lsst-sqre/lander) uses metasrc (https://github.com/lsst-sqre/metasrc) to discover document metadata from the document source itself. For LaTeX documents, normalizing the source into plain Unicode text is a non-trivial challenge.
An approach that Tim Jenness suggested is to use pandoc to convert the document into HTML first, and then extract metadata from the HTML. HTML is generally an easier format to extract information from since it's standards-based.
pypandoc (https://pypi.python.org/pypi/pypandoc) might be useful since it's a Python package that includes pandoc.
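
As a rough sketch of what the pandoc-to-HTML approach could look like (not a proposal for the actual metasrc API): the names `MetadataParser`, `extract_metadata`, and `latex_to_html` are hypothetical, and the sketch assumes pandoc's standalone HTML output puts the title in `<title>` and the author and date in `<meta name="author">` and `<meta name="dcterms.date">` tags, which should be checked against the pandoc version we end up using. The conversion step assumes pypandoc and pandoc are both installed; the extraction step only needs the standard library.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}  # meta name -> list of content values
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name, content = attr.get("name"), attr.get("content")
            if name and content is not None:
                self.meta.setdefault(name, []).append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return title, authors, and date found in an HTML document's head."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        # Assumption: one <meta name="author"> tag per author.
        "authors": parser.meta.get("author", []),
        # Assumption: the date is exposed as <meta name="dcterms.date">.
        "date": (parser.meta.get("dcterms.date") or [None])[0],
    }


def latex_to_html(tex_source):
    """Convert LaTeX source to standalone HTML (requires pypandoc + pandoc)."""
    import pypandoc  # imported lazily so extraction works without it

    return pypandoc.convert_text(
        tex_source, "html", format="latex", extra_args=["--standalone"]
    )
```

Usage would be something like `extract_metadata(latex_to_html(tex_source))`. Keeping conversion and extraction separate means the extraction can be unit tested against saved HTML without needing pandoc.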
We might need to install pandoc in the lsst-texmf Docker container so that pandoc can run inside a real LaTeX environment with the lsstdoc class pre-installed.