The output file is plain text; by default, each document is enclosed in <doc id="doc1000001"> … </doc> tags. The plain text file is intended as input for further processing, e.g., generating linguistic annotation such as part-of-speech tagging or dependency / constituency parsing with tools like the TreeTagger or the Stanford parser.
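As a rough illustration of that further processing, the sketch below splits such a file's contents into individual documents before they are passed to a tagger or parser. It assumes the default tag layout described above, with straight double quotes around the id; the function name `iter_documents` and the sample text are hypothetical and not part of the tool itself.

```python
import re
from typing import Iterator, Tuple

# Assumed default delimiters: <doc id="..."> ... </doc>
DOC_PATTERN = re.compile(r'<doc id="([^"]*)">(.*?)</doc>', re.DOTALL)


def iter_documents(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, body) pairs for each <doc> block found in the text."""
    for match in DOC_PATTERN.finditer(text):
        # Strip the whitespace that surrounds the body inside the tags.
        yield match.group(1), match.group(2).strip()


# Made-up sample standing in for the contents of the output file.
sample = (
    '<doc id="doc1000001">\nFirst document text.\n</doc>\n'
    '<doc id="doc1000002">\nSecond document text.\n</doc>\n'
)

documents = list(iter_documents(sample))
for doc_id, body in documents:
    print(doc_id, len(body.split()), "tokens (whitespace-split)")
```

Each document body can then be written to its own file or streamed to the annotation tool. For very large output files, reading line by line and collecting text between the opening and closing tags would likely be more memory-friendly than matching over the whole file at once.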