textxtraction.properties:
# Specify the path to Tika configuration file
com.openexchange.textxtraction.tikaConfig=/opt/open-xchange/etc/tika-config.xml
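The entry uses standard Java properties syntax. The document does not say how Open-Xchange itself loads this file. As a minimal sketch, assuming it is read with `java.util.Properties`, the configured path can be looked up as follows. The class `TikaConfigPath` and its `resolve` helper are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: not part of Open-Xchange, shown only to illustrate the property format.
public class TikaConfigPath {
    // Property key as it appears in textxtraction.properties.
    public static final String KEY = "com.openexchange.textxtraction.tikaConfig";

    // Parses properties-file text and returns the configured Tika config path,
    // or the caller-supplied fallback when the key is missing.
    public static String resolve(String propertiesText, String fallback) {
        Properties props = new Properties();
        try {
            // Lines starting with '#' are treated as comments by Properties.load.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY, fallback);
    }

    public static void main(String[] args) {
        String text = "# Specify the path to Tika configuration file\n"
                + KEY + "=/opt/open-xchange/etc/tika-config.xml\n";
        // Prints /opt/open-xchange/etc/tika-config.xml
        System.out.println(resolve(text, null));
    }
}
```

The fallback parameter is an assumption of this sketch. The real component may instead fail, or use a built-in default, when the key is absent.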
A Tika parser must implement org.apache.tika.parser.Parser. A parser is registered in tika-config.xml by its fully qualified class name. Several parsers that ship with Tika are already registered in the default tika-config.xml shown below. Together, they define the document types that can be parsed by default.
tika-config.xml:
<config>
  <parser class="org.apache.tika.parser.html.HtmlParser" />
  <parser class="org.apache.tika.parser.microsoft.OfficeParser" />
  <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser" />
  <parser class="org.apache.tika.parser.odf.OpenDocumentParser" />
  <parser class="org.apache.tika.parser.pdf.PDFParser" />
  <parser class="org.apache.tika.parser.rtf.RTFParser" />
  <parser class="org.apache.tika.parser.txt.TXTParser" />
  <parser class="org.apache.tika.parser.xml.DcXMLParser" />
</config>
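Each entry is identified only by its fully qualified class name. A quick sanity check before deploying an edited tika-config.xml can therefore catch a class name split by a stray space or an unquoted attribute value. The following is a minimal sketch that uses only the JDK's DOM parser. It is not part of Open-Xchange or Tika, and the class name `TikaParserList` is hypothetical. It does not verify that the listed classes actually exist on the server's classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker: lists the parser classes registered in a tika-config.xml string.
public class TikaParserList {
    // Matches a dotted, fully qualified Java class name such as org.apache.tika.parser.pdf.PDFParser.
    private static final String FQCN =
            "[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)+";

    // Returns the class attribute of every <parser> element, in document order.
    public static List<String> parserClasses(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Malformed XML, e.g. an unquoted attribute value, ends up here.
            throw new IllegalStateException("Cannot read Tika configuration", e);
        }
        NodeList nodes = doc.getElementsByTagName("parser");
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getAttribute returns "" when the attribute is missing, which fails the check below.
            String name = ((Element) nodes.item(i)).getAttribute("class");
            if (!name.matches(FQCN)) {
                throw new IllegalArgumentException("Not a fully qualified class name: " + name);
            }
            classes.add(name);
        }
        return classes;
    }

    public static void main(String[] args) {
        String xml = "<config>"
                + "<parser class=\"org.apache.tika.parser.pdf.PDFParser\" />"
                + "<parser class=\"org.apache.tika.parser.txt.TXTParser\" />"
                + "</config>";
        parserClasses(xml).forEach(System.out::println);
    }
}
```

Running `main` prints the two class names from the sample. A `class` value containing a space is rejected with an `IllegalArgumentException`. An unquoted attribute makes the XML itself malformed and surfaces as an `IllegalStateException`.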