TS 16.4.1: Can I turn off Tika XML parsing in Solr?

I just got TS 16.4.1 up and running, along with the Indexer working against Solr and ZooKeeper. While indexing our content, I tailed the index logs and noticed a lot of XML parsing errors scrolling by. A closer look revealed that the parser was processing our HTML fragments and declaring the "XML" invalid, even though we are producing exactly the HTML the business wants for our website fragments. I don't want or need the Tika/Solr XML parser telling me our HTML is malformed.

I want to turn this off because I believe that when a file errors out on XML parsing, its content does not get indexed. I'm not entirely sure about that, but I tested it with one file that failed: Search would not return that file for keywords that appear in it. The file does come back in a filename search, just not in a content search.

Anyway, does anyone know where this is configured and how I can turn that feature off?
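
For anyone wanting to repeat the single-file check above directly against Solr instead of through Search, here is a minimal sketch. It only builds the query URLs for Solr's standard `/select` handler. The collection name `teamsite` and the field names `content` and `filename` are hypothetical placeholders, not the actual TeamSite schema; substitute whatever your index defines. The base URL assumes Solr's default port, 8983.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, field, keyword, rows=10):
    """Build a Solr /select URL that searches one field for one phrase."""
    params = {
        "q": f'{field}:"{keyword}"',  # phrase query against a single field
        "rows": rows,                 # maximum number of documents to return
        "wt": "json",                 # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed default Solr address
    # Hypothetical collection and field names; adjust to the real schema.
    print(solr_select_url(base, "teamsite", "content", "keyword"))
    print(solr_select_url(base, "teamsite", "filename", "fragment.html"))
```

If the filename query returns the document and the content query does not, that would be consistent with the file being indexed without its body text.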
