TS 16.4.1: Can I turn off Tika XML Parsing in SOLR?
I just got TS 16.4.1 up and running, with the Indexer working against SOLR and Zookeeper. While indexing our content, I tailed the index logs and noticed a lot of XML parsing errors scrolling by. A closer look revealed that the parser was reading our HTML fragments and declaring the "XML" invalid, even though we are producing exactly the HTML the business wants for our website fragments. I don't want or need the Tika/SOLR XML parser telling me our HTML is malformed.

I want to turn this off because I believe that when a file errors out on XML parsing, its content does not get indexed. I'm not entirely sure about that, but I tested it with one file that failed, and Search would not return that file when I searched for keywords that appear in it. The filename comes back in a filename search, but none of the contents are searchable.
Anyway, does anyone know where this is configured and how I can turn that feature off?