Can anyone tell me what might be causing this problem? I am trying to post an XML file with post.jar, and the server log shows the following:
118208 [qtp760665089-18] ERROR org.apache.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x6c (at char #139212, byte #136949)
    at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
    at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:397)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
    [...]
Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x6c (at char #139212, byte #136949)
    at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    ...
Answer 0 (score: 1)
Your document contains one or more illegal characters, i.e. byte sequences that are not valid UTF-8:
http://www.coderanch.com/t/433718/XML/Invalid-UTF-middle-byte-error
I would take a careful look at the document and consider stripping or filtering it so that only valid UTF-8 remains.
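To narrow down where to look, something like the following sketch may help. It is not taken from the linked thread; the function name `find_invalid_utf8` and the file name `docs.xml` are assumptions made for illustration. It reports the byte offsets at which strict UTF-8 decoding fails, which can then be compared with the `byte #136949` position in the log (the offsets may not line up exactly if the posted stream differs from the file on disk). Since 0x6c is the ASCII letter `l`, one plausible cause is an accented character saved in Latin-1 or Windows-1252 and followed by an `l`, but that is only a guess.

```python
def find_invalid_utf8(data: bytes, limit: int = 10):
    """Return up to `limit` (byte_offset, offending_bytes) pairs.

    Each pair marks a byte sequence that a strict UTF-8 decoder rejects.
    """
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            # Strict decoding raises at the first invalid sequence.
            data[pos:].decode("utf-8")
            break  # the rest of the input is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning after the bad bytes
    return problems


# Hypothetical usage: "docs.xml" stands in for the file being posted.
# with open("docs.xml", "rb") as f:
#     for offset, bad in find_invalid_utf8(f.read()):
#         print(offset, bad)
```

Re-slicing the input on every error keeps the sketch short; for a very large file with many bad bytes an incremental decoder would be cheaper.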
An earlier Stack Overflow answer has several code snippets, in Perl and Java, for filtering out non-UTF-8 characters:
How to remove bad characters that are not suitable for utf8 encoding in MySQL?
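The same kind of filtering can be sketched in Python. This is an illustration rather than the code from the linked answer, and the function names are made up for this example. Two options are shown: dropping the invalid bytes outright, or, if the file is believed to have been saved in another encoding (Windows-1252 is assumed here purely as an example), transcoding it so that no characters are lost.

```python
def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop every byte sequence that is not valid UTF-8."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def transcode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode data as UTF-8, assuming it was written in `source_encoding`.

    The default "cp1252" is only a guess; use the file's real encoding.
    """
    return data.decode(source_encoding).encode("utf-8")


# Hypothetical usage: both file names are placeholders.
# with open("docs.xml", "rb") as src:
#     cleaned = strip_invalid_utf8(src.read())
# with open("docs.clean.xml", "wb") as dst:
#     dst.write(cleaned)
```

Stripping silently loses the offending characters, so transcoding is usually preferable when the original encoding is known. In that case the file must not already contain valid multi-byte UTF-8, and any encoding declaration in the XML prolog should agree with the result.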