Question

I am trying to fetch the following XML from a database using a Java method, but I am getting an error.

Code used to parse the XML:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();

InputSource is = new InputSource(new ByteArrayInputStream(cond.getBytes()));

Document doc = db.parse(is);

Element elem = doc.getDocumentElement();

// here we expect a series of <data><name>N</name><value>V</value></data>
NodeList nodes = elem.getElementsByTagName("data");

TableID jobId = new TableID(_processInstanceId);
Job myJob = Job.queryByID(_clientContext, jobId, true);

if (nodes.getLength() == 0) {
    log(Level.DEBUG, "No data found on condition XML");

}

for (int i = 0; i < nodes.getLength(); i++) {
    // loop through the <data> in the XML

    Element dataTags = (Element) nodes.item(i);
    String name = getChildTagValue(dataTags, "name");
    String value = getChildTagValue(dataTags, "value");

    log(Level.INFO, "UserData/Value=" + name + "/" + value);

    myJob.setBulkUserData(name, value);
}

myJob.save();

Data

<ContactDetails>307896043</ContactDetails>
<ContactName>307896043</ContactName>
<Preferred_Completion_Date>
</Preferred_Completion_Date>
<service_address>A-End Address: 1ST HELIERST HELIERJT2 3XP832THE CABLES 1 POONHA LANEST HELIER JE JT2 3XP</service_address>
<ServiceOrderId>315473043</ServiceOrderId>
<ServiceOrderTypeId>50</ServiceOrderTypeId>
<CustDesiredDate>2013-03-20T18:12:04</CustDesiredDate>
<OrderId>307896043</OrderId>
<CreateWho>csmuser</CreateWho>
<AccountInternalId>20100333</AccountInternalId>
<ServiceInternalId>20766093</ServiceInternalId>
<ServiceInternalIdResets>0</ServiceInternalIdResets>
<Primary_Offer_Name  action='del'>MyMobile Blue &#163;44.99 [12 month term]</Primary_Offer_Name>
<Disc_Reason  action='del'>8</Disc_Reason>
<Sup_Offer  action='del'>80000257</Sup_Offer>
<Service_Type  action='del'>A-01-00</Service_Type>
<Priority  action='del'>4</Priority>
<Account_Number  action='del'>0</Account_Number>
<Offer  action='del'>80000257</Offer>
<msisdn  action='del'>447797142520</msisdn>
<imsi  action='del'>234503184</imsi>
<sim  action='del'>5535</sim>
<ocb9_ARM  action='del'>false</ocb9_ARM>
<port_in_required  action='del'>
</port_in_required>
<ocb9_mob  action='del'>none</ocb9_mob>
<ocb9_mob_BB  action='del'>
</ocb9_mob_BB>
<ocb9_LandLine  action='del'>
</ocb9_LandLine>
<ocb9_LandLine_BB  action='del'>
</ocb9_LandLine_BB>
<Contact_2>
</Contact_2>
<Acc_middle_name>
</Acc_middle_name>
<MarketCode>7</MarketCode>
<Acc_last_name>Port_OUT</Acc_last_name>
<Contact_1>
</Contact_1>
<Acc_first_name>.</Acc_first_name>
<EmaiId>
</EmaiId>

Error

 org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.

I have read in some threads that this happens because there are special characters in the XML. How can I solve this?

Answer 1

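How do I solve this?

Read the data using the correct character encoding. The error message means that you are trying to read the data as UTF-8 (either deliberately, or because that is the default encoding for an XML file that does not specify <?xml version="1.0" encoding="somethingelse"?>), but it is actually in a different encoding, such as ISO-8859-1 or Windows-1252.

To be able to advise on how to do this, I would need to see the code you are currently using to read the XML.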
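
As a rough sketch of what "reading with the correct encoding" can look like: the example below assumes the bytes are really ISO-8859-1 (an assumption for illustration; substitute whatever encoding the database actually uses) and decodes them through a Reader, so the parser receives characters and never has to guess. The class name ExplicitEncodingParse and the method parse are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExplicitEncodingParse {

    // Decode the raw bytes with a known charset, then hand the parser characters.
    static Document parse(byte[] xmlBytes, Charset actualCharset) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), actualCharset);
        return db.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // 0xA3 is the pound sign in ISO-8859-1, but on its own it is not valid UTF-8.
        byte[] latin1 = "<price>\u00a344.99</price>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the same bytes were passed to the parser as a plain byte stream with no encoding declaration, it would assume UTF-8 and is likely to fail with the error from the question.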

Answer 2

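Open the XML in Notepad
Make sure there are no extra spaces at the beginning and end of the document.
Select File -> Save As
Select Save as type -> All Files
Enter the file name as abcd.xml
Select Encoding -> UTF-8 -> click Save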

Answer 3

Try:

InputStream inputStream= // Your InputStream from your database.
Reader reader = new InputStreamReader(inputStream,"UTF-8");

InputSource is = new InputSource(reader);
is.setEncoding("UTF-8");

saxParser.parse(is, handler);

If it is not UTF-8, just change the encoding part to the right one.

Answer 4

I had the XML as a String, was calling xml.getBytes(), and got this error. Changing it to xml.getBytes(Charset.forName("UTF-8")) worked for me.
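
When the XML is already a String, another option is to skip the byte conversion entirely, so the platform default charset never comes into play. This is only a sketch; the class name ParseXmlString and the method parse are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseXmlString {

    // Parse directly from characters: no getBytes(), so no charset mismatch is possible.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Offer>MyMobile Blue \u00a344.99</Offer>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

In the question's code this would correspond to replacing new ByteArrayInputStream(cond.getBytes()) with new StringReader(cond), assuming cond is a String.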

Answer 5

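I ran into this problem, but the file was UTF-8; it was just that a few characters had crept in that were not encoded as UTF-8. To solve it, I did what is described in this thread, i.e. I validated the file: How to check whether a file is valid UTF-8?

Basically, you run the command:

$ iconv -f UTF-8 your_file -o /dev/null

If something is not encoded as UTF-8, it reports the position of the offending input so that you can find it.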
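
If iconv is not available, roughly the same check can be done from Java with a strict CharsetDecoder. This is a minimal sketch; Utf8Check and firstInvalidOffset are hypothetical names, and it reports a byte offset rather than a line number.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns the byte offset of the first invalid UTF-8 sequence, or -1 if the input is valid.
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never decodes to more chars than it has bytes, so this buffer is large enough.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On error, the input buffer is positioned at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        byte[] bad = {'<', 'a', '>', (byte) 0xA3, '<', '/', 'a', '>'};
        System.out.println(firstInvalidOffset(bad));
    }
}
```

To check a file, the bytes could be read with Files.readAllBytes and passed to the same method.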

Answer 6

This error occurs when you try to load a Jasper report file with the .jasper extension, for example:
"c://reports//EmployeeReport.jasper"

Instead, you should load the Jasper report file with the .jrxml extension, for example:
"c://reports//EmployeeReport.jrxml"
See problem screenshot: https://i.stack.imgur.com/D5SzR.png
See solution screenshot: https://i.stack.imgur.com/VeQb9.png

Answer 7

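I happened to run into this problem because of an Ant build.

The Ant build took files and applied the filterchain expandproperties to them. During this filtering, my Windows machine's implicit default, non-UTF-8 character encoding was used to generate the filtered files, so characters outside its character set could not be mapped correctly.

One solution was to give Ant an explicit environment variable for UTF-8. In Cygwin, before launching Ant: export ANT_OPTS="-Dfile.encoding=UTF-8".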

Answer 8

I had the same problem, and after a long investigation of my XML file I found the cause: there were a few unescaped characters, such as « ».

Answer 9

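For those who, like me, understand the principles of character encoding, have also read Joel's article (which is funny, because it contains wrong characters anyway) and still cannot figure out what is going on (spoiler alert: I am a Mac user), the solution can be as simple as deleting your local repo and cloning it again.

My code base had not changed since the last time it ran fine, so UTF errors made no sense, given that our build system never complained about them... until I remembered that a few days earlier I had accidentally unplugged my computer with IntelliJ IDEA and the whole stack running (Java/Tomcat/Hibernate).

My Mac did a brilliant job of pretending nothing had happened, and I carried on with business as usual, but the underlying file system had been left corrupted somehow. I wasted a whole day trying to figure this one out. I hope it helps somebody.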

Answer 10

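I had the same problem in a JSF application, where a comment line in an XHTML page contained some special characters. When I compared it with the previous version in Eclipse, it had a comment: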

//Some �  special characters found

I removed those characters and the page loaded fine. Most of the time the problem is in the XML files, so compare them with a working version.

Answer 11

I had the same problem. In my case, the "-Dfile.encoding=UTF8" argument was missing under JAVA_OPTIONS in the startWebLogic.cmd file of the WebLogic server.

Answer 12

You have a library that needs to be removed, like the following one:

   implementation 'org.apache.maven.plugins:maven-surefire-plugin:2.4.3'

Answer 13

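I had a similar problem. I had saved some XML in a file, and reading it into a DOM document failed because of special characters. I then fixed it with the following code: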

String enco = new String(Files.readAllBytes(Paths.get(listPayloadPath+"/Payload.xml")), StandardCharsets.UTF_8);

Document doc = builder.parse(new ByteArrayInputStream(enco.getBytes(StandardCharsets.UTF_8)));

Let me know if it works for you.

Answer 14

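This error caught me by surprise in production...

The error is caused by a wrong character encoding, so the best solution is to implement a way to auto-detect the input charset.

Here is one way to do it: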

...    
import org.xml.sax.InputSource;
...

InputSource inputSource = new InputSource(inputStream);
someReader(
    inputSource.getByteStream(), inputSource.getEncoding()
  );

Sample input:

<?xml version="1.0" encoding="utf-16"?>
<rss xmlns:dc="https://purl.org/dc/elements/1.1/" version="2.0">
<channel>
...

How to fix "Invalid byte 1 of 1-byte UTF-8 sequence"
