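I am using open() to read a log file, but the content I get back looks strange. If I open the log file in Notepad++, copy the contents, paste them into a new file and save that as a .txt file, open() reads the correct content. The code is: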
public void processRawNews(String rootPath, String filename) throws Exception {
    try {
        //@debug
        out.println("debug 1= ");
        JAXBContext jaxbContext = JAXBContext.newInstance(newsMLObj.class);
        out.println("debug 2= " + jaxbContext);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        out.println("debug 3= " + spf);
        XMLReader xr = spf.newSAXParser().getXMLReader();
        // to bypass XML DocType and Entity as Jap did not provide proper XML
        xr.setFeature("http://xml.org/sax/features/validation", false);
        xr.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        xr.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        xr.setFeature("http://xml.org/sax/features/external-general-entities", false);
        xr.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        xr.setFeature("http://xml.org/sax/features/use-entity-resolver2", false);
        out.println("debug 4= " + xr);
        // remove the UTF-8 BOM, rewriting the file only if one was stripped
        Path path = Paths.get(rootPath + filename);
        out.println("debug 5= " + path);
        byte[] xmlcontent = Files.readAllBytes(path);
        String s = new String(xmlcontent, StandardCharsets.UTF_8);
        s = s.replaceFirst("^\uFEFF", "");
        byte[] content2 = s.getBytes(StandardCharsets.UTF_8);
        if (content2.length != xmlcontent.length) {
            Files.write(path, content2);
        }
    } catch (JAXBException e) {
        e.printStackTrace();
        out.println(e.getMessage() + " at processRawNews");
        StackTraceElement[] st = e.getStackTrace();
        for (int i = 0; i < st.length; i++) {
            out.println(st[i].toString());
        }
    }
}
I have tried many approaches:
Answer 0 (score: 2)
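You can't read the file correctly because it is encoded in UTF-16, which you can tell from the first byte: 0xff is part of the UTF-16 byte order mark (BOM). So just pass encoding='utf16' when reading (or use codecs.open in Python 2).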
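As a rough sketch of the suggestion above, assuming Python 3: the helper name read_text_detecting_bom and the temporary file are made up for illustration and are not from the question. It peeks at the first bytes of the file and picks the codec from the BOM, falling back to UTF-8 when there is none. It does not try to handle UTF-32 or UTF-16 files without a BOM.

```python
import codecs
import os
import tempfile


def read_text_detecting_bom(path, default="utf-8"):
    """Read a text file, choosing the codec from a leading BOM if one exists.

    Hypothetical helper for illustration only. UTF-32 is not handled, and a
    UTF-32 LE BOM would be mistaken for UTF-16 LE.
    """
    with open(path, "rb") as f:
        head = f.read(4)

    if head.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the BOM
        encoding = "utf-8-sig"
    elif head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # "utf-16" uses the BOM to pick the byte order, then drops it
        encoding = "utf-16"
    else:
        encoding = default

    with open(path, encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    # Write a small UTF-16 file (Python adds the BOM) and read it back
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    with open(demo_path, "w", encoding="utf-16") as f:
        f.write("first line\n")
    print(repr(read_text_detecting_bom(demo_path)))
```

If you already know the file is UTF-16, the one-liner from the answer, open(path, encoding='utf16'), is enough; the detection step only helps when logs arrive in mixed encodings.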