Question

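I am using open() to read a log file, but the content I get back is garbled. If I open the log file in Notepad++, copy its content, paste it into a new file, and save that as a .txt file, open() reads the content correctly. The code is: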

public void processRawNews(String rootPath,String filename) throws Exception {

    try {
        //@debug 
        out.println("debug 1= ");
        JAXBContext jaxbContext = JAXBContext.newInstance(newsMLObj.class);
        out.println("debug 2= " + jaxbContext);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        out.println("debug 3= " + spf);
        XMLReader xr = spf.newSAXParser().getXMLReader();

        // to bypass XML DocType and Entity as Jap did not provide proper XML
        xr.setFeature("http://xml.org/sax/features/validation", false);
        xr.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        xr.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        xr.setFeature("http://xml.org/sax/features/external-general-entities", false);
        xr.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        xr.setFeature("http://xml.org/sax/features/use-entity-resolver2", false);
        out.println("debug 4= " + xr);
        // remove UTF-8-Bom
        Path path = Paths.get(rootPath + filename);
        out.println("debug 5= " + path);
        byte[] xmlcontent = Files.readAllBytes(path);
        String s = new String(xmlcontent, StandardCharsets.UTF_8);
        s = s.replaceFirst("^\uFEFF", "");
        byte[] content2 = s.getBytes(StandardCharsets.UTF_8);

        if (content2.length != xmlcontent.length) {
            Files.write(path, content2);
        }
    } catch (JAXBException e) {
        e.printStackTrace();
        out.println(e.getMessage() + " at processRawNews");
        StackTraceElement[] st = e.getStackTrace();
        for (int i = 0 ; i < st.length ; i ++){
            out.println(st[i].toString());
        }
    }
}

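I have tried several approaches:

Using cat on the log file and redirecting the output to a new text file: did not help
Opening the log file in Notepad++ and using Save As... to write a new text file: did not help
Using the Linux tail command and redirecting the output to a new text file: did not help
Using Python's codecs module to read it as utf-8: fails with the error "codec can't decode byte 0xff in position 0: invalid start byte"
Opening the log file in Notepad++, copying its content, pasting it into a new file, and saving that as a new text file: this works.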

Answer 1

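You can't read the file because it is encoded in UTF-16, which you can tell from the BOM (byte order mark) at the very start of the file. The byte 0xff is part of the UTF-16 BOM: a little-endian UTF-16 file begins with the bytes 0xFF 0xFE. That is also why decoding as UTF-8 fails at position 0, since 0xff is never a valid UTF-8 start byte. So just add encoding='utf16' when reading (or use codecs.open in Python 2).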
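As a rough illustration of the point about the BOM, the sketch below peeks at the first bytes of a file and picks an encoding from them before opening it in text mode. The helper names detect_bom_encoding and read_log are made up for this example, and falling back to UTF-8 when no BOM is found is an assumption that may not suit every log file.

```python
import codecs


def detect_bom_encoding(path):
    """Guess a text encoding from the file's byte order mark, if any.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Assumption: treat files without a BOM as plain UTF-8.
    return "utf-8"


def read_log(path):
    """Read the whole file as text, using the BOM-derived encoding."""
    with open(path, encoding=detect_bom_encoding(path)) as f:
        return f.read()
```

With this in place, something like read_log("app.log") (file name hypothetical) should return the decoded text with the BOM stripped. If the file is known to be UTF-16, passing encoding='utf16' to open() directly is simpler.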

Python cannot read a log file correctly unless I paste its content into a new text file
