Question

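I'm having a problem with Python. I have a very large log file, nearly 50 MB, from which I need to extract some selected data. The file looks like the sample below: it contains many content tags that are not enclosed in any parent tag.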

<content>
<date>05/06/1993</date><incomingpoint>message</incomingpoint>
Message Properties:
Outgoing host:https://www.toppr.com/guides/english/transformation-sentences/reported-speech/
Properties:
Message:-----
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
</content>
<content>
<date>12022019</date>
<name>sreeja</name>
</content>

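The user enters a value, and any content node that contains it must be extracted into its own text file. For example, if the user enters "Jani", Jani might appear 8 to 10 times in different content nodes, and each of those content nodes should be written to a separate text file.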
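One way to meet that requirement without any XML parsing is to stream the file line by line and buffer only the current content block. This is a minimal sketch. It assumes that every <content> tag starts a line and every </content> tag ends one, as in the sample, and that a case-sensitive substring match is enough. The function name split_matching_blocks and the output names match_1.txt, match_2.txt, ... are hypothetical.

```python
import os


def split_matching_blocks(log_path, phrase, out_dir):
    """Write every <content>...</content> block containing `phrase` to its own file.

    Returns the list of output paths, in the order the blocks appear.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    block = None  # lines of the block being read, or None when outside a block

    with open(log_path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith("<content>"):
                block = []  # a new block starts on this line
            if block is not None:
                block.append(line)
                if stripped.endswith("</content>"):
                    text = "".join(block)
                    if phrase in text:
                        # Hypothetical naming scheme: match_1.txt, match_2.txt, ...
                        out_path = os.path.join(out_dir, f"match_{len(written) + 1}.txt")
                        with open(out_path, "w", encoding="utf-8") as out:
                            out.write(text)
                        written.append(out_path)
                    block = None  # back outside any block
    return written
```

For example, split_matching_blocks(r"c:\sample.txt", "Jani", "jani_blocks") would write one file per matching block. Only one block is held in memory at a time, so the 50 MB file size does not matter.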

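What I have tried:
1. Wrapping the whole log file in a "root" element and then parsing it and extracting the data. This failed, and it doesn't seem like a viable solution anyway, because rewriting and re-reading the file uses a lot of memory.
2. Reading the whole file and getting the line numbers of the search hits. That works, but I then cannot go back to the preceding lines to start reading and parsing that particular content block.
3. Using the file seek and tell methods. This didn't work either, because they work with offset values rather than line numbers.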
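For approach 1, the root element does not have to be written into the file at all. The standard library's incremental xml.etree.ElementTree.XMLPullParser can be fed a synthetic "<root>" string first and then the file in chunks. This is a sketch under the assumption that every content block is well-formed XML; a mismatched tag or a bare "&" in the data would make the parser raise ParseError. The function name matching_content_texts is hypothetical.

```python
import xml.etree.ElementTree as ET


def matching_content_texts(log_path, phrase, chunk_size=64 * 1024):
    """Return the text of every <content> element whose text contains `phrase`.

    The file is fed to an incremental parser behind a synthetic <root> tag,
    so it is never rewritten and never read into memory all at once.
    """
    parser = ET.XMLPullParser(events=("end",))
    parser.feed("<root>")  # virtual parent element; the file on disk is untouched
    matches = []

    def drain():
        # Handle every element the parser has finished since the last call
        for _event, elem in parser.read_events():
            if elem.tag == "content":
                text = "".join(elem.itertext())
                if phrase in text:
                    matches.append(text)
                elem.clear()  # drop the finished block's children and text

    with open(log_path, encoding="utf-8") as f:
        for chunk in iter(lambda: f.read(chunk_size), ""):
            parser.feed(chunk)
            drain()
    parser.feed("</root>")  # close the virtual parent
    parser.close()
    drain()  # events produced by the final feed/close
    return matches
```

Clearing each finished content element keeps memory roughly proportional to one block. If the raw markup of a block is needed instead of its text, ET.tostring(elem, encoding="unicode") could be called before clear(), keeping in mind that it also serializes the element's tail text.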

Here is the code I tried:

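search_phrase = "Jani"
last_content_line = None  # most recent <content> line seen so far
content_start = None      # line number where that <content> block starts

# Raw string for the Windows path; text mode so each line is str, not bytes
with open(r"c:\sample.txt", encoding="utf-8") as f:
    for line_num, line in enumerate(f, start=1):
        line = line.strip()
        if line.startswith("<content>"):
            last_content_line = line
            content_start = line_num
        elif search_phrase in line and last_content_line is not None:
            # Print the matching line and the line number of its <content> tag
            print(line)
            print(content_start)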
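The code above records line numbers, which cannot be used to jump back into the file. Byte offsets can. In binary mode, calling f.tell() before each f.readline() gives the position where that line starts, and f.seek() can later return to it. A minimal sketch, assuming UTF-8 data and that <content> starts a line and </content> ends one; content_offsets_for and read_block_at are hypothetical helper names.

```python
def content_offsets_for(log_path, phrase):
    """Return the byte offset of the <content> line of each block containing `phrase`."""
    needle = phrase.encode("utf-8")
    offsets = []
    with open(log_path, "rb") as f:
        start = None  # offset of the current block's <content> line, if inside one
        while True:
            pos = f.tell()  # byte position of the line about to be read
            line = f.readline()
            if not line:
                break  # end of file
            stripped = line.strip()
            if stripped.startswith(b"<content>"):
                start = pos
            if start is not None and needle in line and (not offsets or offsets[-1] != start):
                offsets.append(start)  # record each matching block only once
            if stripped.endswith(b"</content>"):
                start = None  # outside any block until the next <content>
    return offsets


def read_block_at(log_path, offset):
    """Seek to `offset` and return the text up to and including the closing </content>."""
    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for line in f:
            lines.append(line)
            if line.strip().endswith(b"</content>"):
                break
    return b"".join(lines).decode("utf-8")
```

The first pass stores only a list of integers, and each block is re-read on demand, for example once per output file.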

Any help would be greatly appreciated.

How can this be done in Python?
