I'm new to Python and need some help. I have a file and want to extract text from it into other files.
The input file looks like this:
<Datei Kennung="4bc78" Titel="Morgen 1" Bereich="I847YP"> Morgen 1
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
</Datei>
<Datei Kennung="469" Titel="Trop Hall W " Bereich="izr"> Trop Hall W
Here is text, contains numbers and text.
Here is text, contains numbers and text.
</Datei>
For the first block in my file, I need an output file Morgen 1.txt that contains:
Morgen 1
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
I got this code from another user:
import re
REG_PARSE=re.compile(r'<Datei[^>]*Titel="\s*([^"]*?)\s*"[^>]*>\s*\1\s*(.*?</Datei>',re.dotall)
with open(filename) as infile:
    for outfilename, text = REG_PARSE.finditer(infile.read()):
        with open('%s.txt'%outfilename,'w') as outf:
            outf.write(text)
But it doesn't work.
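For reference, the snippet above has several problems: the second capture group is missing its closing parenthesis, the flag is re.DOTALL (upper case), a for loop needs "in" rather than "=", finditer yields match objects rather than (title, text) tuples, filename is never defined, and the \1 backreference would skip the title line that the expected Morgen 1.txt starts with. A minimal corrected sketch follows. It assumes titles never contain double quotes, and the helper names split_blocks and write_blocks are hypothetical.

```python
import re

# Group 1: the Titel value without surrounding whitespace.
# Group 2: everything after the opening tag up to (not including) </Datei>,
# with leading/trailing whitespace trimmed. Assumes titles contain no '"'.
REG_PARSE = re.compile(
    r'<Datei[^>]*Titel="\s*([^"]*?)\s*"[^>]*>\s*(.*?)\s*</Datei>',
    re.DOTALL,
)


def split_blocks(content):
    """Return a list of (title, text) pairs, one per <Datei> element."""
    return [m.groups() for m in REG_PARSE.finditer(content)]


def write_blocks(filename):
    """Write each block's text to '<title>.txt' in the current directory."""
    with open(filename) as infile:
        content = infile.read()
    for outfilename, text in split_blocks(content):
        with open('%s.txt' % outfilename, 'w') as outf:
            outf.write(text)
```

With the sample input saved as data.txt, write_blocks('data.txt') would create Morgen 1.txt and Trop Hall W.txt.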
Answer 0 (score: 0)
See if this works for you:
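#!/usr/bin/env python
#-*- coding:utf-8 -*-
from xml.dom import minidom

# The input has several top-level <Datei> elements, which is not a
# well-formed XML document, so wrap it in a dummy root element first
# (assumes the file has no XML declaration of its own).
with open('/path/to/file') as f:
    xmldoc = minidom.parseString('<root>' + f.read() + '</root>')

items = xmldoc.getElementsByTagName('Datei')
for s in items:
    if s.attributes['Titel'].value == "Morgen 1":
        with open("Morgen 1.txt", "w") as fileOutput:
            # Strip every line of the element's text and drop blank lines
            listLines = [line.strip()
                         for line in s.firstChild.nodeValue.strip().split("\n")
                         if line.strip()]
            fileOutput.write("\n".join(listLines))
        break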
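A variant of the same approach, sketched with the standard library's xml.etree.ElementTree instead of minidom, writes one file per <Datei> element rather than only Morgen 1. It assumes the file has no XML declaration (so wrapping it in a dummy root element is safe) and that every title is usable as a file name; extract_blocks and write_all_blocks are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def extract_blocks(content):
    """Map each <Datei> Titel (whitespace-stripped) to its cleaned text."""
    # Several top-level <Datei> elements are not well-formed XML on their
    # own, so wrap them in a dummy root element before parsing.
    root = ET.fromstring('<root>' + content + '</root>')
    blocks = {}
    for datei in root.iter('Datei'):
        title = datei.get('Titel', '').strip()
        # datei.text is the text between <Datei ...> and </Datei>
        # (the element has no child elements); strip each line, drop blanks.
        lines = [line.strip() for line in (datei.text or '').splitlines()]
        blocks[title] = '\n'.join(line for line in lines if line)
    return blocks


def write_all_blocks(path):
    """Write one '<Titel>.txt' file per <Datei> element in the file at path."""
    with open(path, encoding='utf-8') as f:
        blocks = extract_blocks(f.read())
    for title, text in blocks.items():
        with open(title + '.txt', 'w', encoding='utf-8') as out:
            out.write(text)
```

Stripping the Titel value means the second block is written to Trop Hall W.txt rather than a file name with a trailing space.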
Answer 1 (score: -1)
If you want a quick-and-dirty way to do this without an XML parser (using one is the recommended approach), this will do the job:
with open('path/to/input') as infile, open("Morgen 1.txt", 'w') as outfile:
    found = False
    for line in infile:
        if line.startswith("<Datei") and 'Titel="Morgen 1"' in line:
            found = True
            # Keep the text that follows the opening tag ("Morgen 1")
            outfile.write(line.partition(">")[2].lstrip())
        elif line.startswith("</Datei"):
            found = False
        elif found and not line.startswith("<Datei"):
            outfile.write(line)
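The same line-by-line idea can be extended to every block instead of just Morgen 1, while still reading the input one line at a time. This is a sketch under the assumption that each <Datei ...> opening tag and each </Datei> closing tag starts its own line, as in the sample; split_lines and write_blocks are hypothetical names.

```python
import re

# Titel="..." attribute, captured without surrounding whitespace
TITLE_RE = re.compile(r'Titel="\s*([^"]*?)\s*"')


def split_lines(lines):
    """Yield (title, text) pairs from an iterable of lines (e.g. a file)."""
    title, body = None, []
    for line in lines:
        if line.startswith('<Datei'):
            match = TITLE_RE.search(line)
            title = match.group(1) if match else None
            # Keep whatever follows the opening tag on the same line
            rest = line.partition('>')[2].strip()
            body = [rest] if rest else []
        elif line.startswith('</Datei'):
            if title is not None:
                yield title, '\n'.join(body)
            title, body = None, []
        elif title is not None:
            body.append(line.rstrip('\n'))


def write_blocks(path):
    """Write one '<title>.txt' file per block of the input file."""
    with open(path) as infile:
        for title, text in split_lines(infile):
            with open(title + '.txt', 'w') as outfile:
                outfile.write(text)
```

Because split_lines is a generator that accepts any iterable of lines, it can be fed an open file directly or, for testing, a string split with splitlines(True).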
Answer 2 (score: -2)
Try this, it works:
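with open("data.txt", "r") as fp:
    data = fp.read()

# Splitting on ">" puts opening <Datei ...> tags at even indices and,
# at odd indices, the body text followed by "</Datei"
data = data.split(">")
for i in range(0, len(data) - 1, 2):
    # Assumes Titel="..." is the second '" '-separated piece of the tag
    filename = data[i].split('" ')[1].split('"')[1].strip()
    text = data[i + 1].split('<')[0].strip()
    with open(filename + ".txt", "w") as fp1:
        fp1.write(text)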