I'm new to Python, and I'm trying to extract the information between two headers in a text file. The code below currently doesn't work.
    with open('toysystem.txt','r') as f:
        start = '<Keywords>'
        end = '</Keywords>'
        i = 0
        lines = f.readlines()
        for line in lines:
            if line == start:
                keywords = lines[i+1]
            i += 1
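For reference, the text file looks like this:

    <Keywords>
    GTO
    </Keywords>

Any ideas on what might be wrong with the code? Or perhaps a different way to approach the problem?
Thanks!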
Answer 0 (score: 1)
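We can write something like

    with open('toysystem.txt', 'r') as f:
        start = '<Keywords>'
        end = '</Keywords>'
        keywords = []
        # Skip everything up to and including the start marker
        for line in f:
            if line.rstrip() == start:
                break
        # Collect lines until the end marker is reached
        for line in f:
            if line.rstrip() == end:
                break
            keywords.append(line)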
which gives us

    >>> keywords
    ['GTO\n']
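The `rstrip()` call matters because lines read from a file keep their trailing newline, so a bare `line == start` comparison, as in the question, never succeeds. A minimal sketch of that, using an in-memory `io.StringIO` as a stand-in for `toysystem.txt` (an assumption, so the snippet runs without the file):

```python
import io

# In-memory stand-in for toysystem.txt (contents taken from the question)
sample = io.StringIO('<Keywords>\nGTO\n</Keywords>\n')
lines = sample.readlines()

start = '<Keywords>'
raw_match = lines[0] == start                # compares '<Keywords>\n' with '<Keywords>'
stripped_match = lines[0].rstrip() == start  # newline removed before comparing

print(repr(lines[0]), raw_match, stripped_match)
```

Here `raw_match` is `False` and `stripped_match` is `True`, which is why the `if` branch in the question is never entered.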
If you don't need the trailing newlines on the keywords, you can strip them as well:

    with open('toysystem.txt', 'r') as f:
        start = '<Keywords>'
        end = '</Keywords>'
        keywords = []
        for line in f:
            if line.rstrip() == start:
                break
        for line in f:
            if line.rstrip() == end:
                break
            # Store the line without its trailing newline
            keywords.append(line.rstrip())
which gives

    >>> keywords
    ['GTO']
But in this case it is better to create a generator of stripped lines, like

    with open('toysystem.txt', 'r') as f:
        start = '<Keywords>'
        end = '</Keywords>'
        keywords = []
        # Strip each line once, lazily, as it is read
        stripped_lines = (line.rstrip() for line in f)
        for line in stripped_lines:
            if line == start:
                break
        for line in stripped_lines:
            if line == end:
                break
            keywords.append(line)

which does the same thing.
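The two loops cooperate because a generator, like the file object itself, is consumed as it is iterated: the second `for` resumes where the first one stopped at `break`. A small sketch of that behaviour, with `io.StringIO` standing in for the file and extra sample lines (`Ignored`, `LEO`, `Also ignored`) made up for illustration:

```python
import io

# Hypothetical sample data; only the <Keywords> block mirrors the question's file
sample = io.StringIO('Ignored\n<Keywords>\nGTO\nLEO\n</Keywords>\nAlso ignored\n')
stripped_lines = (line.rstrip() for line in sample)

# First loop: consume lines up to and including the start marker
skipped = []
for line in stripped_lines:
    if line == '<Keywords>':
        break
    skipped.append(line)

# Second loop: continues from the line right after the start marker
keywords = []
for line in stripped_lines:
    if line == '</Keywords>':
        break
    keywords.append(line)

print(skipped)
print(keywords)
```

Only `Ignored` ends up in `skipped`, and `keywords` holds the lines between the two markers. Iterating over a list instead of a generator would restart from the first line in the second loop.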
Finally, if you need the lines themselves in a later part of your script, we can combine the file object's `readlines` method with the stripped-lines generator:

    with open('toysystem.txt', 'r') as f:
        start = '<Keywords>'
        end = '</Keywords>'
        keywords = []
        # Keep the raw lines around for later use
        lines = f.readlines()
        stripped_lines = (line.rstrip() for line in lines)
        for line in stripped_lines:
            if line == start:
                break
        for line in stripped_lines:
            if line == end:
                break
            keywords.append(line)
which gives us

    >>> lines
    ['<Keywords>\n', 'GTO\n', '</Keywords>\n']
    >>> keywords
    ['GTO']
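The skip-then-collect pattern can also be sketched with `itertools.dropwhile` and `itertools.takewhile`. This is an alternative formulation rather than the approach used above, and the helper name `extract_between` is hypothetical:

```python
import io
from itertools import dropwhile, takewhile


def extract_between(lines, start, end):
    """Return the stripped lines found between the start and end markers."""
    stripped = (line.rstrip() for line in lines)
    # Drop everything before the start marker; the marker itself comes first
    after_start = dropwhile(lambda line: line != start, stripped)
    next(after_start, None)  # discard the start marker (no-op if it was never found)
    # Take lines until the end marker (or until the input runs out)
    return list(takewhile(lambda line: line != end, after_start))


# In-memory stand-in for toysystem.txt (contents taken from the question)
sample = io.StringIO('<Keywords>\nGTO\n</Keywords>\n')
keywords = extract_between(sample, '<Keywords>', '</Keywords>')
print(keywords)
```

With a real file, the call would be `extract_between(f, start, end)` inside the `with` block. If the start marker is missing, the sketch returns an empty list.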
Answer 1 (score: 0)
You could use Python's `re` module and parse the file with a regular expression.
    import re

    with open('toysystem.txt', 'r') as f:
        contents = f.read()

    # Find every match in the file and return a list of the values captured
    # inside the (). You can extend the expression according to your needs.
    keywords = re.findall(r'<Keywords>\s*(.*?)\s*</Keywords>', contents)
    print(keywords)
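For your file, this will print

    ['GTO']

For more information on regular expressions in Python, see Tutorialspoint and the `re` module documentation for Python 3 and Python 2.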
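If a block may span several lines, or the file may contain more than one block, one possible variation is to pass `re.DOTALL` so that `.` also matches newlines. A sketch under that assumption, with made-up sample contents in place of the file:

```python
import re

# Hypothetical contents with two blocks; the real file has a single GTO entry
contents = '<Keywords>\nGTO\nLEO\n</Keywords>\n<Keywords>\nMEO\n</Keywords>\n'

# re.DOTALL lets (.*?) run across line breaks; the lazy quantifier stops
# at the first closing tag, so each block is captured separately
blocks = re.findall(r'<Keywords>\s*(.*?)\s*</Keywords>', contents, re.DOTALL)

# Split each captured block into its individual lines
keywords = [block.splitlines() for block in blocks]
print(keywords)
```

This yields one list of keywords per block. Note that the tag name in the pattern is case-sensitive unless `re.IGNORECASE` is also passed.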