I have the following file (example):
<b n="First">
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<b n="Second">
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<b n="Third">
<v n="1">Contents</v>
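What I want to do is print out the lines of the file depending on the <b n> value. For example, if the value is <b n="First">, then each line should be printed like this: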
First: Contents
However, I'm not sure how to change that value each time I pass another <b n="value"> line.
So far, the only thing I've tried that points in the right direction is using a regular expression to search for the value I want: pattern = '<b n="(.*)">'
I have also tried the following code:
for LINE in FILE:
    VALUE = re.findall(pattern, LINE)
    print("{}: {}".format(VALUE, LINE))
This prints:
['First']: <b n="First">
[]: <v n="1">Contents</v>
[]: <v n="1">Contents</v>
[]: <v n="1">Contents</v>
[]: <v n="1">Contents</v>
['Second']: <b n="Second">
[]: <v n="1">Contents</v>
[]: <v n="1">Contents</v>
[]: <v n="1">Contents</v>
[]: <v n="1">Contents</v>
[]: <v n="1">Contents</v>
['Third']: <b n="Third">
[]: <v n="1">Contents</v>
But the output I want is more like this:
First: Contents
First: Contents
First: Contents
First: Contents
Second: Contents
Second: Contents
Second: Contents
Second: Contents
Third: Contents
Can anyone point me in the right direction to get this output?
Answer 0 (score: 2)
You're actually very close.
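Here's an approach that stays close to yours: keep the current title in a variable that starts out as "", update it whenever a line starts with <b, and otherwise print the title together with the line's contents.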
The code looks like this:
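import re

title = ""  # name from the most recent <b n="..."> line
for line in file:
    # A <b n="..."> line starts a new section: remember its name
    match = re.match(r'<b n="([^"]*)">', line)
    if match is not None:
        title = match.group(1)
    else:
        # Otherwise grab the text between ">" and "</v>"
        match = re.search(r'>(\w*)</v>', line)
        if match is not None:
            content = match.group(1)
            print("{}: {}".format(title, content))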
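Below is a self-contained, runnable sketch of the same "remember the last title" idea. It assumes the data is available as a list of lines (iterating over an open file works the same way), and the helper name label_lines is hypothetical. The content pattern is loosened to [^<]* (an assumption on my part) so that contents with spaces or punctuation also match, which \w* would not.

```python
import re

# Matches a section header such as <b n="First"> and captures the name
HEADER = re.compile(r'<b n="([^"]*)">')
# Matches a content line such as <v n="1">Contents</v>; [^<]* is an
# assumption that allows spaces and punctuation inside the element
CONTENT = re.compile(r'<v n="[^"]*">([^<]*)</v>')


def label_lines(lines):
    """Hypothetical helper: yield 'title: content' for every <v> line,
    using the name from the most recent <b n="..."> header."""
    title = ""  # no header seen yet
    for line in lines:
        header = HEADER.match(line.strip())
        if header:
            title = header.group(1)  # switch to the new section name
            continue
        content = CONTENT.search(line)
        if content:
            yield "{}: {}".format(title, content.group(1))


sample = '''<b n="First">
<v n="1">Contents</v>
<v n="1">Contents</v>
<b n="Second">
<v n="1">Contents</v>
<b n="Third">
<v n="1">Contents</v>'''

for out in label_lines(sample.splitlines()):
    print(out)
```

With a real file, `with open(path) as f:` followed by `for out in label_lines(f): print(out)` should behave the same, since the trailing newline on each line is handled by .strip() for headers and ignored by search() for content lines.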
Answer 1 (score: 0)
Another way I found, using only a regular expression:
<b n=\"(.*)\"|<.* n=\"(.*)\">(.*)?<.*$
But you need to have all of your text in a single string:
import re
regex = r"<b n=\"(.*)\"|<.* n=\"(.*)\">(.*)?<.*"
test_str = ("""<b n="First">
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<b n="Second">
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<v n="1">Contents</v>
<b n="Third">
<v n="1">Contents</v>""")
matches = re.findall(regex, test_str)
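for a, b, c in matches:
    if a:
        # A <b n="..."> header: remember its name
        name = a
    if c:
        # A content line: print it under the current name
        print("{}: {}".format(name, c))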
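To see why the loop can tell headers from content lines with `if a:`, it helps to look at what re.findall returns when the pattern contains several capture groups: one tuple per match, with an empty string for every group in the alternative that did not take part. A small check using lines from the question:

```python
import re

regex = r"<b n=\"(.*)\"|<.* n=\"(.*)\">(.*)?<.*"

# Header line: only the first alternative matches, so groups 2 and 3 are ''
print(re.findall(regex, '<b n="First">'))            # [('First', '', '')]
# Content line: only the second alternative matches, so group 1 is ''
print(re.findall(regex, '<v n="1">Contents</v>'))    # [('', '1', 'Contents')]
```

One caveat about the loop: name is only assigned inside `if a:`, so a content line that appears before the first header would raise NameError. Setting name = "" before the loop avoids that.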