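I'm searching a text file (sample file shown below) and trying to pull several pieces of data out of each group. For example, from each group I want the "random text-x" data, the "findMe" text, and the numbers in that group (e.g. "100-012" and "499-217" from group A), although how many numbers each group contains is not known in advance.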
EXAMPLEFILE.TXT:
```
[**] random text group A [**]
random number of lines of text
findMe
stufff...100-012 qwerty...499-217
[**] random text group B [**]
random lines of text
findMe
[**] random text group C [**]
random number of lines of text
findMe
stufff...223-300 qwerty...888-888 zzzz...333-444
[**] continues......
```
My code is shown at the end, but it only outputs:

```
random text-A
findMe
['100-012', '499-217', '223-300', '888-888', '333-444']
```
I'm really struggling to work out where I'm going wrong. Any help is much appreciated, thanks.
```python
import re

def patternMatching(group, line):
    section = re.findall(group, line)
    for i in section:
        randText = re.search(r'\]\s(.*?)\[', i)
        result1 = randText.group(1)
        print(result1)
        findMe = re.search('findMe', line)
        result2 = findMe.group()
        print(result2)
        numbers = re.findall(r'(\d{3}\-\d{3})', line)
        print(numbers)

randomTextgroup = re.compile(r'\*{2}\].*\[\*{2}\].*\[\*{2}\]', re.DOTALL|re.S)

with open("C:/Location/test.txt", 'r') as txt:
    data = txt.read().replace('\n', '\r')
    a = randomTextgroup.findall(data)
    for i in a:
        patternMatching(randomTextgroup, i)
```
What I'm aiming for is:

```
random text group A
findMe
100-012 499-217
random text group B
findMe
random text group C
findMe
223-300 888-888 333-444
```
For the numbers, I don't mind whether they come out as ['223-300', '888-888', '333-444'] or as a tuple, as long as each group's numbers stay together so that I can use them.
Answer (score: 0)
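I finally got it! :) Thanks to @AdamSmith and @sln for their help and suggestions. The main problem was that the regex (`randomTextgroup`) was greedy: `.*` ran on to the last `[**]` it could reach, so the whole file came back as a single match. On top of that, the extra for loop in `patternMatching()` re-ran the pattern on each match, which returned no data... one for loop too many. Anyway, thank you :)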
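To see the greediness concretely, here is a minimal comparison of the question's pattern and the answer's pattern on a small in-memory sample (`sample` is a made-up stand-in for the file contents, with newlines already swapped for `\r` as in both scripts). This assumes Python 3.

```python
import re

# Made-up stand-in for the file contents after .replace('\n', '\r').
sample = ("[**] random text group A [**]\rfindMe\r100-012\r"
          "[**] random text group B [**]\rfindMe\r"
          "[**] random text group C [**]\rfindMe\r223-300\r"
          "[**] continues......")

# Question's pattern: the greedy .* runs on to the LAST "[**]" it can reach.
greedy = re.compile(r'\*{2}\].*\[\*{2}\].*\[\*{2}\]', re.DOTALL)
# Answer's pattern: lazy .*? plus a lookahead that stops before the next header.
lazy = re.compile(r'\*{2}\].*?\[\*{2}\].*?(?=\[\*{2}\])')

greedy_matches = greedy.findall(sample)
lazy_matches = lazy.findall(sample)

print(len(greedy_matches))  # → 1 (one match spanning every group)
print(len(lazy_matches))    # → 3 (one match per complete group)
```

The trailing `[**] continues......` header has no closing `[**]` and no header after it, so the lazy pattern skips it; that matches the three groups in the answer's output.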
```python
import re

# Lazy .*? plus a lookahead: each match stops just before the next "[**]" header.
randomTextgroup = re.compile(r'\*{2}\].*?\[\*{2}\].*?(?=\[\*{2}\])')

with open("C:/Location/test.txt", 'r') as txt:
    # '.' does not match '\n', so newlines are swapped for '\r'.
    data = txt.read().replace('\n', '\r')
    section = randomTextgroup.findall(data)
    for i in section:
        randText = re.search(r'\]\s(.*?)\[', i)
        test = randText.group(1)
        print(test)
        findMe = re.search('findMe', i)
        result2 = findMe.group()
        print(result2)
        numbers = re.findall(r'(\d{3}\-\d{3})', i)
        print(numbers, '\n')
```
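The `replace('\n', '\r')` call used in both the question and the answer is what lets `.` cross line boundaries: by default `.` matches any character except `\n`, so swapping newlines for `\r` has the same effect as compiling with `re.DOTALL`. Below is a sketch of the flag-based alternative, which leaves the text untouched; `text` is a made-up sample, and Python 3 is assumed.

```python
import re

# Made-up sample with real newlines.
text = ("[**] random text group A [**]\nfindMe\nstufff...100-012 qwerty...499-217\n"
        "[**] random text group B [**]\nfindMe\n"
        "[**] continues......")

# re.DOTALL makes '.' match '\n' too, so no newline substitution is needed.
group_re = re.compile(r'\*{2}\].*?\[\*{2}\].*?(?=\[\*{2}\])', re.DOTALL)
# Same pattern without the flag, as used in the answer.
plain_re = re.compile(r'\*{2}\].*?\[\*{2}\].*?(?=\[\*{2}\])')

with_flag = group_re.findall(text)
with_replace = plain_re.findall(text.replace('\n', '\r'))

print(len(with_flag), len(with_replace))  # → 2 2
print(with_replace == [s.replace('\n', '\r') for s in with_flag])  # → True
print(plain_re.findall(text))  # → [] ('.' cannot cross '\n' without the flag)
```

Either approach splits the text at the same places; the flag version simply avoids rewriting the file contents.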
This produces the following output:

```
random text-A
findMe
['100-012', '499-217']
random text-B
findMe
[]
random text-C
findMe
['223-300', '888-888', '333-444']
```
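Since the question says any grouped form of the numbers is fine, the printing loop could also be turned into a function that returns one tuple per group. This is a hypothetical variant, not the answer's code: the name `parse_groups`, the tuple layout, and the `GROUP_RE`/`NUMBER_RE` patterns are illustrative assumptions. It captures the title without the surrounding spaces, lets the last group run to the end of the text (`\Z`) instead of requiring a following header, and checks for `findMe` before calling `.group()`, because `re.search` returns `None` when a group lacks it.

```python
import re

# Header "[**] title [**]" (title stays on one line), then a lazy body up to
# the next "[**]" or the end of the text. DOTALL lets the body span lines.
GROUP_RE = re.compile(r'\[\*{2}\][ \t]*([^\n]*?)[ \t]*\[\*{2}\](.*?)(?=\[\*{2}\]|\Z)',
                      re.DOTALL)
NUMBER_RE = re.compile(r'\d{3}-\d{3}')


def parse_groups(text):
    """Return a list of (title, found, numbers) tuples, one per group (hypothetical helper)."""
    results = []
    for m in GROUP_RE.finditer(text):
        title, body = m.group(1), m.group(2)
        found = re.search(r'findMe', body)  # None if this group has no findMe
        results.append((title, found.group() if found else None, NUMBER_RE.findall(body)))
    return results


sample = (
    "[**] random text group A [**]\n"
    "random number of lines of text\n"
    "findMe\n"
    "stufff...100-012 qwerty...499-217\n"
    "[**] random text group B [**]\n"
    "random lines of text\n"
    "findMe\n"
    "[**] random text group C [**]\n"
    "random number of lines of text\n"
    "findMe\n"
    "stufff...223-300 qwerty...888-888 zzzz...333-444\n"
)

for title, found, numbers in parse_groups(sample):
    print(title)
    print(found)
    print(numbers)
```

Returning tuples rather than printing keeps each group's numbers together as a list, which is the shape the question asked for.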
N.B. For other readers: the input file is the same as in the original post.