I'm not sure how best to phrase this, so I'll dive straight into an example.
a bunch of lines we don't care about [...]
This is a nice line I can look for
This is the string I wish to extract
a bunch more lines we don't care about [...]
This line contains an integer 12345 related to the string above
more garbage [...]
But sometimes (and this is outside my control) the order is swapped:
a bunch of lines we don't care about [...]
Here is another string I wish to extract
This is a nice line I can look for
a bunch more lines we don't care about [...]
This line contains an integer 67890 related to the string above
more garbage [...]
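These two lines (the "nice line" and the "string I wish to extract") are always adjacent, but their order is unpredictable. The line containing the integer appears a varying number of lines below them. The "nice line" occurs multiple times and is always identical, and the extracted strings and integers (across the whole file) may be the same as or different from one another.
The end goal is to populate two lists, one containing the strings and the other the integers, both in the order in which they were found, so that they can later be used as key/value pairs.
What I don't know how to do, or even whether it is possible, is write a regular expression that finds the string either immediately before OR immediately after the target line.
I'm doing this in Python, by the way.
Thoughts?
Edit/addition: So the result I would expect from the sample text above is: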
list1 = ["This is the string I wish to extract", "Here is another string I wish to extract"]
list2 = [12345, 67890]
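To illustrate the intended key/value use, here is a minimal sketch of pairing the two lists by position. The names `pairs` and `lookup` are made up for this example. Since the extracted strings may repeat, a plain `dict` would keep only the last integer for a repeated string, so the list of tuples is the safer form when duplicates matter.

```python
list1 = ["This is the string I wish to extract",
         "Here is another string I wish to extract"]
list2 = [12345, 67890]

# Pair items by position; this keeps every pair, including repeated strings.
pairs = list(zip(list1, list2))

# A dict is convenient for lookups, but a repeated key keeps only its last value.
lookup = dict(pairs)

print(lookup["Here is another string I wish to extract"])  # 67890
```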
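Answer 0 (score: 1)
A good strategy might be to look for the "nice line" and then inspect the lines directly above and below it.
See the following (untested) Python pseudocode: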
L1, L2 = [], []
with open("file.txt") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    if 'nice line' in line:
        # Clamp both indices so they stay inside the file
        before_line = lines[max(i - 1, 0)]
        after_line = lines[min(i + 1, len(lines) - 1)]
        # You can generalize the above to a few lines above and below
        # Use regex to parse information from `before_line` and `after_line`
        # and add it to the lists: L1, L2
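
A fuller sketch of this neighbour-scanning idea follows. It rests on two assumptions that the question does not confirm: that the wanted string can be told apart from an unrelated neighbouring line by some pattern (here the hypothetical `TARGET_RE`), and that the related integer is the first standalone run of digits on a later line (the hypothetical `INT_RE`). The function name `extract_pairs` is likewise made up for illustration.

```python
import re

NICE_LINE = "This is a nice line I can look for"
# Assumption: the wanted line is recognisable by this pattern.
TARGET_RE = re.compile(r"string I wish to extract")
# Assumption: the related integer is the first standalone run of digits.
INT_RE = re.compile(r"\b(\d+)\b")


def extract_pairs(lines):
    """Return (strings, integers) in the order the marker lines are found."""
    strings, integers = [], []
    for i, line in enumerate(lines):
        if line.strip() != NICE_LINE:
            continue
        # Check the neighbour above the marker, then the one below it.
        target = None
        for j in (i - 1, i + 1):
            if 0 <= j < len(lines) and TARGET_RE.search(lines[j]):
                target = lines[j].strip()
                break
        if target is None:
            continue
        # Scan forward for the first line that carries an integer.
        for later in lines[i + 1:]:
            match = INT_RE.search(later)
            if match:
                strings.append(target)
                integers.append(int(match.group(1)))
                break
    return strings, integers


sample = """\
a bunch of lines we don't care about
This is a nice line I can look for
This is the string I wish to extract
a bunch more lines we don't care about
This line contains an integer 12345 related to the string above
more garbage
a bunch of lines we don't care about
Here is another string I wish to extract
This is a nice line I can look for
a bunch more lines we don't care about
This line contains an integer 67890 related to the string above
more garbage
""".splitlines()

strings, integers = extract_pairs(sample)
print(strings)
print(integers)  # [12345, 67890]
```

Appending the string and the integer together, only once both have been found, keeps the two lists the same length, so position `k` in one always corresponds to position `k` in the other.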