Question

I'm not sure how best to phrase this, so I'll dive straight into an example.

a bunch of lines we don't care about [...]
This is a nice line I can look for
This is the string I wish to extract
a bunch more lines we don't care about [...]
This line contains an integer 12345 related to the string above
more garbage [...]

But sometimes (and this is beyond my control) the order is swapped:

a bunch of lines we don't care about [...]
Here is another string I wish to extract
This is a nice line I can look for
a bunch more lines we don't care about [...]
This line contains an integer 67890 related to the string above
more garbage [...]

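These two lines ("the nice line" and "the string I wish to extract") are always adjacent, but their order is unpredictable. The line containing the integer is an inconsistent number of lines below them. The "nice line" appears many times and is always identical, and the strings and integers to be extracted can (globally) be the same as or different from one another.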
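Because the integer line sits an unpredictable number of lines further down, one way to find it is to scan forward from the extracted string until the first line that contains a run of digits. This is a minimal sketch, assuming the first integer below the string is always the related one; `first_int_after` is a hypothetical helper name, not part of the question:

```python
import re

INT_RE = re.compile(r"\b\d+\b")  # a standalone run of digits

def first_int_after(lines, start):
    """Return the first integer found on lines[start:], or None if there is none."""
    for line in lines[start:]:
        m = INT_RE.search(line)
        if m:
            return int(m.group())
    return None

sample = [
    "This is the string I wish to extract",
    "a bunch more lines we don't care about [...]",
    "This line contains an integer 12345 related to the string above",
]
print(first_int_after(sample, 1))  # 12345
```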

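The end goal is to populate two lists, one containing the strings and the other the integers, both kept in the order in which the items were found, so that they can later be used as key/value pairs.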
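Since the same string can show up more than once, turning the two lists straight into a `dict` would keep only the last value for a repeated key. A small sketch of the difference, using made-up values purely for illustration:

```python
list1 = ["alpha", "beta", "alpha"]  # hypothetical extracted strings
list2 = [1, 2, 3]                   # hypothetical extracted integers

pairs = list(zip(list1, list2))     # keeps every pair, in order
as_dict = dict(zip(list1, list2))   # a repeated key overwrites the earlier value

print(pairs)    # [('alpha', 1), ('beta', 2), ('alpha', 3)]
print(as_dict)  # {'alpha': 3, 'beta': 2}
```

If duplicates matter, a list of pairs (or a dict mapping each key to a list of values) preserves everything.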

I don't know how to do this, or whether it's even possible: can a regex find that string either after OR before the target line?

I'm doing this in Python, by the way.

Thoughts?

Edit/addition: so the result I'd expect from the example text above would be:

list1 = ["This is the string I wish to extract", "Here is another string I wish to extract"]
list2 = [12345, 67890]
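On whether one regex can find the string on either side of the nice line: it can, provided the target line is recognizable on its own. The sketch below uses alternation with two named groups (string-then-nice-line, or nice-line-then-string) and `re.MULTILINE` so that `^` and `$` anchor at line boundaries. It assumes, purely for illustration, that target lines end with "wish to extract"; the `TARGET` pattern is hypothetical and should be replaced with whatever actually identifies those lines:

```python
import re

NICE = re.escape("This is a nice line I can look for")
TARGET = r".*wish to extract"  # hypothetical: how a target line is recognized

# Alternation: target line then nice line, OR nice line then target line.
PATTERN = re.compile(
    rf"^(?:(?P<before>{TARGET})\n{NICE}|{NICE}\n(?P<after>{TARGET}))$",
    re.MULTILINE,
)

def extract_strings(text):
    """Return target strings in document order, whichever side of the nice line they are on."""
    return [m.group("before") or m.group("after") for m in PATTERN.finditer(text)]

text = """a bunch of lines we don't care about [...]
This is a nice line I can look for
This is the string I wish to extract
a bunch more lines we don't care about [...]
This line contains an integer 12345 related to the string above
more garbage [...]
a bunch of lines we don't care about [...]
Here is another string I wish to extract
This is a nice line I can look for
a bunch more lines we don't care about [...]
This line contains an integer 67890 related to the string above
more garbage [...]"""

print(extract_strings(text))
# ['This is the string I wish to extract', 'Here is another string I wish to extract']
```

Only one of the two named groups participates in any given match, so `group("before") or group("after")` picks whichever side matched.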

Answer 1

A good strategy might be to look for the "nice line" and then search the lines above and below it.

See the following (untested) Python pseudocode:

import re  # for the parsing step sketched in the comments below

L1, L2 = [], []
with open("file.txt") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    if 'nice line' in line:
        # Clamp the indices so the first and last lines don't go out of range
        before_line = lines[max(i - 1, 0)]
        after_line = lines[min(i + 1, len(lines) - 1)]
        # You can generalize the above to a few lines above and below

        # Use regex to parse information from `before_line` and `after_line`
        # and add it to the lists: L1, L2
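Filling in the placeholder comments, a runnable version of that idea might look like the following. It is a sketch under assumptions not stated in the answer: a hypothetical `is_target` predicate decides which neighbour of the nice line is the wanted string (here, lines ending in "wish to extract", matching the example), the nice line is matched by exact equality since the question says it is always identical, and the integer is the first run of digits on any later line. It reads from an in-memory string rather than "file.txt" so it runs as-is:

```python
import re

NICE = "This is a nice line I can look for"
INT_RE = re.compile(r"\b\d+\b")

def is_target(line):
    # Hypothetical rule for spotting the wanted string; adjust to the real data.
    return line.endswith("wish to extract")

def first_int(lines):
    # First standalone integer on any of the given lines, or None.
    for line in lines:
        m = INT_RE.search(line)
        if m:
            return int(m.group())
    return None

def extract(lines):
    strings, numbers = [], []
    for i, line in enumerate(lines):
        if line != NICE:
            continue
        # Check the line above first, then the line below, staying in bounds.
        for j in (i - 1, i + 1):
            if 0 <= j < len(lines) and is_target(lines[j]):
                strings.append(lines[j])
                # The integer line is an unpredictable distance below the pair.
                numbers.append(first_int(lines[max(i, j) + 1:]))
                break
    return strings, numbers

sample = """a bunch of lines we don't care about [...]
This is a nice line I can look for
This is the string I wish to extract
a bunch more lines we don't care about [...]
This line contains an integer 12345 related to the string above
more garbage [...]
a bunch of lines we don't care about [...]
Here is another string I wish to extract
This is a nice line I can look for
a bunch more lines we don't care about [...]
This line contains an integer 67890 related to the string above
more garbage [...]""".splitlines()

list1, list2 = extract(sample)
print(list1)  # ['This is the string I wish to extract', 'Here is another string I wish to extract']
print(list2)  # [12345, 67890]
```

With a real file, `open("file.txt").read().splitlines()` yields lines without trailing newlines, which the equality and `endswith` checks rely on (`readlines()` keeps the `"\n"`). Appending `None` when no integer is found keeps the two lists aligned.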

Regex to find a string within a string regardless of order?
