Extracting specific text between strings

Time: 2020-04-25 15:36:07

Tags: python list for-loop

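I am trying to extract, from a .txt file, the specific lines that correspond to 7 specific devices (0-6), and then operate on that data.

Here is an example:

I extracted one event (here, 169139) from a very large file. It contains information from 6 of the 7 devices (here only 1, 2, 3, 4, 5, 6, because device 0 has no data). For any such event I do not know a priori how many devices will appear in the output. It can be all of them, none of them, or only some.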

=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 1 ---
Pix 9, 66
--- 2 ---
Pix 11, 31
Pix 12, 31
--- 3 ---
Pix 17, 53
Pix 16, 53
Pix 16, 54
--- 4 ---
Pix 44, 64
--- 5 ---
Pix 49, 133
Pix 48, 133
--- 6 ---
Pix 109, 143
Pix 108, 143
Pix 108, 144 
Pix 109, 144 

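These events are easy to iterate over, and I can select all the information up to the next event (here, the next line in the .txt would be === 169140 ===).

I am able to extract the information for a specific device with the following code: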

def start_stop_plane (list, dev):
    start_reading = [i for i in range(len(list)) if list[i] == "--- " + str(dev) + " ---"][0]
    stop_reading = [i for i in range(len(list)) if list[i] == "--- " + str(int(dev)+1) + " ---"][0]
    return list[start_reading:stop_reading]

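Here, list is the first code block above (the full event) as a list of lines. It is produced by code similar to the function above, with the --- markers swapped for the === markers (i.e. the markers between events).

My problem: this works for every device from 0 to 5. For 6 it crashes, because there is no marker for int(dev)+1. I tried putting an or inside stop_reading to recognise an occurrence of === instead, but that does not work.

In this case, how can I signal the end of the list and make sure I do not miss any device?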
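For context, the crash comes from indexing an empty list with [0] when no marker for device 7 exists. The following is a minimal sketch of one possible fallback, assuming the event is a list of stripped lines; the names event, stop_index and device_block are hypothetical and the sample is shortened for illustration.

```python
# Hypothetical, shortened event used only for illustration.
event = [
    "=== 169139 ===",
    "--- 5 ---",
    "Pix 49, 133",
    "--- 6 ---",
    "Pix 109, 143",
]

def stop_index(lines, dev):
    """Index of the marker for device dev + 1, or len(lines) if it is absent."""
    marker = f"--- {int(dev) + 1} ---"
    # next() with a default avoids the IndexError that [...][0] raises
    # when the list comprehension finds no match.
    return next((i for i, line in enumerate(lines) if line == marker), len(lines))

def device_block(lines, dev):
    """Lines from the marker of dev up to the stop index; [] if dev is absent."""
    marker = f"--- {dev} ---"
    if marker not in lines:
        return []
    return lines[lines.index(marker):stop_index(lines, dev)]

print(device_block(event, 5))  # ['--- 5 ---', 'Pix 49, 133']
print(device_block(event, 6))  # ['--- 6 ---', 'Pix 109, 143']
```

Note that this fallback still assumes the next device number is present whenever any later device is. If device 3 were missing but device 4 present, the slice for device 2 would run to the end of the event.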

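2 answers:

Answer 0: (score: 1)

You should build the "--- plane ---" marker and then let Python find it for you using basic features such as in and .index.

To get the subset of data lines up to the next marker, you can use takewhile from itertools.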

data="""=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 1 ---
Pix 9, 66
--- 2 ---
Pix 11, 31
Pix 12, 31
--- 3 ---
Pix 17, 53
Pix 16, 53
Pix 16, 54
--- 4 ---
Pix 44, 64
--- 5 ---
Pix 49, 133
Pix 48, 133
--- 6 ---
Pix 109, 143
Pix 108, 143
Pix 108, 144 
Pix 109, 144""".split("\n")

from itertools import takewhile
def planeData(data,plane):
    marker = f"--- {plane} ---"
    if marker not in data: return []
    start = data.index(marker)+1
    return list(takewhile(lambda d:not d.startswith("---"),data[start:]))

Output:

for line in planeData(data,0): print(line)
# nothing printed

for line in planeData(data,5): print(line)
# Pix 49, 133
# Pix 48, 133

for line in planeData(data,6): print(line)
# Pix 109, 143
# Pix 108, 143
# Pix 108, 144 
# Pix 109, 144
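If every device of an event is needed, it may be simpler to walk the lines once and group them by device instead of searching per device. The sketch below is one way to do that, assuming the input holds a single event and that marker lines have no trailing whitespace; split_by_device and sample are hypothetical names, and the sample is a shortened version of the event from the question.

```python
def split_by_device(lines):
    """Group the data lines of one event by device number in a single pass."""
    devices = {}
    current = None  # no device marker seen yet
    for line in lines:
        if line.startswith("--- ") and line.endswith(" ---"):
            # "--- 5 ---" -> 5
            current = int(line.strip("- "))
            devices[current] = []
        elif current is not None:
            devices[current].append(line)
        # Lines before the first device marker (event header, Start, End) are skipped.
    return devices

# Shortened sample in the same layout as the event in the question.
sample = """=== 169139 ===
Start: 4.80374e+19
--- 1 ---
Pix 9, 66
--- 5 ---
Pix 49, 133
Pix 48, 133""".split("\n")

by_device = split_by_device(sample)
print(by_device.get(5, []))  # ['Pix 49, 133', 'Pix 48, 133']
print(by_device.get(0, []))  # []
```

Using .get(dev, []) for the lookup means a device with no data yields an empty list, and the last device needs no special end-of-list handling.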

Answer 1: (score: 1)

You can use the string index method

Code

def start_stop_dev(lst, dev):
    """Return the text for device dev from the string lst, or "" if absent."""
    # Assumes you meant dev rather than plane
    try:
        start_reading = lst.index("--- " + str(dev) + " ---")
    except ValueError:
        return ""   # No device

    try:
        # Stop just before the newline that precedes the next device's marker
        stop_reading = lst.index("--- " + str(dev+1) + " ---") - 1
    except ValueError:
        stop_reading = len(lst)   # No next marker: read to the end

    return lst[start_reading:stop_reading]

Test

lst= """=== 169139 ===
Start: 4.80374e+19
End:   4.80374e+19
--- 1 ---
Pix 9, 66
--- 2 ---
Pix 11, 31
Pix 12, 31
--- 3 ---
Pix 17, 53
Pix 16, 53
Pix 16, 54
--- 4 ---
Pix 44, 64
--- 5 ---
Pix 49, 133
Pix 48, 133
--- 6 ---
Pix 109, 143
Pix 108, 143
Pix 108, 144 
Pix 109, 144"""

# Retrieve and print data for each device
print('----------------Individual Device String Info-------------')
for dev in range(7):
  print(f'device {dev}\n{start_stop_dev(lst, dev)}')

print('----------------Splits of String Info----------------------')
for dev in range(7):
  dev_lst = start_stop_dev(lst,dev).split("\n")
  print(f'dev {dev}: {dev_lst}')

Output:

----------------Individual Device String Info-------------
device 0

device 1
--- 1 ---
Pix 9, 66
device 2
--- 2 ---
Pix 11, 31
Pix 12, 31
device 3
--- 3 ---
Pix 17, 53
Pix 16, 53
Pix 16, 54
device 4
--- 4 ---
Pix 44, 64
device 5
--- 5 ---
Pix 49, 133
Pix 48, 133
device 6
--- 6 ---
Pix 109, 143
Pix 108, 143
Pix 108, 144 
Pix 109, 144
----------------Splits of String Info----------------------
dev 0: ['']
dev 1: ['--- 1 ---', 'Pix 9, 66']
dev 2: ['--- 2 ---', 'Pix 11, 31', 'Pix 12, 31']
dev 3: ['--- 3 ---', 'Pix 17, 53', 'Pix 16, 53', 'Pix 16, 54']
dev 4: ['--- 4 ---', 'Pix 44, 64']
dev 5: ['--- 5 ---', 'Pix 49, 133', 'Pix 48, 133']
dev 6: ['--- 6 ---', 'Pix 109, 143', 'Pix 108, 143', 'Pix 108, 144 ', 'Pix 109, 144']
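The function above looks only for the marker of dev+1, so if that device is absent while a later one is present, the returned text runs to the end of the string. A possible variant, sketched here with the standard re module, stops at whichever marker line comes next. It assumes marker lines look exactly like "--- N ---" or "=== N ===" with no trailing whitespace; device_text and sample_text are hypothetical names.

```python
import re

def device_text(text, dev):
    """Return the text block for device dev, marker line included, or '' if absent."""
    # Match the marker line for dev, then lazily everything up to the next
    # '--- N ---' or '=== N ===' marker line, or the end of the string.
    pattern = rf"^--- {dev} ---$.*?(?=^(?:---|===) \d+ (?:---|===)$|\Z)"
    match = re.search(pattern, text, flags=re.MULTILINE | re.DOTALL)
    return match.group(0).rstrip("\n") if match else ""

# Shortened sample in which device 2 is missing but device 3 is present.
sample_text = """=== 169139 ===
--- 1 ---
Pix 9, 66
--- 3 ---
Pix 17, 53
Pix 16, 53"""

print(device_text(sample_text, 1).split("\n"))  # ['--- 1 ---', 'Pix 9, 66']
print(device_text(sample_text, 3).split("\n"))  # ['--- 3 ---', 'Pix 17, 53', 'Pix 16, 53']
```

Because the lookahead also accepts an === marker, the same function should work on text that contains several events, although it then returns only the first block found for that device number.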