Question

I'm trying to extract the value that follows a specific string in a line.

The lines of the text file look like this:

directory, batch: xxx  Date: xxxxxx xx:xx Pulp: type

AAAAAAAA
bbbbbbbb
cccccccc
dddddddd
eeeeeeee

I need to pull 'Pulp: type' out of that line and append it to my list output[f]. The pulp type varies in length from 3 to 25 characters.

This is what I have so far:

for f in file_list:
    txtfile = open(f, 'r')
    output[f] = []
    for line in txtfile:
        if 'batch' in line:    # only identifier for line is 'batch'
            pass  # What goes here??

    for i, line in enumerate(txtfile):
        if i == 4:
            output[f].append(line)
        elif i == 5:
            output[f].append(line)

I don't know how to extract what I need from the line. Any ideas?

Answer 1

Use a regular expression:

import re
a = "directory, batch: xxx  Date: xxxxxx xx:xx Pulp: type"
m = re.match('.+(Pulp.+$)', a)
my_type_string = m[1]
print(my_type_string)

Prints:

Pulp: type

Or, applied to your loop:

import re

for f in file_list:
    txtfile = open(f, 'r')
    output[f] = []

    for line in txtfile:
        # match against the current line of the file
        m = re.match('.+batch:.+(Pulp.+$)', line)
        # if you just want the Type value, use the string
        # '.+batch:.+Pulp:(.+$)'
        if m:
            pulp_value = m[1]
            output[f].append(pulp_value)

    # rewind: the loop above has already consumed the file iterator
    txtfile.seek(0)
    for i, line in enumerate(txtfile):
        if i == 4:
            output[f].append(line)
        elif i == 5:
            output[f].append(line)
    txtfile.close()
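
The same idea can also be done in a single pass over the lines, which avoids reading the file twice. This is only a sketch: the function name extract_pulp_and_rows, the compiled pattern PULP_RE, and the wanted_rows parameter are made up for illustration, and it assumes the header line always contains both "batch:" and "Pulp:".

```python
import re

# Hypothetical pattern: on a line containing "batch:", capture only the
# text after "Pulp:", without surrounding whitespace.
PULP_RE = re.compile(r"batch:.*Pulp:\s*(.+?)\s*$")

def extract_pulp_and_rows(lines, wanted_rows=(4, 5)):
    """Collect the pulp type plus the lines at the given 0-based indices."""
    collected = []
    for i, line in enumerate(lines):
        m = PULP_RE.search(line)
        if m:
            collected.append(m.group(1))
        elif i in wanted_rows:
            collected.append(line.rstrip("\n"))
    return collected

# Sample data copied from the question.
sample = [
    "directory, batch: xxx  Date: xxxxxx xx:xx Pulp: type\n",
    "\n",
    "AAAAAAAA\n",
    "bbbbbbbb\n",
    "cccccccc\n",
    "dddddddd\n",
    "eeeeeeee\n",
]
result = extract_pulp_and_rows(sample)
print(result)
```

Inside the question's loop this could be called as output[f] = extract_pulp_and_rows(txtfile), since an open file object yields its lines one at a time.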

Answer 2

You can use str.find() to get the index at which a substring starts within the line.

Assuming "Pulp: value" is the last segment of the line, that would be:

start_pulp = line.find("Pulp:") # find the location
pulp_value = line[start_pulp:] # slice the string to get everything from the word "Pulp:" to the end of the line.

If "Pulp: value" does not run all the way to the end of the line, you can split the rest of the string on the whitespace that follows it.

Example:

for line in txtfile:
    if "Pulp:" in line:
        start_pulp = line.find("Pulp:") # find the location
        pulp_value = line[start_pulp:]
        output[f].append(pulp_value)
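
For the case where "Pulp: value" is not the last thing on the line, a rough sketch of the split-on-whitespace idea could look like the following. The helper name pulp_value and the trailing "Operator: yyy" field are invented for illustration, and the sketch assumes the pulp type itself contains no spaces. Note that str.find() returns -1 when the marker is missing, so that case is checked explicitly.

```python
def pulp_value(line, marker="Pulp:"):
    """Return the first whitespace-delimited token after marker, or None."""
    start = line.find(marker)
    if start == -1:
        # str.find() signals "not found" with -1
        return None
    # Everything after the marker, split on any run of whitespace.
    rest = line[start + len(marker):].split()
    return rest[0] if rest else None

# "Operator: yyy" is a hypothetical extra field, not part of the original data.
print(pulp_value("directory, batch: xxx Pulp: kraft Operator: yyy"))
```

If the pulp type can contain spaces, this approach would cut it short, and a regular expression anchored on the next field name is probably the safer choice.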

Or you could use a regular expression; if you go down that route, Todd W's answer is perfectly acceptable.
