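I'm trying to determine whether I can access the list elements surrounding the element I'm currently on. I have a large list (over 20k lines) and I want to find every instance of the string 'Name'. In addition, I'd like the +/- 5 elements around each 'Name' element, that is, the 5 lines before and the 5 lines after. The code I'm using is below.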
search_string = 'Name'
with open('test.txt', 'r') as infile, open('textOut.txt', 'w') as outfile:
    for line in infile:
        if search_string in line:
            outfile.writelines([line, next(infile), next(infile),
                                next(infile), next(infile), next(infile)])
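Getting the lines after 'Name' appears is simple, but figuring out how to access the elements before it has me stumped. Does anyone have an idea?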
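Answer 0 (score: 3)
20k lines is not that many. If it's acceptable to read all the lines into a list, we can slice around the index where a match is found, like this: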
search_string = 'Name'
with open('test.txt', 'r') as infile, open('textOut.txt', 'w') as outfile:
    # Keep the trailing newlines so writelines() emits one line per element
    lines = infile.readlines()
    n = len(lines)
    for i in range(n):
        if search_string in lines[i]:
            start = max(0, i - 5)  # clamp at the start of the file
            end = min(n, i + 6)    # slice end is exclusive
            outfile.writelines(lines[start:end])
Answer 1 (score: 2)
Answer 2 (score: 1)
You need to keep track of the index of your current position in the list.
Something like this:
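# Read the file into list_of_lines (without trailing newlines)
with open('test.txt') as infile:
    list_of_lines = infile.read().splitlines()

index = 0
while index < len(list_of_lines):
    if list_of_lines[index] == 'Name':
        if index > 0:
            print(list_of_lines[index - 1])  # This is the previous line
        if index + 1 < len(list_of_lines):
            print(list_of_lines[index + 1])  # This is the next line
        # And so on...
    index += 1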
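The "and so on" part can be folded into a loop over offsets. This is only a minimal sketch, assuming the lines are already in a list; `neighbours` and `radius` are names made up for this illustration, and it uses a substring test (`in`) rather than `==` so that lines merely containing 'Name' also match:

```python
def neighbours(list_of_lines, index, radius=5):
    """Return (offset, line) pairs for the lines around `index`.

    Positions that fall outside the list are skipped, so a match near
    the start or end simply yields fewer neighbours.
    """
    pairs = []
    for offset in range(-radius, radius + 1):
        position = index + offset
        if 0 <= position < len(list_of_lines):
            pairs.append((offset, list_of_lines[position]))
    return pairs


# Hypothetical sample data, with a smaller radius to keep the output short
list_of_lines = ['a', 'Name', 'b', 'c']
for index, line in enumerate(list_of_lines):
    if 'Name' in line:
        for offset, neighbour in neighbours(list_of_lines, index, radius=2):
            print(offset, neighbour)
```

The explicit range check matters because a negative index in Python does not raise an error; it silently wraps around to the end of the list.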
Answer 3 (score: 1)
Assuming you have the lines stored in a list:
lines = ['line1', 'line2', 'line3', 'line4', 'line5', 'line6', 'line7', 'line8', 'line9']
You can define a method that returns, as a generator, the elements grouped into runs of n consecutive elements:
def each_cons(iterable, n=2):
    if n < 2: n = 1
    i, size = 0, len(iterable)
    while i < size - n + 1:
        yield iterable[i:i+n]
        i += 1
Then just call the method. I'm calling list() here to show the contents, but you can iterate over it directly:
lines_by_3_cons = each_cons(lines, 3) # or any number of lines, 5 in your case
print(list(lines_by_3_cons))
#=> [['line1', 'line2', 'line3'], ['line2', 'line3', 'line4'], ['line3', 'line4', 'line5'], ['line4', 'line5', 'line6'], ['line5', 'line6', 'line7'], ['line6', 'line7', 'line8'], ['line7', 'line8', 'line9']]
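One way this might be applied to the original question is to ask for windows of `2 * radius + 1` lines and keep those whose middle element contains the search string. The following is a sketch under that assumption; `radius`, `windows` and the sample data are made up for illustration. Note one limitation: a match closer than `radius` lines to the start or end of the list never sits in the middle of a full window, so this approach would miss it.

```python
def each_cons(iterable, n=2):
    """Yield every run of n consecutive elements (same helper as above)."""
    if n < 2: n = 1
    i, size = 0, len(iterable)
    while i < size - n + 1:
        yield iterable[i:i+n]
        i += 1


# Hypothetical sample data: nine lines, with the fifth one containing 'Name'
lines = ['line%d' % k for k in range(1, 10)]
lines[4] = 'Name here'

radius = 2  # would be 5 for the question as asked
windows = [w for w in each_cons(lines, 2 * radius + 1) if 'Name' in w[radius]]
print(windows)
#=> [['line3', 'line4', 'Name here', 'line6', 'line7']]
```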
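Answer 4 (score: 1)
I personally liked this question a lot. Everyone here solves it by storing the whole file in memory. I think I've written a memory-efficient version. Here, check it out!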
stack_print_moments = []
expression = 'MYEXPRESSION'
neighbourhood_size = 5

def print_stack(stack):
    for line in stack:
        print(line.strip())
    print('-----')

current_stack = []
last_index = -1
with open('infile.txt') as myfile:
    for index, line in enumerate(myfile):
        # Keep only the most recent 2 * neighbourhood_size + 1 lines
        current_stack.append(line)
        if len(current_stack) > 2 * neighbourhood_size + 1:
            current_stack.pop(0)
        if expression in line:
            # Print once the lines after the match have been read
            stack_print_moments.append(index + neighbourhood_size)
        if index in stack_print_moments:
            print_stack(current_stack)
        last_index = index

# Matches near the end of the file: keep sliding the window past the last line
for index in range(last_index + 1, last_index + neighbourhood_size + 1):
    if len(current_stack) > 2 * neighbourhood_size + 1 - (index - last_index):
        current_stack.pop(0)
    if index in stack_print_moments:
        print_stack(current_stack)
More advanced code here: Github link
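For comparison, the same streaming idea can be sketched with `collections.deque`, whose `maxlen` argument discards old lines automatically. This is an illustrative variant and not the code behind the link above; `grep_context`, `needle` and `radius` are hypothetical names. It accepts any iterable of lines, so an open file object should work as well as a list, and only about `radius` lines per pending match are held in memory.

```python
from collections import deque


def grep_context(lines, needle, radius=5):
    """Yield, for each line containing `needle`, a list holding that line
    plus up to `radius` lines before and after it."""
    before = deque(maxlen=radius)  # the most recent `radius` lines
    pending = []                   # [window, lines_still_needed] per open match
    for line in lines:
        # Feed the new line to every match still waiting for trailing context
        for entry in pending:
            entry[0].append(line)
            entry[1] -= 1
        if needle in line:
            pending.append([list(before) + [line], radius])
        # The oldest match always completes first
        while pending and pending[0][1] == 0:
            yield pending.pop(0)[0]
        before.append(line)
    # End of input: emit matches that never received all trailing lines
    for window, _ in pending:
        yield window


# Hypothetical sample data, with a small radius to keep the output short
sample = ['a', 'b', 'Name 1', 'c', 'd', 'e', 'Name 2', 'f']
results = list(grep_context(sample, 'Name', radius=2))
for window in results:
    print(window)
```

Unlike a list with `pop(0)`, a bounded deque drops its oldest element in constant time, which is why it is a common choice for sliding windows.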