Question

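I am trying to implement a sliding/moving window approach over the rows of a csv file using Python. Each row has a column holding a binary value, yes or no. Basically, I want to denoise the yeses that are rare. That is, if a window of 5 rows (at most 5) contains 3 yes rows, keep them. But if it contains only 1 or 2, change them to no. How can I do that?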

For example, in the following, both yeses should become no.

...
1,a1,b1,no,0.75
2,a2,b2,no,0.45
3,a3,b3,yes,0.98
4,a4,b4,yes,0.22
5,a5,b5,no,0.46
6,a6,b6,no,0.20
...

But in the following, we keep them as they are (there is a window of 5 rows in which 3 are yes):

...
1,a1,b1,no,0.75
2,a2,b2,no,0.45
3,a3,b3,yes,0.98
4,a4,b4,yes,0.22
5,a5,b5,no,0.46
6,a6,b6,yes,0.20
...

I tried to write something with a window of 5, but got stuck (it is incomplete):

window_size = 5
filename='C:\\Users\\username\\v3\\And-'+v3file.split("\\")[5]
with open(filename) as fin:
    with open('C:\\Users\\username\\v4\\And2-'+v3file.split("\\")[5],'w') as finalout:
        line= fin.readline()
        index = 0
        sequence= []
        accs=[]
        while line:
            print(line)
            for i in range(window_size):
                line = fin.readline()
                sequence.append(line)
            index = index + 1
            fin.seek(index)

Answer 1

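You can use collections.deque with the maxlen argument set to the desired window size to implement a sliding window that tracks the yes/no flags of the most recent 5 lines. To be more efficient, keep a running count of yeses instead of summing the yeses in the window on every iteration. Whenever the window is full and the count of yeses is greater than 2, add the line indices of those yeses to a set of lines whose yes should be kept as-is. Then, in a second pass after resetting the input file pointer, change yes to no on every line whose index is not in the set: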

from collections import deque

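window_size = 5
# filename and output_filename are the input and output paths
with open(filename) as fin, open(output_filename, 'w') as finalout:
    yeses = 0                           # running count of yeses in the window
    window = deque(maxlen=window_size)  # yes/no flags of the most recent lines
    preserved = set()                   # indices of lines whose yes is kept
    # first pass: collect the yeses that sit in a full window with more than 2 yeses
    for index, line in enumerate(fin):
        window.append('yes' in line)
        if window[-1]:
            yeses += 1
        if len(window) == window_size:
            if yeses > 2:
                preserved.update(i for i, f in enumerate(window, index - window_size + 1) if f)
            # the oldest flag is evicted by the next append, so drop it from the count
            if window[0]:
                yeses -= 1
    # second pass: rewrite every line that was not preserved
    fin.seek(0)
    for index, line in enumerate(fin):
        if index not in preserved:
            line = line.replace('yes', 'no')
        finalout.write(line)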

Demo: https://repl.it/@blhsing/StripedCleanCopyrightinfringement
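For experimenting without files, the same two-pass idea can be sketched as a function over a list of lines. This is only an illustration under assumptions: the name denoise and the parameters window_size and min_yes are made up here, and 'yes' is still matched as a substring of the whole line, as in the code above.

```python
from collections import deque

def denoise(lines, window_size=5, min_yes=3):
    """Keep 'yes' only on lines that fall in at least one full window of
    window_size consecutive lines holding min_yes or more yeses."""
    window = deque(maxlen=window_size)  # yes/no flags of the most recent lines
    yeses = 0                           # running count of True flags in window
    preserved = set()                   # indices of lines whose 'yes' is kept
    for index, line in enumerate(lines):
        # A full deque evicts its oldest flag on append; keep the count in sync.
        if len(window) == window_size and window[0]:
            yeses -= 1
        flag = 'yes' in line
        window.append(flag)
        yeses += flag
        if len(window) == window_size and yeses >= min_yes:
            start = index - window_size + 1  # index of the oldest line in window
            preserved.update(i for i, f in enumerate(window, start) if f)
    return [line if i in preserved else line.replace('yes', 'no')
            for i, line in enumerate(lines)]

# The two examples from the question.
sparse = ['1,a1,b1,no,0.75', '2,a2,b2,no,0.45', '3,a3,b3,yes,0.98',
          '4,a4,b4,yes,0.22', '5,a5,b5,no,0.46', '6,a6,b6,no,0.20']
dense = ['1,a1,b1,no,0.75', '2,a2,b2,no,0.45', '3,a3,b3,yes,0.98',
         '4,a4,b4,yes,0.22', '5,a5,b5,no,0.46', '6,a6,b6,yes,0.20']

print(denoise(sparse))  # the two yeses are changed to no
print(denoise(dense))   # lines 2 to 6 hold 3 yeses, so nothing changes
```

Note that an input shorter than window_size never fills the window, so every yes in it is changed to no. Whether that is the desired behaviour depends on the data.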

Answer 2

Here is a 5-liner solution based on building successive list comprehensions:

lines = [
'1,a1,b1,no,0.75',
'2,a2,b2,yes,0.45',
'3,a3,b3,yes,0.98',
'4,a4,b4,yes,0.22',
'5,a5,b5,no,0.46',
'6,a6,b6,no,0.98',
'7,a7,b7,yes,0.22',
'8,a8,b8,no,0.46',
'9,a9,b9,no,0.20']

n = len(lines)

# flag all lines containing 'yes' (pad with 2 empty lines at each boundary to avoid problems)
flags = [line.count('yes') for line in ['', '']+lines+['', '']]
# count number of flags in sliding window [p-2,p+2]
counts = [sum(flags[p-2:p+3]) for p in range(2,n+2)]
# tag lines that need to be changed
tags = [flag > 0 and count < 3 for (flag,count) in zip(flags[2:],counts)]
# change tagged lines (use a separate loop variable so n is not overwritten)
for i in range(n):
  if tags[i]: lines[i] = lines[i].replace('yes','no')

print(lines)

Result:

['1,a1,b1,no,0.75',
 '2,a2,b2,yes,0.45',
 '3,a3,b3,yes,0.98',
 '4,a4,b4,yes,0.22',
 '5,a5,b5,no,0.46',
 '6,a6,b6,no,0.98',
 '7,a7,b7,no,0.22',
 '8,a8,b8,no,0.46',
 '9,a9,b9,no,0.20']

Edit: to read the data from a standard text file, all you have to do is:

with open(filename,'r') as f:
  lines = f.read().strip().split('\n')

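(strip removes potential blank lines at the top or bottom of the file, and split('\n') converts the file content into a list of lines), then use the code above...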
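Both the counting and the replacement above work on whole lines, so a 'yes' appearing in any other column would also be counted and rewritten. A possible variant, sketched here under assumptions, parses the rows with the csv module and touches only the flag column. The function name denoise_flag_column and the parameters flag_col, half_width and min_yes are illustrative, and the flag is assumed to be in the fourth column as in the sample data. The window is the same centered one, two rows on each side.

```python
import csv
import io

def denoise_flag_column(rows, flag_col=3, half_width=2, min_yes=3):
    """Return new rows where a 'yes' flag is changed to 'no' when the window
    centered on its row contains fewer than min_yes yeses."""
    flags = [row[flag_col] == 'yes' for row in rows]
    result = []
    for i, row in enumerate(rows):
        lo = max(0, i - half_width)               # clip the window at the top
        count = sum(flags[lo:i + half_width + 1]) # slicing clips at the bottom
        new_row = list(row)                       # leave the input untouched
        if flags[i] and count < min_yes:
            new_row[flag_col] = 'no'
        result.append(new_row)
    return result

# Same sample data as above, read through csv.reader instead of split.
sample = io.StringIO(
    "1,a1,b1,no,0.75\n2,a2,b2,yes,0.45\n3,a3,b3,yes,0.98\n"
    "4,a4,b4,yes,0.22\n5,a5,b5,no,0.46\n6,a6,b6,no,0.98\n"
    "7,a7,b7,yes,0.22\n8,a8,b8,no,0.46\n9,a9,b9,no,0.20\n"
)
rows = list(csv.reader(sample))
cleaned = denoise_flag_column(rows)

out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(cleaned)
print(out.getvalue())
```

With real files, open(filename, newline='') would take the place of the io.StringIO objects. A centered window is a stricter test than "some window of 5 containing the row": in the question's second example, the yes on row 3 has only 2 yeses in rows 1 to 5, so this variant would change it to no.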

Implementing a sliding window over the lines of a file in Python
