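I have a CSV in which the 6th column holds the number of students in that class. I also have a separate piece of code that removes some students from a class if they appear in a different script. How would I recount the number of students in each class? Please see the sample data below: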
Jan-20,Data,Class xpv,4,11yo+,4,more data....
Jan-20,Data,Class xpv,4,11yo+,4,more data....
Jan-20,Data,Class xpv,4,11yo+,4,more data....
Jan-20,Data,Class xpv,4,11yo+,4,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
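The column that identifies which rows to delete is in the "more data" part. When rows are deleted, how do I code it to count the students remaining in each class? In practice that means counting the rows for each class name (the 3rd column, index 2) and writing that count into the 6th column (index 5). (The class names are all unique.)
I hope this makes sense. Any help is appreciated! Kind regards, AEA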
EDIT: I saved the data above as AEAtest.csv
I tried running the following code:
import csv
import itertools
from operator import itemgetter
import random

def some_condition(line):
    return random.random() < 0.5  # delete lines randomly with 50% probability

def filter_data(data):
    for classname, group in itertools.groupby(data, itemgetter(2)):
        filtered_group = [line for line in group if some_condition(line)]
        new_sum = len(filtered_group)
        for line in filtered_group:
            line[5] = new_sum
            yield line

with open('C:\AEAtest.csv') as f_in, open('C:\AEAtest_MOD.csv', 'w') as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    writer.writerows(filter_data(reader))
The output is as follows:
Jan-20,Data,Class xpv,4,11yo+,2,more data....
Jan-20,Data,Class xpv,4,11yo+,2,more data....
Jan-30,Data,Class tn2,4,10yo+,7,more data....
Jan-30,Data,Class tn2,4,10yo+,7,more data....
Jan-30,Data,Class tn2,4,10yo+,7,more data....
Jan-30,Data,Class tn2,4,10yo+,7,more data....
Jan-30,Data,Class tn2,4,10yo+,7,more data....
Jan-30,Data,Class tn2,4,10yo+,7,more data....
Jan-30,Data,Class tn2,4,10yo+,7,more data....
Jan-50,Data,Class 22zn,2,10yo+,3,more data....
Jan-50,Data,Class 22zn,2,10yo+,3,more data....
Jan-50,Data,Class 22zn,2,10yo+,3,more data....
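I'm wondering where the extra blank lines are now coming from. Interestingly, the last line of text above ends up on line 23 of the output file, followed by two more blank lines.
Any help fixing this? Kind regards, AEA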
Answer:
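I think you can use itertools.groupby on your csv data to group the rows by class name. Then, as you iterate over each group, you can correct the count if any rows have been removed.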
from itertools import groupby
from operator import itemgetter

def filter_data(data):
    # Group rows by class name (column index 2); groupby only merges consecutive rows.
    for classname, group in groupby(data, itemgetter(2)):
        filtered_group = [line for line in group if some_condition(line)]
        new_count = len(filtered_group)
        for line in filtered_group:
            line[5] = new_count  # replace the student count (column index 5)
            yield line
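To check the grouping logic deterministically instead of randomly, here is a small sketch that swaps the random some_condition for a hypothetical rule: a row is deleted when its last field is the string 'remove'. The marker value and the sample rows are made up for illustration, and the condition is passed in as a parameter only to keep the sketch self-contained.

```python
import csv
from itertools import groupby
from operator import itemgetter


def keep_row(line):
    # Hypothetical rule: a row is deleted when its last field is 'remove'.
    return line[-1] != 'remove'


def filter_data(data, condition):
    # Group consecutive rows by class name (column index 2).
    for classname, group in groupby(data, itemgetter(2)):
        filtered_group = [line for line in group if condition(line)]
        new_count = len(filtered_group)
        for line in filtered_group:
            line[5] = new_count  # overwrite the student count (column index 5)
            yield line


sample = [
    "Jan-20,Data,Class xpv,4,11yo+,3,keep",
    "Jan-20,Data,Class xpv,4,11yo+,3,remove",
    "Jan-20,Data,Class xpv,4,11yo+,3,keep",
    "Jan-50,Data,Class 22zn,2,10yo+,2,remove",
    "Jan-50,Data,Class 22zn,2,10yo+,2,keep",
]

rows = list(filter_data(csv.reader(sample), keep_row))
for row in rows:
    print(row)
```

Because groupby only merges consecutive rows that share a key, this relies on each class's rows being contiguous in the file, which they are in the sample data.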
Given a some_condition function, here is how you could use it to print the filtered data:
import csv
import random
def some_condition(line):
    return random.random() < 0.5  # delete lines randomly with 50% probability
data = """Jan-20,Data,Class xpv,4,11yo+,4,more data....
Jan-20,Data,Class xpv,4,11yo+,4,more data....
Jan-20,Data,Class xpv,4,11yo+,4,more data....
Jan-20,Data,Class xpv,4,11yo+,4,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-30,Data,Class tn2,4,10yo+,12,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....
Jan-50,Data,Class 22zn,2,10yo+,6,more data....""".splitlines()
for line in filter_data(csv.reader(data)):
    print(line)
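If the rows for a class might not be contiguous, groupby would split that class into several groups, and each group would get its own (wrong) count. A different technique from the one in this answer avoids that: filter first, then count the survivors per class with collections.Counter. The keep_row rule below is a hypothetical stand-in for the real deletion condition, and the sample rows are deliberately interleaved.

```python
import csv
from collections import Counter


def keep_row(line):
    # Hypothetical deletion rule: drop rows whose last field is 'remove'.
    return line[-1] != 'remove'


def recount(rows, condition):
    # First pass: keep only the rows that survive the deletion rule.
    kept = [line for line in rows if condition(line)]
    # Count surviving rows per class name (column index 2), wherever they appear.
    counts = Counter(line[2] for line in kept)
    # Second pass: write the new count into column index 5.
    for line in kept:
        line[5] = counts[line[2]]
    return kept


sample = [
    "Jan-20,Data,Class xpv,4,11yo+,3,keep",
    "Jan-50,Data,Class 22zn,2,10yo+,2,keep",
    "Jan-20,Data,Class xpv,4,11yo+,3,keep",
    "Jan-50,Data,Class 22zn,2,10yo+,2,remove",
    "Jan-20,Data,Class xpv,4,11yo+,3,remove",
]

result = recount(csv.reader(sample), keep_row)
for line in result:
    print(line)
```

The trade-off is memory. This version holds all kept rows in a list, whereas the groupby generator only holds one class at a time.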
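You probably want to read and write actual files rather than parse a string and print the modified results. Here is some (untested) code that shows how to do that: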
with open('myfile.csv', 'rb') as f_in, open('myfile_filtered.csv', 'wb') as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    writer.writerows(filter_data(reader))
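Note that this is Python 2 code. In Python 3, the files should be opened in text mode rather than binary mode, and you also need to pass the extra argument newline="" so that the csv module handles the line endings itself. Opening the output file in plain 'w' text mode without newline="" is also what produces the blank lines in your output on Windows. The csv writer ends each row with \r\n, and text mode then translates the \n into \r\n, giving \r\r\n.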
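Here is a sketch of a Python 3 version of the file-handling snippet, using text mode with newline="" as described. To keep it self-contained, it first writes a tiny sample file into a temporary directory. The file names are placeholders, and keep_row is a hypothetical deletion rule that drops rows whose last field is 'remove'.

```python
import csv
import os
import tempfile
from itertools import groupby
from operator import itemgetter


def keep_row(line):
    # Hypothetical deletion rule: drop rows whose last field is 'remove'.
    return line[-1] != 'remove'


def filter_data(data):
    # Same grouping logic as in the answer: class name is column index 2,
    # the student count is column index 5.
    for classname, group in groupby(data, itemgetter(2)):
        filtered_group = [line for line in group if keep_row(line)]
        new_count = len(filtered_group)
        for line in filtered_group:
            line[5] = new_count
            yield line


tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, 'myfile.csv')
out_path = os.path.join(tmpdir, 'myfile_filtered.csv')

# Write a small placeholder input file.
with open(in_path, 'w', newline='') as f:
    csv.writer(f).writerows([
        ['Jan-20', 'Data', 'Class xpv', '4', '11yo+', '2', 'keep'],
        ['Jan-20', 'Data', 'Class xpv', '4', '11yo+', '2', 'remove'],
    ])

# Python 3: text mode plus newline='' lets the csv module control line endings,
# so rows end in a single \r\n and no blank lines appear on Windows.
with open(in_path, newline='') as f_in, open(out_path, 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    writer.writerows(filter_data(csv.reader(f_in)))

# Read the raw text back (newline='' keeps \r\n untranslated) to inspect it.
with open(out_path, newline='') as f:
    raw = f.read()
print(repr(raw))
```

Reading the result back with newline="" shows the exact bytes the writer produced. Each row should end in one \r\n, with no doubled \r.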