Counting duplicate elements in a CSV file

Time: 2017-08-01 16:17:37

Tags: python csv collections count

This is my CSV file:

2017-07-14  03:05:23    B2KPRT320   - Error1
2017-07-14  03:05:23    B2KPRT320   - Error1
2017-07-15  03:05:23    B2KPRT320   - Error2
2017-07-15  03:05:23    B2KPRT320   - Error3

I need to count the errors for each day.

This is my script so far:

import collections
Data = []
string = ""
array = []
with open('out.csv') as f:
    for line in f:
        Data.append([word for word in line.strip().split("\t")])

for item in Data:
    try:
        date,error = item[0],item[3]
        string = date + "\t" + error + "\n"
        array.append([word for word in string.strip().split("\t")])
    except IndexError:
        print("A line in the file doesn't have enough entries.")

Finally, I need to save the result to another CSV file, with this output:

2017-07-14   - Error1   2
2017-07-15   - Error2   1
2017-07-15   - Error3   1

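1 Answer:

Answer 0: (score: 0)

You can read the file into a list of lines and use collections.Counter() to count how often each identical line occurs, then split() each line and take its first and last items. For example: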

import collections

array = []
with open('test.txt') as f:
    # Count how many times each identical line occurs
    Data = collections.Counter(f.read().splitlines())

for item, c in Data.items():
    item = item.split()  # split on any run of whitespace
    date, error = item[0], item[-1]  # first and last fields
    string = "{}\t{}\t{}".format(date, error, c)
    array.append(string)

for elem in array:
    print(elem)

This will output (the order of the lines may differ between Python versions):

2017-07-15  Error3  1
2017-07-15  Error2  1
2017-07-14  Error1  2

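Edit

You no longer need try/except, because item[-1] gives you the last item of any non-empty list. Instead, you can check the number of fields, where x is the minimum you expect: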

if len(item) < x:
    # print error
else:
    # the above code
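
The question also asks for the result to be saved to another CSV file, which the code above does not do. Here is a minimal sketch of one way to combine the length check with writing the output, assuming Python 3 and tab-separated output. The names count_errors, write_counts and min_fields are made up for this illustration and are not from the original post. Like the answer's code, it keeps only the last whitespace-separated field, so the error is written as "Error1" rather than "- Error1".

```python
import collections
import csv


def count_errors(lines, min_fields=2):
    """Count (date, error) pairs, skipping lines with too few fields.

    min_fields is an assumed threshold, playing the role of x above.
    """
    counts = collections.Counter()
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if len(fields) < min_fields:
            # Not enough entries for both a date and an error: skip the line
            continue
        counts[(fields[0], fields[-1])] += 1
    return counts


def write_counts(counts, path):
    """Write one tab-separated row per (date, error) pair, sorted by date, then error."""
    # newline="" lets the csv module control line endings (Python 3)
    with open(path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        for (date, error), n in sorted(counts.items()):
            writer.writerow([date, error, n])
```

It could be used as write_counts(count_errors(open('out.csv')), 'counts.csv'), where 'counts.csv' is a placeholder output name. Sorting the items makes the row order deterministic, which the plain Counter loop does not guarantee on older Python versions.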