Question

Here is my CSV file:

2017-07-14  03:05:23    B2KPRT320   - Error1
2017-07-14  03:05:23    B2KPRT320   - Error1
2017-07-15  03:05:23    B2KPRT320   - Error2
2017-07-15  03:05:23    B2KPRT320   - Error3

I need to count the errors for each day.

Here is my script so far:

import collections
Data = []
string = ""
array = []
with open('out.csv') as f:
    for line in f:
        Data.append([word for word in line.strip().split("\t")])

for item in Data:
    try:
        date,error = item[0],item[3]
        string = date + "\t" + error + "\n"
        array.append([word for word in string.strip().split("\t")])
    except IndexError:
        print("A line in the file doesn't have enough entries.")

Finally, I need to save the result to another CSV file, in this format:

2017-07-14   - Error1   2
2017-07-15   - Error2   1
2017-07-15   - Error3   1

Answer 1

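You can read the file into a list of lines and use collections.Counter() to count how often each identical line occurs, then split() each line and take its first and last items. For example: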

import collections

array = []
with open('test.txt') as f:
    # Count how many times each identical line occurs in the file.
    Data = collections.Counter(f.read().splitlines())

for item, c in Data.items():
    # Split on whitespace: the first field is the date, the last is the error.
    item = item.split()
    date, error = item[0], item[-1]
    string = "{}\t{}\t{}".format(date, error, c)
    array.append(string)

for elem in array:
    print(elem)

This will output (the order of the lines may vary):

2017-07-15  Error3  1
2017-07-15  Error2  1
2017-07-14  Error1  2
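
The question also asks for the counts to be saved to another file, which the snippet above only prints. Here is a minimal sketch of that step, assuming Python 3 and tab-separated output. The helper names `count_errors` and `write_counts` are hypothetical and not from the original post. Unlike the snippet above, it counts by (date, error) pair rather than by whole line, so two lines with the same date and error but different times are counted together.

```python
import collections
import csv
import io


def count_errors(lines):
    """Count (date, error) pairs in an iterable of log lines."""
    counts = collections.Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or too-short lines
        # First field is the date, last field is the error name.
        counts[(fields[0], fields[-1])] += 1
    return counts


def write_counts(counts, out_file):
    """Write one tab-separated row per pair, sorted by date, then error."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    for (date, error), n in sorted(counts.items()):
        writer.writerow([date, error, n])


# Demo on the sample data from the question, written to an in-memory buffer.
SAMPLE_LINES = [
    "2017-07-14\t03:05:23\tB2KPRT320\t- Error1",
    "2017-07-14\t03:05:23\tB2KPRT320\t- Error1",
    "2017-07-15\t03:05:23\tB2KPRT320\t- Error2",
    "2017-07-15\t03:05:23\tB2KPRT320\t- Error3",
]
buffer = io.StringIO()
write_counts(count_errors(SAMPLE_LINES), buffer)
print(buffer.getvalue(), end="")
```

To work with real files, pass open file objects instead of the list and the buffer, for example `open('test.txt')` for reading and `open('counts.csv', 'w', newline='')` for writing (the output file name is an assumption).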

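Edit

You no longer need the try/except, because item[-1] always gives you the last item of a non-empty list, however many fields the line has. Instead, you can check the length explicitly: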

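if len(item) < x:  # x is the minimum number of fields you expect
    print("A line in the file doesn't have enough entries.")
else:
    date, error = item[0], item[-1]  # ...followed by the rest of the code above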
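
The length check can be sketched as a small helper. This assumes Python 3. `parse_line` is a hypothetical name, and `min_fields` plays the role of `x`. Its default of 2 is an assumption (just enough for a date and an error). A stricter value such as 5 would match the sample layout when splitting on whitespace. The check still matters for blank lines, because item[0] and item[-1] both raise IndexError on an empty list.

```python
def parse_line(line, min_fields=2):
    """Return (date, error) for a log line, or None if it has too few fields."""
    item = line.split()
    if len(item) < min_fields:
        return None
    return item[0], item[-1]


lines = [
    "2017-07-14\t03:05:23\tB2KPRT320\t- Error1",
    "",            # blank line: no fields at all
    "2017-07-15",  # only a date
]
for line in lines:
    parsed = parse_line(line)
    if parsed is None:
        print("A line in the file doesn't have enough entries.")
    else:
        print("{}\t{}".format(*parsed))
```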
