Getting occurrence counts from a CSV

Time: 2016-12-16 02:44:29

Tags: python sorting csv count counter

Desired output

Apache 2.0.44 (Linux) - 2
Cisco IOS - 4
Linux Kernel 2.4.20 - 1
Microsoft IIS 5.0 < 5.1 - 2

Current output of m.group(1):

Apache 2.0.44 (Linux)
Apache 2.0.44 (Linux)
Cisco IOS
Cisco IOS
Cisco IOS
Cisco IOS
Linux Kernel 2.4.20
Microsoft IIS 5.0 < 5.1
Microsoft IIS 5.0 < 5.1

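I have a CSV file. I have successfully grabbed the output from column 3 (called title) and stripped the unwanted content from it: I only want the first part of each title, and I drop everything after the " - ".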

import sys, csv, operator, re

reader = csv.reader(open("test.csv"), delimiter=",")
# Sort the rows by the title column (index 2).
sortedlist = sorted(reader, key=operator.itemgetter(2), reverse=False)
for id, path, title, date, author, platform, type, port in sortedlist:
    # Capture everything before the first hyphen in the title.
    m = re.search(r'^(.*?)\-.*', title)
    if m:
        print(m.group(1))

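Now I need to remove the duplicates from the m.group(1) values while showing how many times each one occurs. When I use Counter, it counts every letter of every item... I'm at a loss.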
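The per-letter counts described above are what Counter produces when it is handed a single string, because a string is an iterable of characters. A minimal sketch of the difference, using "Cisco IOS" from the output above as an assumed input value:

```python
from collections import Counter

# A string is iterated character by character, so each letter is counted.
letters = Counter("Cisco IOS")

# Wrapping the string in a list makes the whole string a single item.
items = Counter(["Cisco IOS"])

print(letters)
print(items)
```

Counting whole titles therefore means passing Counter a list of strings, or incrementing one key at a time, rather than passing it each string directly.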

3 answers:

Answer 0 (score: 0)

Instead of printing m.group(1), append it to a result list. Then pass that list to Counter.
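A rough sketch of that suggestion follows. The titles list is hypothetical sample data standing in for column 3 of the CSV, the name prefixes is made up for illustration, and the regex adds \s* so the captured text has no trailing space; none of these come from the question itself.

```python
import re
from collections import Counter

# Hypothetical titles standing in for column 3 of the CSV file.
titles = [
    "Apache 2.0.44 (Linux) - sample entry",
    "Cisco IOS - sample entry",
    "Cisco IOS - another sample entry",
    "Linux Kernel 2.4.20 - sample entry",
]

prefixes = []
for title in titles:
    # Capture the text before the first hyphen, without trailing whitespace.
    m = re.search(r'^(.*?)\s*-', title)
    if m:
        prefixes.append(m.group(1))

# Counter receives a list of strings, so it counts whole titles, not letters.
counts = Counter(prefixes)
for prefix, count in sorted(counts.items()):
    print('{} - {}'.format(prefix, count))
```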

Answer 1 (score: 0)

import sys, csv, operator, re, collections

result = collections.Counter()
reader = csv.reader(open("test.csv"), delimiter=",")
sortedlist = sorted(reader, key=operator.itemgetter(2), reverse=False)
for id, path, title, date, author, platform, type, port in sortedlist:
    m = re.search(r'^(.*?)\-.*', title)
    if m:
        # Increment the count for this title prefix.
        result[m.group(1)] += 1

for group, count in result.items():
    print('{} - {}'.format(group, count))

Answer 2 (score: 0)

My answer is very similar to @Raymond Hettinger's (he beat me to posting it), but I also modified the regex and made a few other changes:

from collections import Counter
import csv
import re

counter = Counter()
# newline="" is what the csv module expects on Python 3 (use 'rb' on Python 2).
with open("occurrences.csv", newline="") as csvfile:
    data = [row for row in csv.reader(csvfile, delimiter=",")]
    for id, path, title, date, author, platform, type, port in data:
        # \s* drops the whitespace before the hyphen from the captured text.
        m = re.search(r'^(.*?)\s*\-.*', title)
        # Titles without a hyphen are counted unchanged.
        title = m.group(1) if m else title
        counter.update([title])

# Print the counts sorted by title.
for title, count in sorted(counter.items()):
    print('{} - {}'.format(title, count))
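To illustrate what the regex change does, here is a small comparison of the two patterns from this thread on a single hypothetical title (the title string and the names loose and tight are made up for the example):

```python
import re

title = "Cisco IOS - sample entry"  # hypothetical title

# Pattern from the question: keeps the space before the hyphen.
loose = re.search(r'^(.*?)\-.*', title).group(1)

# Pattern from this answer: \s* consumes the space before the hyphen.
tight = re.search(r'^(.*?)\s*\-.*', title).group(1)

print(repr(loose))
print(repr(tight))
```

With the first pattern the captured text ends in a space, so the printed lines would read "Cisco IOS  - 4" with two spaces; the second pattern avoids that.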