Desired output:
Apache 2.0.44 (Linux) - 2
Cisco IOS - 4
Linux Kernel 2.4.20 - 1
Microsoft IIS 5.0 < 5.1 - 2
Current output of m.group(1):
Apache 2.0.44 (Linux)
Apache 2.0.44 (Linux)
Cisco IOS
Cisco IOS
Cisco IOS
Cisco IOS
Linux Kernel 2.4.20
Microsoft IIS 5.0 < 5.1
Microsoft IIS 5.0 < 5.1
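I have a CSV file. I have managed to pull the values from the third column (called title) and strip the part I don't need: I only want the text that comes before " - " on each line, with everything after it removed.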
import sys, csv, operator, re

reader = csv.reader(open("test.csv"), delimiter=",")
# Sort the rows by the third column (title)
sortedlist = sorted(reader, key=operator.itemgetter(2), reverse=False)
for id, path, title, date, author, platform, type, port in sortedlist:
    # Capture everything before the first hyphen
    m = re.search(r'^(.*?)\-.*', title)
    if m:
        print(m.group(1))
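Now I need the contents of m.group(1) with the duplicates removed, but showing how many times each one occurs. When I use Counter, it counts every letter of each item instead... I'm at a loss.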
Answer 0 (score: 0)
Instead of printing m.group(1), append it to a result list. Then use Counter on that list.
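A minimal sketch of that idea, assuming a few made-up titles in place of the real CSV column (the `titles` list and the `.strip()` call are my additions, not part of the question's code). It also shows why the letters were being counted: Counter iterates over whatever it is given, so a single string is counted character by character, while a list of strings is counted item by item.

```python
import re
from collections import Counter

# Hypothetical titles standing in for column 3 of the CSV
titles = [
    "Cisco IOS - Example A",
    "Cisco IOS - Example B",
    "Linux Kernel 2.4.20 - Example C",
]

names = []
for title in titles:
    # Same pattern as in the question: capture everything before the first hyphen
    m = re.search(r'^(.*?)\-.*', title)
    if m:
        # strip() drops the space left in front of the hyphen (an added step)
        names.append(m.group(1).strip())

# A list of strings is counted one whole string at a time
counts = Counter(names)

# A single string is iterated character by character
letters = Counter("Cisco IOS")

for name, count in sorted(counts.items()):
    print('{} - {}'.format(name, count))
```

With the real file, the loop over `titles` would be replaced by the loop over the sorted CSV rows from the question.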
Answer 1 (score: 0)
import sys, csv, operator, re, collections

result = collections.Counter()
reader = csv.reader(open("test.csv"), delimiter=",")
sortedlist = sorted(reader, key=operator.itemgetter(2), reverse=False)
for id, path, title, date, author, platform, type, port in sortedlist:
    m = re.search(r'^(.*?)\-.*', title)
    if m:
        # Count the whole captured string rather than its characters
        result[m.group(1)] += 1
for group, count in result.items():
    print('{} - {}'.format(group, count))
Answer 2 (score: 0)
My answer is very similar to @Raymond Hettinger's (he beat me to posting), but I also changed the regular expression and made a few other modifications:
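from collections import Counter
import csv
import operator
import re
import sys

counter = Counter()
# 'rb' is the Python 2 convention for csv files; on Python 3, open the
# file in text mode with newline='' instead
with open("occurrences.csv", 'rb') as csvfile:
    data = [row for row in csv.reader(csvfile, delimiter=",")]
for id, path, title, date, author, platform, type, port in data:
    # \s* keeps the whitespace before the hyphen out of the captured group
    m = re.search(r'^(.*?)\s*\-.*', title)
    # Titles without a hyphen are counted unchanged
    title = m.group(1) if m else title
    counter.update([title])
for title, count in sorted(counter.items()):
    print('{} - {}'.format(title, count))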
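To illustrate what the regex change does, here is a small comparison on a made-up title (the sample strings and the `clean` helper are hypothetical, written only for this sketch). The original pattern leaves the space before the hyphen inside the captured group, while the version with `\s*` excludes it; the fallback keeps titles that contain no hyphen at all.

```python
import re

title = "Cisco IOS - Example entry"  # hypothetical title

# Pattern from the question: the group keeps the trailing space
loose = re.search(r'^(.*?)\-.*', title).group(1)

# Pattern from this answer: \s* absorbs the whitespace before the hyphen
tight = re.search(r'^(.*?)\s*\-.*', title).group(1)

print(repr(loose))
print(repr(tight))


def clean(title):
    """Return the text before the first hyphen, or the title itself if none."""
    m = re.search(r'^(.*?)\s*\-.*', title)
    return m.group(1) if m else title


print(clean("Title without a separator"))
```

The trailing space matters for the final print: with the original pattern, '{} - {}' would produce two spaces before the hyphen.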