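Loop over the records of a CSV file, identify the values in the first column (i.e. row[0]) that appear at least 4 times in the dataset, and print those rows (records).
Sample data: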
address salesprice
1 RIVER TERRACE 6000000
10 LITTLE WEST STREET 2500000
15 WILLIAM STREET 1140000
15 WILLIAM STREET 885878
15 WILLIAM STREET 997885
15 WILLIAM STREET 1220881.75
120 GREENWICH STREET 625000
Sample code (fails):
import csv
from collections import Counter
with open('path/myfile.csv', 'r',newline='') as f:
    myfile = csv.reader(f)
    for row in myfile:
        #print(row[0])
        if Counter.items(row[0]) > 4:
            print(row)
Answer 0 (score: 1)
You need to make one complete pass over the reader object to compute the counts, and then read the file a second time:
import csv
from collections import Counter
from operator import itemgetter
with open('path/myfile.csv', 'r', newline='') as f:
    r = csv.reader(f)
    # get counts first
    cn = Counter(map(itemgetter(0), r))
    # reset pointer to beginning of file
    f.seek(0)
    # create another reader
    r = csv.reader(f)
    # now iterate over the rows again, checking count of each row[0]
    for row in r:
        # "at least 4 times" means >= 4, not > 4
        if cn[row[0]] >= 4:
            print(row)
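The two-pass idea in this answer can also be wrapped in a reusable function. What follows is a minimal sketch. The function name rows_with_repeated_key and its min_count parameter are hypothetical. The sketch also assumes the file really is comma-separated, because the question shows the sample separated by spaces. For that reason the sample is re-typed with commas and read from io.StringIO instead of a file on disk.

```python
import csv
import io
from collections import Counter

# Assumption: the question's sample data, re-typed as real comma-separated values
SAMPLE = """address,salesprice
1 RIVER TERRACE,6000000
10 LITTLE WEST STREET,2500000
15 WILLIAM STREET,1140000
15 WILLIAM STREET,885878
15 WILLIAM STREET,997885
15 WILLIAM STREET,1220881.75
120 GREENWICH STREET,625000
"""


def rows_with_repeated_key(f, min_count=4):
    """Hypothetical helper: return rows whose first field occurs at least min_count times."""
    # First pass: count every value in column 0 (blank lines come back as [] and are skipped)
    counts = Counter(row[0] for row in csv.reader(f) if row)
    # Rewind the seekable file object so a fresh reader starts at the top again
    f.seek(0)
    # Second pass: keep only rows whose key reached the threshold
    return [row for row in csv.reader(f) if row and counts[row[0]] >= min_count]


matches = rows_with_repeated_key(io.StringIO(SAMPLE))
for row in matches:
    print(row)
```

A real file opened with open(path, newline='') works the same way, provided the file object is seekable. The header row is counted only once, so it never reaches the threshold and is not printed.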
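Answer 1 (score: 0)
First, you have to read all the lines of the file. Build your Counter on that collection.
Second, iterate over the Counter's keys and check which values are >= 4. (Note that your code only does > ... you must include 4.)
Is that enough for you to write your own code?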
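Answer 1's recipe is: read everything once, count, keep the keys whose count is at least 4, then print the matching rows. The sketch below shows one way to do that. It reads the rows into a list, so no second pass over the file is needed. The helper name frequent_keys is hypothetical. The comma-separated sample is an assumption, because the question shows the data with spaces.

```python
import csv
import io
from collections import Counter

# Assumption: the question's sample data, re-typed as real comma-separated values
SAMPLE = """address,salesprice
1 RIVER TERRACE,6000000
10 LITTLE WEST STREET,2500000
15 WILLIAM STREET,1140000
15 WILLIAM STREET,885878
15 WILLIAM STREET,997885
15 WILLIAM STREET,1220881.75
120 GREENWICH STREET,625000
"""


def frequent_keys(rows, min_count=4):
    """Hypothetical helper: first-column values that occur at least min_count times."""
    # Step 1: build the Counter on the collection of all rows
    counts = Counter(row[0] for row in rows)
    # Step 2: walk the Counter's keys and keep those with a count >= min_count
    return {key for key, n in counts.items() if n >= min_count}


# Read every non-blank row into memory once, then split off the header line
rows = [row for row in csv.reader(io.StringIO(SAMPLE)) if row]
header, data = rows[0], rows[1:]

keys = frequent_keys(data)
for row in data:
    if row[0] in keys:
        print(row)
```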
Answer 2 (score: 0)
myfile.csv
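import collections, csv
# Python 3: open in text mode with newline='' (the 'rb' mode only works with csv in Python 2)
with open('myfile.csv', newline='') as f:
    rows = [x for x in csv.reader(f)]
# i > 0 skips the header row when counting
count = collections.Counter([x[0] for i, x in enumerate(rows) if i > 0])
for row in rows:
    # > 3 is the same as "at least 4 times"
    if count.get(row[0], 0) > 3:
        print(row)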
Answer 3 (score: -1)
Just in plain Python....
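from collections import Counter

with open('data') as f:
    lines = f.readlines()
# num is the most common first-column value and cnt is how often it appears
# (note: this only considers the single most common value)
num, cnt = Counter(line.split(',', 1)[0] for line in lines).most_common(1)[0]
if cnt >= 4:
    for line in lines:
        # compare the whole first field; re.match(num, line) would also match e.g. "150,..."
        if line.split(',', 1)[0] == num:
            print(line, end='')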
15,WILLIAM STREET,1140000
15,WILLIAM STREET,885878
15,WILLIAM STREET,997885
15,WILLIAM STREET,1220881.75