Question

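Loop over the records in a CSV file, identify the values in the first column (i.e. row[0]) that appear at least 4 times in the data set (CSV), and print those rows (records).

Sample data: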

address                     salesprice
1 RIVER TERRACE             6000000
10 LITTLE WEST STREET       2500000
15 WILLIAM STREET           1140000
15 WILLIAM STREET           885878
15 WILLIAM STREET           997885
15 WILLIAM STREET           1220881.75
120 GREENWICH STREET        625000

Sample code (fails)

import csv
from collections import Counter

with open('path/myfile.csv', 'r',newline='') as f:
    myfile = csv.reader(f)
    for row in myfile:
        #print(row[0])
        if Counter.items(row[0]) > 4:
            print(row)

Answer 1

You need to make one full pass over the reader object to get the counts first, then read the file again:

import csv
from collections import Counter
from operator import itemgetter

with open('path/myfile.csv', 'r',newline='') as f:
    r = csv.reader(f)
    # get counts first
    cn = Counter(map(itemgetter(0),r))
    # reset pointer to beginning of file
    f.seek(0)
    # create another reader
    r = csv.reader(f)
    # now iterate over the rows again, checking count of each row[0]
    for row in r:
        # "at least 4 times" means the comparison must include 4
        if cn[row[0]] >= 4:
            print(row)

Answer 2

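First, you have to read all the lines in the file. You build your counter on that collection.

Second, you iterate over the counter's keys and check for a value >= 4. (Note that you only test for > ...; you have to include 4.)

Is that enough for you to write your own code?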
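The two steps described above can be sketched roughly as follows. This is only one possible reading of the answer, not its author's code: the function name `frequent_rows`, the `min_count` parameter, and the comma-separated in-memory sample are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def frequent_rows(rows, min_count=4):
    """Return the rows whose first column occurs at least min_count times."""
    rows = [row for row in rows if row]          # step 1: read everything, skip empty rows
    counts = Counter(row[0] for row in rows)     # build the counter on that collection
    # step 2: walk the counter and keep the keys whose count is >= min_count
    frequent = {key for key, n in counts.items() if n >= min_count}
    return [row for row in rows if row[0] in frequent]


# Hypothetical comma-separated version of the sample data from the question.
sample = io.StringIO(
    "address,salesprice\n"
    "1 RIVER TERRACE,6000000\n"
    "10 LITTLE WEST STREET,2500000\n"
    "15 WILLIAM STREET,1140000\n"
    "15 WILLIAM STREET,885878\n"
    "15 WILLIAM STREET,997885\n"
    "15 WILLIAM STREET,1220881.75\n"
    "120 GREENWICH STREET,625000\n"
)

result = frequent_rows(csv.reader(sample))
for row in result:
    print(row)
```

With a real file, the `io.StringIO` object would be replaced by a file opened with `newline=''`, as in the question.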

Answer 3

myfile.csv

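import collections
import csv

# Python 3: open in text mode with newline='' ('rb' only works with Python 2's csv module)
with open('myfile.csv', 'r', newline='') as f:
    rows = list(csv.reader(f))
    # count first-column values, skipping the header row
    count = collections.Counter(x[0] for x in rows[1:])
    for row in rows:
        # more than 3 occurrences, i.e. at least 4
        if count.get(row[0], 0) > 3:
            print(row)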


Answer 4

Just in Python ....

from collections import Counter

with open('data') as f:
    lines = f.readlines()

# num is the most common first field; cnt is how many times it occurs
num, cnt = Counter(e.split(',', 1)[0] for e in lines).most_common(1)[0]

if cnt >= 4:
    for line in lines:
        # compare the whole first field, so '15' does not also match '150,...'
        if line.split(',', 1)[0] == num:
            print(line)

15,WILLIAM STREET,1140000

15,WILLIAM STREET,885878

15,WILLIAM STREET,997885

15,WILLIAM STREET,1220881.75
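Note that `most_common(1)` only looks at the single most frequent value, so a second value that also occurs four or more times would be missed. A minimal sketch that covers every such value might look like this; the function name `lines_with_frequent_first_field` and the sample lines are hypothetical, chosen to mirror the comma-separated layout of the output above.

```python
from collections import Counter


def lines_with_frequent_first_field(lines, min_count=4):
    """Return the lines whose first comma-separated field occurs >= min_count times."""
    counts = Counter(line.split(',', 1)[0] for line in lines)
    frequent = {field for field, n in counts.items() if n >= min_count}
    return [line for line in lines if line.split(',', 1)[0] in frequent]


# Hypothetical input lines, in the same layout as the output shown above.
sample_lines = [
    "1,RIVER TERRACE,6000000\n",
    "15,WILLIAM STREET,1140000\n",
    "15,WILLIAM STREET,885878\n",
    "15,WILLIAM STREET,997885\n",
    "15,WILLIAM STREET,1220881.75\n",
    "150,GREENWICH STREET,625000\n",
]

matches = lines_with_frequent_first_field(sample_lines)
for line in matches:
    print(line, end='')
```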

Identifying values in a column that appear multiple times
