Question

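I have 3 comma-separated columns (column1, column2, column3), as shown below. In the example below, "241682-27638-USD-OCGGINT" does not repeat, so its count is 1; "241942-37190-USD-GGDIV" appears twice, so its count is 2; and so on.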

column1,column2,column3,occcurance_count_of_column3
name1,empId1,241682-27638-USD-CIGGNT,1
name2,empId2,241682-27638-USD-OCGGINT,1
name3,empId3,241942-37190-USD-GGDIV,2
name4,empId4,241942-37190-USD-CHYOF,1
name5,empId5,241942-37190-USD-EQPL,1
name6,empId6,241942-37190-USD-INT,1
name7,empId7,242066-15343-USD-CYJOF,3
name8,empId8,242066-15343-USD-CYJOF,3
name9,empId9,242066-15343-USD-CYJOF,3
name10,empId10,241942-37190-USD-GGDIV,2
name11,empId11,242066-33492-USD-CJHOF,1

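My CSV file contains column1, column2 and column3, and I want to add occcurance_count_of_column3 as the next column. For each element of column3, I want to check whether it is repeated and how many times it occurs (occcurance_count), and then write that count into the same CSV file using Python.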
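To make the counting rule concrete: the expected fourth column is simply the number of times each row's column3 value occurs in the whole column. A minimal check, assuming the column3 values have already been pulled out of the sample table into a list, uses collections.Counter from the standard library:

```python
from collections import Counter

# column3 values from the sample table above, in row order
column3 = [
    "241682-27638-USD-CIGGNT",
    "241682-27638-USD-OCGGINT",
    "241942-37190-USD-GGDIV",
    "241942-37190-USD-CHYOF",
    "241942-37190-USD-EQPL",
    "241942-37190-USD-INT",
    "242066-15343-USD-CYJOF",
    "242066-15343-USD-CYJOF",
    "242066-15343-USD-CYJOF",
    "241942-37190-USD-GGDIV",
    "242066-33492-USD-CJHOF",
]

# Count each value once over the whole column, then look up each row's count
counts = Counter(column3)
occurrence_counts = [counts[value] for value in column3]
print(occurrence_counts)  # [1, 1, 2, 1, 1, 1, 3, 3, 3, 2, 1]
```

The printed list matches the occcurance_count_of_column3 column in the sample, row for row.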

Answer 1

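You need a counter. One is available as the Counter class in the standard library's collections module, but we can do without it. You need to make two passes over the data, and I assume you can read the file contents into a list of rows (a list of lists), which we can conveniently call table: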

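# First pass: count how often each column3 value occurs
counts = {}
for row in table:
    # dict.get(key, default) returns 0 for keys not seen yet
    # (see ">>> help(dict.get)" if get is new for you)
    counts[row[2]] = counts.get(row[2], 0) + 1

def formatter(row, count):
    # One possible formatter: join the fields and the count into a CSV line
    return ",".join(row + [str(count)])

# Second pass: print each row followed by its column3 count
for row in table:
    print(formatter(row, counts[row[2]]))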
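The loop above only prints the rows. To write the count back into the same CSV file, as the question asks, a minimal sketch along the same two-pass lines might look like this. It assumes the file uses plain ASCII commas and starts with a header row; the function names with_counts and add_occurrence_counts are hypothetical, and it uses collections.Counter for the first pass instead of the manual dict.

```python
import csv
from collections import Counter


def with_counts(rows):
    """Return new rows with each row's column3 occurrence count appended.

    Assumes rows[0] is the header; it gets the new column name instead.
    """
    header, data = rows[0], rows[1:]
    # First pass: how many times does each column3 value appear?
    counts = Counter(row[2] for row in data)
    # Second pass: append the count for this row's column3 value
    return [header + ["occcurance_count_of_column3"]] + [
        row + [str(counts[row[2]])] for row in data
    ]


def add_occurrence_counts(path):
    # Read everything first, because the same file is overwritten afterwards
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(with_counts(rows))
```

Usage would be something like add_occurrence_counts("data.csv"), where the file name is a placeholder. Reading the whole file into memory before reopening it for writing is what makes it safe to overwrite the same file; for very large files, writing to a new file and renaming it afterwards would be the more cautious choice.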

I want to print the occurrence count of column 3 in front of it, as shown below.
