Question

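I would like to know how (in Python) to count occurrences and compare values across different columns in different spreadsheets. After counting, I need to know whether those values satisfy a condition: for example, if Ana (a user) from the first spreadsheet appears 1 time in the second spreadsheet and 5 times in the third, I want to add 1 to a variable X.

I'm new to Python, but I tried using Counter from collections and then taking .values(). However, I'm not sure whether the actual value Ana is taken into account when iterating over the Counter results. In short, I need to iterate over every element in one spreadsheet and check whether it appears once in the second spreadsheet and five times in the third; when that happens, the variable X is incremented by one.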

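import csv
from collections import Counter

def XInputOutputs():
    # file1 and file2 are paths to the CSV files, defined elsewhere

    # Collect the second column of the first file and count each value
    list1 = []
    with open(file1, 'r') as fr:
        r = csv.reader(fr)
        for row in r:
            list1.append(row[1])
    number_of_occurrences_in_list_1 = Counter(list1)
    list1_ocurrences = number_of_occurrences_in_list_1.values()

    # Same for the second file
    list2 = []
    with open(file2, 'r') as fr:
        r = csv.reader(fr)
        for row in r:
            list2.append(row[1])
    number_of_occurrences_in_list_2 = Counter(list2)
    list2_ocurrences = number_of_occurrences_in_list_2.values()

    X = 0

    # Pair up the counts from both files and test the condition
    for x, y in zip(list1_ocurrences, list2_ocurrences):
        if x == 1 and y == 5:
            X += 1

    return X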

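I tested this with small spreadsheets, but it only works for predetermined values. If Ana appears after row 100000, everything breaks. I think I need to iterate over each value (Ana), check all the spreadsheets at the same time, and add to the variable X.

Thanks.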

Answer 1

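I'm at work, so I can only write a full answer later. If you can import modules, I suggest you try pandas: it's a really useful tool for managing dataframes quickly and efficiently. You can easily import a .csv spreadsheet with the read_csv method

import pandas as pd

df = pd.read_csv("your_file.csv")  # placeholder path, replace with your own file

and then perform almost any kind of operation on it.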

Have a look at this answer: I didn't have much time to read it, but I hope it helps

what is the most efficient way of counting occurrences in pandas?
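As a rough illustration of what counting occurrences in pandas can look like, `Series.value_counts()` returns the counts indexed by the value itself, so a count can be looked up by name. The column name `user` and the sample data below are made up for the example.

```python
import pandas as pd

# Hypothetical data standing in for one spreadsheet;
# the column name "user" is an assumption, not taken from the question.
df = pd.DataFrame({"user": ["Ana", "Bob", "Ana", "Cid", "Ana"]})

# value_counts() returns a Series whose index holds the distinct values.
counts = df["user"].value_counts()

# Look a name up by key; names that never occur fall back to the default 0.
ana_count = int(counts.get("Ana", 0))
dan_count = int(counts.get("Dan", 0))
```

Because the lookup is by name, it does not depend on where in the sheet a user first appears.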

Update: then try this

# not tested but should work

import os
import pandas as pd

# read all csv sheets from folder - I assume your folder is named "CSVs"
folder = "CSVs"
files = [f for f in os.listdir(folder) if f.endswith(".csv")]

# build a list of dataframes, one per file
df_list = []
for file in files:
    df = pd.read_csv(os.path.join(folder, file))
    df_list.append(df)

name_i_wanna_count = ""  # this will be your query
column_name = ""  # here insert the column you wanna analyze
count = 0

for df in df_list:
    # select the rows matching your query, then count them
    matching_rows = df.loc[df[column_name] == name_i_wanna_count]
    partial_count = len(matching_rows)
    count = count + partial_count

print(count)

I hope this helps
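For the exact condition in the question (a user from the first sheet appearing once in the second and five times in the third), one possible sketch using only the standard library is below. It looks counts up by name instead of zipping `.values()`, so the row order should no longer matter. The function names `count_column` and `count_matching_users`, the column index 1, and the in-memory sample data are assumptions for illustration. The question does not say whether a name repeated in the first sheet should count more than once; this sketch counts each distinct name once.

```python
import csv
import io
from collections import Counter


def count_column(csv_file, column=1):
    """Count how often each value appears in one column of an open CSV file."""
    return Counter(row[column] for row in csv.reader(csv_file) if len(row) > column)


def count_matching_users(first, second, third, column=1):
    """Count distinct users of `first` seen once in `second` and five times in `third`."""
    users = set(count_column(first, column))
    second_counts = count_column(second, column)
    third_counts = count_column(third, column)
    # A Counter returns 0 for missing keys, so absent users simply fail the test.
    return sum(1 for user in users
               if second_counts[user] == 1 and third_counts[user] == 5)


# Tiny in-memory stand-ins for the three spreadsheets (made-up data).
sheet1 = io.StringIO("1,Ana\n2,Bob\n3,Ana\n")
sheet2 = io.StringIO("1,Bob\n2,Bob\n3,Ana\n")
sheet3 = io.StringIO("1,Ana\n2,Ana\n3,Bob\n4,Ana\n5,Ana\n6,Ana\n")

X = count_matching_users(sheet1, sheet2, sheet3)
```

With real files, each `io.StringIO(...)` would be replaced by something like `open(path, newline="")` inside a `with` block.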

Counting and comparing occurrences across different columns in different spreadsheets
