Question

I am trying to clean up PGA Tour golf data. Initially, some of the hole scores had leading whitespace. Here is the count of unique values:

df["Hole Score"].value_counts()

Out[76]: 
4     566072
5     272074
3     218873
6      48596
 4     38306
 5     19728
2      17339
 3     15093
7       7750
        4232
 6      3011
8       1313
 2      1080
 7       389
9        369
10        66
 8        61
11        38
1         27
 9        20
Name: Hole Score, dtype: int64

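I was able to run a whitespace-removal function that got rid of the leading whitespace. However, value_counts() still returns the same frequency counts: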

df["Hole Score_2"].value_counts()
Out[74]: 
4     566072
5     272074
3     218873
6      48596
4      38306
5      19728
2      17339
3      15093
7       7750
        4232
6       3011
8       1313
2       1080
7        389
9        369
10        66
8         61
11        38
1         27
9         20
Name: Hole Score_2, dtype: int64

For reference, this is the helper function I used:

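def remove_whitespace(x):
    # Remove all whitespace from string values.
    try:
        x = "".join(x.split())
    except AttributeError:
        # Non-string values (e.g. ints) have no .split() and are returned unchanged.
        pass
    return x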

df["Hole Score_2"] = df["Hole Score"].apply(remove_whitespace)

My question: how do I get a single count for each unique hole score? What could be causing the duplicate counts?
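
A possible explanation, offered as a sketch rather than a confirmed diagnosis: if the column has object dtype and holds a mix of integers (4) and strings (" 4"), then stripping whitespace turns " 4" into the string "4", which value_counts() still treats as distinct from the integer 4. The integers would pass through the helper unchanged, since they have no .split() method. The sample data below is hypothetical, and pd.to_numeric with the nullable Int64 dtype is just one way to coerce everything to a single type.

```python
import pandas as pd

# Hypothetical sample mimicking the column: integers, padded strings, and a blank.
df = pd.DataFrame({"Hole Score": [4, 5, " 4", " 5", 4, " ", 3, " 3"]})

# Count the Python types stored in the column. More than one type would explain
# why 4 and "4" appear as separate rows in value_counts().
type_counts = df["Hole Score"].map(type).value_counts()
print(type_counts)

# Convert every value to a string, strip whitespace, then parse as a number.
# Blank entries cannot be parsed and become NaN because of errors="coerce".
cleaned = pd.to_numeric(df["Hole Score"].astype(str).str.strip(), errors="coerce")

# The nullable integer dtype keeps scores as integers and blanks as <NA>.
df["Hole Score_2"] = cleaned.astype("Int64")

# value_counts() skips missing values by default, so each score appears once.
print(df["Hole Score_2"].value_counts())
```

If the type check shows only one type, the cause lies elsewhere (for example, non-breaking spaces or other invisible characters), and this sketch would not apply as written.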

Duplicate unique counts in Python

0 answers: