Question

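I am trying to figure out how to output, for each row, the frequency of its value in the First_Name column of a DataFrame. I have managed to do that so far, but I would also like the count to cover NaN values as well as non-NaN values.

Below is a DataFrame with two columns: First_Name and Favorite_Color. I want to count the values in the First_Name column. When I run the code, I only get counts for the non-NaN values. Is there a way to also include a count of the NaN values and store it in the DataFrame?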

import pandas as pd

d = {
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"]
}

df = pd.DataFrame(data=d)

df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count')

print(df)

I expected to get counts for both NaN and non-NaN values, but I only get counts for the non-NaN values.
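To make the symptom concrete, here is a minimal sketch using the same data. It assumes nothing beyond pandas itself; the variable names `n_missing`, `n_present`, and `group_sizes` are only illustrative. It shows that `count()` skips missing values and that `groupby` drops the missing key by default, which is why the `None` row never gets a count.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"],
})

# isna() flags missing entries; summing the flags counts them
n_missing = int(df['First_Name'].isna().sum())

# count() only counts non-missing entries
n_present = int(df['First_Name'].count())

# By default groupby drops rows whose key is missing,
# so there is no group for the None row
group_sizes = df.groupby('First_Name').size()

print(n_missing, n_present)
print(group_sizes)
```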

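Edit: Thank you, everyone!

I really enjoyed reading everyone's answers; it was fun to see so many different solutions! I think SH-SF's answer is a good one because it is easier to understand, although it does require the numpy library.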

Answer 1

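IIUC, this should do what you need.

import numpy as np

nasum = df['First_Name'].isnull().sum()  # number of rows with a missing name
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').replace(np.nan, nasum)

Or, as ALollz suggested, the following code gives the same result: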

df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').fillna(nasum)

Input

  First_Name Favorite_Color
0      Jared           Blue
1       Lily           Blue
2      Sarah           Pink
3       Bill            Red
4       Bill         Yellow
5     Alfred         Orange
6       None            Red
7       None           Pink

Output

  First_Name Favorite_Color  countNames
0      Jared           Blue         1.0
1       Lily           Blue         1.0
2      Sarah           Pink         1.0
3       Bill            Red         2.0
4       Bill         Yellow         2.0
5     Alfred         Orange         1.0
6       None            Red         2.0
7       None           Pink         2.0
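For reference, the `fillna` variant can be run end to end on the eight-row input shown above. This is only a sketch: it assumes a pandas version in which the transform leaves NaN for rows whose group key is missing, which is the behaviour the output above implies. The intermediate name `counts` is introduced here for readability.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None, None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red", "Pink"],
})

# Number of rows whose name is missing
nasum = df['First_Name'].isnull().sum()

# Per-row count of each name; rows with a missing name are assumed to come back as NaN
counts = df.groupby('First_Name')['First_Name'].transform('count')

# Replace those NaN placeholders with the number of missing names
df['countNames'] = counts.fillna(nasum)

print(df)
```

Because the intermediate result contains NaN, the column ends up as float, which matches the `1.0` and `2.0` values in the output above.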

Answer 2

Try:

df['countNames'] = df.fillna(-1).groupby('First_Name')['First_Name'].transform('count')

  First_Name Favorite_Color  countNames
0      Jared           Blue           1
1       Lily           Blue           1
2      Sarah           Pink           1
3       Bill            Red           2
4       Bill         Yellow           2
5     Alfred         Orange           1
6       None            Red           1
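The one-liner above fills every column of the frame with `-1` before grouping. A possible variation, sketched here under the assumption that only the grouping key needs a stand-in, fills just `First_Name` and groups by that temporary Series. The label `'<missing>'` and the name `key` are arbitrary choices for illustration, and `'size'` is used instead of `'count'` so that rows are counted regardless of their contents.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"],
})

# Temporary grouping key: missing names get a placeholder label
# (the label itself is arbitrary, it only has to differ from real names)
key = df['First_Name'].fillna('<missing>')

# Group by the filled key and broadcast each group's row count back to its rows
df['countNames'] = df.groupby(key)['Favorite_Color'].transform('size')

print(df)
```

The original `First_Name` column is left untouched, so the missing value is still detectable with `isna()` afterwards.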

Answer 3

A "quick" workaround is to convert the column to strings:

import pandas as pd

d = {
'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None], 
'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"]}

df = pd.DataFrame(data=d)

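# Casting to str turns the missing value into the string 'None' (see the output below),
# so groupby treats it as an ordinary key
df['First_Name'] = df['First_Name'].astype(str)

df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count')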

print(df)

  First_Name Favorite_Color  countNames
0      Jared           Blue           1
1       Lily           Blue           1
2      Sarah           Pink           1
3       Bill            Red           2
4       Bill         Yellow           2
5     Alfred         Orange           1
6       None            Red           1

Answer 4

There is no need for transform here. Just use map and value_counts on a temporary DataFrame df1, as follows:

df1 = df.astype(str)
df['countNames'] = df1['First_Name'].map(df1['First_Name'].value_counts())

Out[802]:
  First_Name Favorite_Color  countNames
0      Jared           Blue           1
1       Lily           Blue           1
2      Sarah           Pink           1
3       Bill            Red           2
4       Bill         Yellow           2
5     Alfred         Orange           1
6       None            Red           1
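All of the answers above work around the fact that `groupby` drops missing keys. As an alternative that is not among the posted answers, and assuming pandas 1.1 or later where `groupby` accepts `dropna=False`, the missing key can be kept as its own group. A hedged sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'First_Name': ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None],
    'Favorite_Color': ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red"],
})

# dropna=False keeps the missing key as a group of its own;
# 'size' counts rows per group, including rows whose value is missing
df['countNames'] = (
    df.groupby('First_Name', dropna=False)['First_Name'].transform('size')
)

print(df)
```

This leaves `First_Name` unchanged and needs neither a sentinel value nor a string conversion.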

Counting NaN per row with pandas
