Fill a column's values based on the mean of another column

Question

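I have a pandas DataFrame. I'm trying to fill the NaNs in the Price column with the average price of the corresponding level in the Section column. What is an efficient and elegant way to do this? My data looks like this: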

Name   Sex  Section  Price
Joe     M      1       2
Bob     M      1       nan
Nancy   F      2       5
Grace   F      1       6
Jen     F      2       3
Paul    M      2       nan
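A minimal sketch that rebuilds the sample data above as a DataFrame, assuming the `nan` entries are real missing values (`np.nan`), and checks the per-Section means:

```python
import numpy as np
import pandas as pd

# Sample data from the question; "nan" is assumed to mean a missing value
df = pd.DataFrame({
    "Name": ["Joe", "Bob", "Nancy", "Grace", "Jen", "Paul"],
    "Sex": ["M", "M", "F", "F", "F", "M"],
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2, np.nan, 5, 6, 3, np.nan],
})

# Mean Price per Section, computed from the known prices only
print(df.groupby("Section")["Price"].mean())
```

With this data both Sections average 4.0 ((2 + 6) / 2 and (5 + 3) / 2), so a fill would not reveal whether the right group's mean was used.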

Answer 1

You can use a combination of groupby, transform, and mean. Note that I've modified your example, because otherwise both Sections have the same mean. Starting from

In [21]: df
Out[21]: 
    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    NaN
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    NaN

we can use

df["Price"] = (df["Price"].fillna(df.groupby("Section")["Price"].transform("mean"))

to produce

In [23]: df
Out[23]: 
    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    4.0
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    7.5
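A self-contained sketch of the one-liner above, assuming the modified example data (Jen's price changed to 10) and `np.nan` for the missing prices:

```python
import numpy as np
import pandas as pd

# Modified example from this answer (Jen's price is 10 instead of 3)
df = pd.DataFrame({
    "Name": ["Joe", "Bob", "Nancy", "Grace", "Jen", "Paul"],
    "Sex": ["M", "M", "F", "F", "F", "M"],
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2, np.nan, 5, 6, 10, np.nan],
})

# Replace each NaN with the mean Price of its Section; known prices are kept
df["Price"] = df["Price"].fillna(df.groupby("Section")["Price"].transform("mean"))

print(df)
```

`fillna` only touches the missing entries, so Joe, Nancy, Grace and Jen keep their original prices.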

This works because we can compute the mean by Section:

In [29]: df.groupby("Section")["Price"].mean()
Out[29]: 
Section
1    4.0
2    7.5
Name: Price, dtype: float64
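These means rely on missing values being ignored: `GroupBy.mean` excludes NaN, as does `Series.mean` with its default `skipna=True`. A small sketch on Section 1's prices from the example:

```python
import numpy as np
import pandas as pd

# Section 1 prices from the example: 2.0, NaN, 6.0
prices = pd.Series([2.0, np.nan, 6.0])

print(prices.mean())              # 4.0: NaN skipped, (2 + 6) / 2
print(prices.mean(skipna=False))  # nan: the missing value propagates
```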

We can then use transform to broadcast this back to a full Series, which we can pass to fillna():

In [30]: df.groupby("Section")["Price"].transform("mean")
Out[30]: 
0    4.0
1    4.0
2    7.5
3    4.0
4    7.5
5    7.5
Name: Price, dtype: float64
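Because `transform("mean")` returns a Series aligned to the original row index, it lines up row by row with the Price column. A sketch (assuming the modified example data) showing that the same Series can be built by mapping the per-Section means onto the `Section` column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
})

# transform broadcasts each group's mean back to every row of that group
by_transform = df.groupby("Section")["Price"].transform("mean")

# The same values built by hand: one mean per Section, looked up per row
means = df.groupby("Section")["Price"].mean()
by_map = df["Section"].map(means)

print(by_transform.tolist())                     # [4.0, 4.0, 7.5, 4.0, 7.5, 7.5]
print(by_transform.tolist() == by_map.tolist())  # True
```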

Answer 2

`pandas`: surgical but slower

See @DSM's answer for a faster `pandas` solution.

This is a more surgical approach that may offer a useful perspective.

Use `groupby` to calculate the `mean` for each `Section`:
```
means = df.groupby('Section').Price.mean()
```
Identify the nulls, using `isnull` for boolean slicing:
```
nulls = df.Price.isnull()
```
Use `map` on the `Section` column, sliced to only the rows where `Price` is null:
```
fills = df.Section[nulls].map(means)
```
Use `loc` to fill in the positions in `df` only where there are nulls:
```
df.loc[nulls, 'Price'] = fills
```
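To see what each step produces, a sketch on the modified example data (only `Section` and `Price` kept, missing prices assumed to be `np.nan`) that prints the intermediate objects:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
})

means = df.groupby("Section").Price.mean()  # one mean per Section
nulls = df.Price.isnull()                   # boolean mask of missing prices
fills = df.Section[nulls].map(means)        # Section -> mean, only for null rows

print(means.tolist())        # [4.0, 7.5]
print(nulls.tolist())        # [False, True, False, False, False, True]
print(fills.index.tolist())  # [1, 5]
print(fills.tolist())        # [4.0, 7.5]
```

Because `fills` keeps the original row labels (1 and 5), assigning it with `df.loc[nulls, 'Price'] = fills` aligns each mean with its own row.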

All together:

means = df.groupby('Section').Price.mean()
nulls = df.Price.isnull()
fills = df.Section[nulls].map(means)
df.loc[nulls, 'Price'] = fills

print(df)

    Name Sex  Section  Price
0    Joe   M        1    2.0
1    Bob   M        1    4.0
2  Nancy   F        2    5.0
3  Grace   F        1    6.0
4    Jen   F        2   10.0
5   Paul   M        2    7.5
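As a consistency check, a sketch (assuming the modified example data) that runs this approach and the `transform`-based one-liner from Answer 1 on separate copies and compares the results:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
})

# Approach from this answer: compute means, then fill only the null positions
surgical = df.copy()
means = surgical.groupby("Section").Price.mean()
nulls = surgical.Price.isnull()
surgical.loc[nulls, "Price"] = surgical.Section[nulls].map(means)

# One-liner from Answer 1: fillna with a group-wise transform
one_liner = df.copy()
one_liner["Price"] = one_liner["Price"].fillna(
    one_liner.groupby("Section")["Price"].transform("mean")
)

# Raises AssertionError if the two frames differ in values, dtypes or labels
pd.testing.assert_frame_equal(surgical, one_liner)
print(surgical)
```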

Answer 3

By "corresponding level" I assume you mean rows with the same Section value.

If so, you can solve this with

for section_value in sorted(set(df.Section)):
    mask = df['Section'] == section_value
    df.loc[mask, 'Price'] = df.loc[mask, 'Price'].fillna(df.loc[mask, 'Price'].mean())

Hope it helps! Peace
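A self-contained sketch of the loop on the modified example data, next to a per-group alternative that is not part of this answer: `groupby(...).transform` with a lambda that fills each group's NaNs with that group's own mean. Both should produce the same prices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Section": [1, 1, 2, 1, 2, 2],
    "Price": [2.0, np.nan, 5.0, 6.0, 10.0, np.nan],
})

# Loop version: one boolean mask and one fill per distinct Section value
looped = df.copy()
for section_value in sorted(looped["Section"].unique()):
    mask = looped["Section"] == section_value
    section_prices = looped.loc[mask, "Price"]
    looped.loc[mask, "Price"] = section_prices.fillna(section_prices.mean())

# Same per-group fill expressed as a single groupby-transform
grouped = df.groupby("Section")["Price"].transform(lambda s: s.fillna(s.mean()))

print(looped["Price"].tolist())  # [2.0, 4.0, 5.0, 6.0, 10.0, 7.5]
print(grouped.tolist())          # [2.0, 4.0, 5.0, 6.0, 10.0, 7.5]
```

The loop rescans the whole frame once per Section, so it is likely to slow down as the number of distinct Sections grows; the groupby form does the grouping in one pass.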
