Pandas fillna() based on a specific column attribute

Time: 2016-08-31 06:50:38

Tags: python pandas indexing nan mean

Let's say I have this table:

Type | Killed | Survived
Dog  | 5      | 2
Dog  | 3      | 4
Cat  | 1      | 7
Dog  | nan    | 3
cow  | nan    | 2

One of the values in Killed is missing for [Type] = Dog.

I want to impute the mean of [Killed] for [Type] = Dog.

My code is as follows:

  1. Search for the mean

     df[df['Type'] == 'Dog'].mean().round()

     This gives me the mean (around 2.25).

  2. Impute the mean (this is where the problem begins)

     df.loc[(df['Type'] == 'Dog') & (df['Killed'])].fillna(2.25, inplace = True)

     The code runs, but the value is not imputed; the NaN value is still there.

My question is: how do I impute the mean in [Killed] based on [Type] = Dog?

3 answers:

Answer 0: (score: 3)

This works for me:

# .ix was removed in pandas 1.0; .loc does the same label/boolean selection here
df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(2.25)
print (df)
  Type  Killed  Survived
0  Dog    5.00         2
1  Dog    3.00         4
2  Cat    1.00         7
3  Dog    2.25         3
4  cow     NaN         2

If you need to fillna with a Series (because there are two columns, Killed and Survived):

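# numeric_only=True skips the string column Type, which newer pandas would otherwise reject
m = df[df['Type'] == 'Dog'].mean(numeric_only=True).round()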
print (m)
Killed      4.0
Survived    3.0
dtype: float64

df.loc[df['Type'] == 'Dog'] = df.loc[df['Type'] == 'Dog'].fillna(m)
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2
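As a small illustration of why passing a Series works here: DataFrame.fillna matches the Series index against the column names, so each column gets its own fill value. The frame `demo` and the values in `fill_values` below are made up for the sketch and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with one gap in each column.
demo = pd.DataFrame({"Killed": [5.0, np.nan], "Survived": [np.nan, 4.0]})

# The Series index holds column names, so each column is filled with its own value.
fill_values = pd.Series({"Killed": 4.0, "Survived": 3.0})
filled = demo.fillna(fill_values)
print(filled)
```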

If you need fillna only in the Killed column:

# if you don't need rounding, omit round()
m = round(df.loc[df['Type'] == 'Dog', 'Killed'].mean())
print (m)
4

df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(m)
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2

You can also reuse the filtered selection, like this:

filtered = df.loc[df['Type'] == 'Dog', 'Killed']
print (filtered)
0    5.0
1    3.0
3    NaN
Name: Killed, dtype: float64

df.loc[df['Type'] == 'Dog', 'Killed'] = filtered.fillna(filtered.mean())
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2
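The pattern above could be wrapped in a small helper. This is only a sketch: the function name `fill_group_mean` and its parameters are hypothetical, and it returns a filled copy rather than modifying the frame in place.

```python
import numpy as np
import pandas as pd

def fill_group_mean(df, group_col, value_col, group):
    """Fill NaNs in value_col with that column's mean, for rows of one group only."""
    mask = df[group_col] == group
    group_mean = df.loc[mask, value_col].mean()  # NaNs are skipped by default
    out = df.copy()
    out.loc[mask, value_col] = out.loc[mask, value_col].fillna(group_mean)
    return out

df = pd.DataFrame({
    "Type": ["Dog", "Dog", "Cat", "Dog", "cow"],
    "Killed": [5, 3, 1, np.nan, np.nan],
    "Survived": [2, 4, 7, 3, 2],
})
result = fill_group_mean(df, "Type", "Killed", "Dog")
print(result)
```

With the question's data, the Dog row that was NaN becomes 4.0 (the mean of 5 and 3), while the cow row stays NaN because it is outside the mask.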

Answer 1: (score: 3)

With groupby and transform:

df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame([
        ['Dog', 5, 2],
        ['Dog', 3, 4],
        ['Cat', 1, 7],
        ['Dog', np.nan, 3],
        ['Cow', np.nan, 2]
    ], columns=['Type', 'Killed', 'Survived'])

df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))
df
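For reference, here is a self-contained sketch of the same groupby/transform fill on the setup data, keeping the result in a separate variable (`filled` is a name introduced only for this example). A group with no observed values, such as Cow, has a NaN mean and therefore stays NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
        ['Dog', 5, 2],
        ['Dog', 3, 4],
        ['Cat', 1, 7],
        ['Dog', np.nan, 3],
        ['Cow', np.nan, 2]
    ], columns=['Type', 'Killed', 'Survived'])

# Each group's NaNs are replaced by that group's own mean (NaNs are skipped by mean()).
filled = df.groupby('Type')['Killed'].transform(lambda x: x.fillna(x.mean()))
print(filled.tolist())  # [5.0, 3.0, 1.0, 4.0, nan]
```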


If you intend to count the np.nan rows (as 0) when calculating the mean:

df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.fillna(0).mean()))
df
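To make the arithmetic of this variant concrete: for Dog the fill value becomes (5 + 3 + 0) / 3, roughly 2.67, instead of 4, and for Cow it becomes 0. The sketch below repeats the setup data so it runs on its own; `filled` is again a name used only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
        ['Dog', 5, 2],
        ['Dog', 3, 4],
        ['Cat', 1, 7],
        ['Dog', np.nan, 3],
        ['Cow', np.nan, 2]
    ], columns=['Type', 'Killed', 'Survived'])

# NaNs count as 0 when the group mean is computed, then that mean fills the NaNs.
filled = df.groupby('Type')['Killed'].transform(lambda x: x.fillna(x.fillna(0).mean()))
print(filled)
```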


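Answer 2: (score: 1)

Two problems. First, note that df.loc[(df['Type'] == 'Dog') & (df['Killed'])] isn't doing what you think it is doing (I think). Instead of selecting the rows with type Dog and the column "Killed", you are selecting the rows of type Dog and then doing an element-wise "and" with the "Killed" column, which gives you garbage: False wherever the column "Killed" is nan.

See: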

In [6]: df.loc[(df['Type'] == 'Dog') & (df['Killed'])]
Out[6]: 
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4

What you want is the following:

In [5]: df.loc[(df['Type'] == 'Dog'), ['Killed']]
Out[5]: 
   Killed
0     5.0
1     3.0
3     NaN

The second problem is that you need to combine assignment with .loc and .fillna, as follows:

In [6]: df.loc[(df['Type'] == 'Dog'), ['Killed']] = df.loc[(df['Type'] == 'Dog'), ['Killed']].fillna(2.25)

In [7]: df
Out[7]: 
  Type  Killed  Survived
0  Dog    5.00         2
1  Dog    3.00         4
2  Cat    1.00         7
3  Dog    2.25         3
4  cow     NaN         2
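A minimal sketch of why the assignment matters, assuming the question's data: selecting with a boolean mask returns a new object, so an in-place fillna on that selection changes only the copy. The variable names `mask` and `subset` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": ["Dog", "Dog", "Cat", "Dog", "cow"],
    "Killed": [5, 3, 1, np.nan, np.nan],
    "Survived": [2, 4, 7, 3, 2],
})
mask = df["Type"] == "Dog"

# Boolean-mask selection returns a copy; filling it in place leaves df untouched.
subset = df.loc[mask, ["Killed"]]
subset.fillna(2.25, inplace=True)
print(subset.loc[3, "Killed"])  # 2.25 in the copy
print(df.loc[3, "Killed"])      # nan in the original

# Assigning the filled values back through .loc is what updates df.
df.loc[mask, "Killed"] = df.loc[mask, "Killed"].fillna(2.25)
print(df.loc[3, "Killed"])      # 2.25
```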

Note

The value you give for the mean is either wrong or does not correspond to the data you provided in your question. The mean should be 4.