How do I calculate the row mean in a pandas DataFrame?

Time: 2016-09-12 08:48:26

Tags: python-3.x pandas dataframe mean

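I have a pandas df with the columns T max and T min. I want to compute T mean in a new column. I tried df['T mean'] = df[['T max','T min']].mean(axis=1), but it did not work: the values in T mean are the same as in T max. Can anyone help me?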

1 answer:

Answer 0: (score: 1)

I think the problem is in column T min: the type of its values is string, not numeric. So you need to cast the column with astype.

Sample:

import pandas as pd

df = pd.DataFrame({'T max':[1,2,3],'T min':['5','6','7']})
print (df)
   T max T min
0      1     5
1      2     6
2      3     7

# df.ix has been removed from pandas; df.loc does the same label-based lookup
print (type(df.loc[0,'T min']))
<class 'str'>

# T min holds strings, so it is left out of the mean and T mean equals T max
df['T mean'] = df[['T max','T min']].mean(axis=1)
print (df)
   T max T min  T mean
0      1     5     1.0
1      2     6     2.0
2      3     7     3.0

# cast the column to int
df['T min'] = df['T min'].astype(int)

# df.ix has been removed from pandas; df.loc does the same label-based lookup
print (type(df.loc[0,'T min']))
<class 'numpy.int32'>

df['T mean new']= df[['T max','T min']].mean(axis=1) 
print (df)
   T max  T min  T mean  T mean new
0      1      5     1.0         3.0
1      2      6     2.0         4.0
2      3      7     3.0         5.0
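
The diagnosis above can be confirmed by inspecting df.dtypes before averaging. The following is a minimal sketch on the same sample data, not part of the original answer. One assumption to flag: in recent pandas versions (2.0 and later, where numeric_only defaults to False) the plain mean call on a string column is expected to raise a TypeError rather than skip the column, so numeric_only=True is passed here to reproduce the output shown above.

```python
import pandas as pd

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', '7']})

# A column of strings is reported with a non-numeric dtype
dtypes = df.dtypes
print(dtypes)

# numeric_only=True averages only the numeric columns, so T min is ignored
mean_numeric_only = df[['T max', 'T min']].mean(axis=1, numeric_only=True)
print(mean_numeric_only.tolist())

# After the cast, both columns take part in the row mean
df['T min'] = df['T min'].astype(int)
mean_after_cast = df[['T max', 'T min']].mean(axis=1)
print(mean_after_cast.tolist())
```

If the dtype printed for T min is not an integer or float type, the column needs converting before the mean is meaningful.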

If astype returns an error:

ValueError: invalid literal for int() with base 10: 'aaa'

it means that column T min contains at least one invalid value.
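
As a small illustration of that failure, the sketch below wraps the cast in try/except so the error can be observed without stopping the script. The variable name cast_ok is hypothetical and only used for this example.

```python
import pandas as pd

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', 'aaa']})

try:
    df['T min'] = df['T min'].astype(int)
    cast_ok = True
except ValueError as exc:
    # 'aaa' cannot be parsed as an integer, so the whole cast fails
    cast_ok = False
    print('cast failed:', exc)
```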

Sample:

df=pd.DataFrame({'T max':[1,2,3],'T min':['5','6','aaa']})
print (df)
   T max T min
0      1     5
1      2     6
2      3   aaa

df['T mean']= df[['T max','T min']].mean(axis=1) 
print (df)
   T max T min  T mean
0      1     5     1.0
1      2     6     2.0
2      3   aaa     3.0

# show the rows where T min holds a value that cannot be converted to a number
print (df[ pd.to_numeric(df['T min'], errors='coerce').isnull()])
   T max T min  T mean
2      3   aaa     3.0

# replace invalid values with NaN
df['T min'] = pd.to_numeric(df['T min'], errors='coerce')

df['T mean new']= df[['T max','T min']].mean(axis=1) 
print (df)
   T max  T min  T mean  T mean new
0      1    5.0     1.0         3.0
1      2    6.0     2.0         4.0
2      3    NaN     3.0         3.0
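
In the last row, T mean new is 3.0 because mean skips NaN by default, so only T max is averaged. If a missing T min should instead give a missing mean, passing skipna=False is one option. A minimal sketch on the same sample data (the names mean_skip and mean_strict are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', 'aaa']})
df['T min'] = pd.to_numeric(df['T min'], errors='coerce')

# Default skipna=True: NaN is ignored, so row 2 averages T max alone
mean_skip = df[['T max', 'T min']].mean(axis=1)

# skipna=False: any NaN in a row makes that row's mean NaN
mean_strict = df[['T max', 'T min']].mean(axis=1, skipna=False)

print(mean_skip.tolist())
print(mean_strict.tolist())
```

Which behaviour is right depends on whether a mean based on a single temperature reading is acceptable for the data at hand.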