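I have a pandas DataFrame df with columns T max and T min. I want to compute T mean in the next column. I tried

df['T mean'] = df[['T max', 'T min']].mean(axis=1)

but it did not work: T mean just contains the T max values. Can anyone help?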
Answer 0 (score: 1)
I think the problem is in column T min: its values are of type string, not numeric, so mean ignores that column. You need to cast it with astype.

Sample:
import pandas as pd

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', '7']})
print(df)
   T max T min
0      1     5
1      2     6
2      3     7

# .loc replaces the removed .ix indexer
print(type(df.loc[0, 'T min']))
<class 'str'>

# the string column is skipped, so the mean equals T max
# (newer pandas versions may raise a TypeError here instead)
df['T mean'] = df[['T max', 'T min']].mean(axis=1)
print(df)
   T max T min  T mean
0      1     5     1.0
1      2     6     2.0
2      3     7     3.0

# cast column to int
df['T min'] = df['T min'].astype(int)
print(type(df.loc[0, 'T min']))
<class 'numpy.int32'>   (or numpy.int64, depending on platform)

df['T mean new'] = df[['T max', 'T min']].mean(axis=1)
print(df)
   T max  T min  T mean  T mean new
0      1      5     1.0         3.0
1      2      6     2.0         4.0
2      3      7     3.0         5.0
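
Before casting, it can help to confirm which columns are actually non-numeric. The following is a minimal sketch, not part of the original answer: the variable name non_numeric is made up for illustration, and it assumes every value in those columns is a valid integer string.

```python
import pandas as pd

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', '7']})

# A dtype of object (or a string dtype in newer pandas) usually means
# the column holds strings rather than numbers.
print(df.dtypes)

# Hypothetical name: collect the columns that are not numeric yet.
non_numeric = [c for c in df.columns
               if not pd.api.types.is_numeric_dtype(df[c])]
print(non_numeric)

# Cast each of them to int (assumes every value is a valid integer string).
for col in non_numeric:
    df[col] = df[col].astype(int)

df['T mean'] = df[['T max', 'T min']].mean(axis=1)
print(df)
```

If any column shows up in non_numeric, that is the column mean was not averaging.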
If astype raises an error such as:

ValueError: invalid literal for int() with base 10: 'aaa'

it means that there is at least one invalid value in column T min.

Sample:
import pandas as pd

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', 'aaa']})
print(df)
   T max T min
0      1     5
1      2     6
2      3   aaa

# the string column is skipped, so the mean equals T max
# (newer pandas versions may raise a TypeError here instead)
df['T mean'] = df[['T max', 'T min']].mean(axis=1)
print(df)
   T max T min  T mean
0      1     5     1.0
1      2     6     2.0
2      3   aaa     3.0

# find the rows with an invalid value in T min
print(df[pd.to_numeric(df['T min'], errors='coerce').isnull()])
   T max T min  T mean
2      3   aaa     3.0

# replace invalid values with NaN
df['T min'] = pd.to_numeric(df['T min'], errors='coerce')
df['T mean new'] = df[['T max', 'T min']].mean(axis=1)
print(df)
   T max  T min  T mean  T mean new
0      1    5.0     1.0         3.0
1      2    6.0     2.0         4.0
2      3    NaN     3.0         3.0
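
The coerce-and-average steps above could be wrapped in a small function. This is only a sketch under assumptions: row_mean_numeric is a hypothetical name, not a pandas API, and it relies on the default skipna=True behaviour of mean, which ignores NaN when averaging a row.

```python
import pandas as pd

def row_mean_numeric(df, cols):
    """Hypothetical helper: coerce cols to numbers, then average per row."""
    # Values that cannot be parsed as numbers become NaN.
    numeric = df[cols].apply(pd.to_numeric, errors='coerce')
    # Rows with a NaN in at least one column after coercion
    # (this also includes values that were already missing).
    bad_rows = df[numeric.isnull().any(axis=1)]
    # mean skips NaN by default, so a row with one bad value
    # is averaged over its remaining valid values.
    return numeric.mean(axis=1), bad_rows

df = pd.DataFrame({'T max': [1, 2, 3], 'T min': ['5', '6', 'aaa']})
mean, bad_rows = row_mean_numeric(df, ['T max', 'T min'])
df['T mean'] = mean
print(bad_rows)
print(df)
```

If a row with an invalid value should yield NaN instead of the mean of the remaining values, passing skipna=False to mean would be one way to get that.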