I want to concatenate two values in the same column. This is my CSV file:
Date,Region,TemperatureMax,TemperatureMin,PrecipitationMax,PrecipitationMin
01/01/2016,Champagne Ardenne,12,6,2.5,0.3
02/01/2016,Champagne Ardenne,13,9,3.9,0.6
03/01/2016,Champagne Ardenne,14,7,22.5,12.5
01/01/2016,Bourgogne,9,5,0.1,0
02/01/2016,Bourgogne,11,8,16.3,4.2
03/01/2016,Bourgogne,10,5,12.2,6.3
01/01/2016,Pays de la Loire,12,6,2.5,0.3
02/01/2016,Pays de la Loire,13,9,3.9,0.6
03/01/2016,Pays de la Loire,14,7,22.5,12.5
Instead of keeping them separate, I want Bourgogne and Champagne Ardenne combined into a single region, Bourgogne Champagne Ardenne, with the mean of TemperatureMax, TemperatureMin, PrecipitationMax and PrecipitationMin:
01/01/2016,Bourgogne Champagne Ardenne,10.5,5.5,1.3,0.15
02/01/2016,Bourgogne Champagne Ardenne,12,8.5,10.1,2.4
03/01/2016,Bourgogne Champagne Ardenne,12,6,17.35,9.4
01/01/2016,Pays de la Loire,12,6,2.5,0.3
02/01/2016,Pays de la Loire,13,9,3.9,0.6
03/01/2016,Pays de la Loire,14,7,22.5,12.5
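The answers below assume the CSV has already been loaded into a DataFrame named df. A minimal sketch for doing that from the sample rows, assuming pandas is installed and pasting the data inline instead of reading a file (the name csv_text is only illustrative):

```python
import io

import pandas as pd

# The sample rows from the question, pasted verbatim.
csv_text = """Date,Region,TemperatureMax,TemperatureMin,PrecipitationMax,PrecipitationMin
01/01/2016,Champagne Ardenne,12,6,2.5,0.3
02/01/2016,Champagne Ardenne,13,9,3.9,0.6
03/01/2016,Champagne Ardenne,14,7,22.5,12.5
01/01/2016,Bourgogne,9,5,0.1,0
02/01/2016,Bourgogne,11,8,16.3,4.2
03/01/2016,Bourgogne,10,5,12.2,6.3
01/01/2016,Pays de la Loire,12,6,2.5,0.3
02/01/2016,Pays de la Loire,13,9,3.9,0.6
03/01/2016,Pays de la Loire,14,7,22.5,12.5
"""

# Date stays a plain string column, which is all the grouping needs.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file, pd.read_csv would take the file path instead of the StringIO wrapper.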
Answer 0 (score: 1)
Use groupby with the agg method:
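# df is the DataFrame read from the CSV above
df.groupby('Date').agg({
    # join the region names of each date, sorted alphabetically
    'Region': lambda g: g.sort_values().str.cat(sep=' '),
    # average the numeric columns within each date
    'TemperatureMax': 'mean',
    'TemperatureMin': 'mean',
    'PrecipitationMax': 'mean',
    'PrecipitationMin': 'mean'
})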
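Note that this joins the region names in alphabetical order. Because it groups by Date alone, it also folds every other region on that date (here Pays de la Loire) into the same row, so filter the DataFrame to Bourgogne and Champagne Ardenne first if the other regions must stay separate.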
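A minimal sketch, assuming pandas and the sample data pasted inline, that applies the same agg only to the two regions being merged and keeps the remaining rows untouched. The isin filter, the reset_index call and the pd.concat step are additions for illustration, not part of the original answer; to_merge and mask are hypothetical names.

```python
import io

import pandas as pd

csv_text = """Date,Region,TemperatureMax,TemperatureMin,PrecipitationMax,PrecipitationMin
01/01/2016,Champagne Ardenne,12,6,2.5,0.3
02/01/2016,Champagne Ardenne,13,9,3.9,0.6
03/01/2016,Champagne Ardenne,14,7,22.5,12.5
01/01/2016,Bourgogne,9,5,0.1,0
02/01/2016,Bourgogne,11,8,16.3,4.2
03/01/2016,Bourgogne,10,5,12.2,6.3
01/01/2016,Pays de la Loire,12,6,2.5,0.3
02/01/2016,Pays de la Loire,13,9,3.9,0.6
03/01/2016,Pays de la Loire,14,7,22.5,12.5
"""
df = pd.read_csv(io.StringIO(csv_text))

# Regions to combine; every other region passes through unchanged.
to_merge = ['Bourgogne', 'Champagne Ardenne']
mask = df['Region'].isin(to_merge)

# Same aggregation as the answer, restricted to the selected regions.
merged = df[mask].groupby('Date').agg({
    'Region': lambda g: g.sort_values().str.cat(sep=' '),
    'TemperatureMax': 'mean',
    'TemperatureMin': 'mean',
    'PrecipitationMax': 'mean',
    'PrecipitationMin': 'mean',
}).reset_index()  # turn Date back into a regular column

# Stack the merged rows on top of the rows that were not merged.
result = pd.concat([merged, df[~mask]], ignore_index=True)
print(result)
```

The merged rows come first and are sorted by Date, because groupby sorts its keys by default.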
Answer 1 (score: 1)
A more general solution: first build a dict, then replace the region names, and aggregate with groupby:
d = {'Champagne Ardenne': 'Bourgogne Champagne Ardenne',
     'Bourgogne': 'Bourgogne Champagne Ardenne'}
# rename both regions to the merged name; other regions are left unchanged
df['Region'] = df['Region'].replace(d)
# average every numeric column per (Date, Region); sort=False keeps the original row order
df1 = df.groupby(['Date', 'Region'], as_index=False, sort=False).mean()
print(df1)
Date Region TemperatureMax TemperatureMin \
0 01/01/2016 Bourgogne Champagne Ardenne 10.5 5.5
1 02/01/2016 Bourgogne Champagne Ardenne 12.0 8.5
2 03/01/2016 Bourgogne Champagne Ardenne 12.0 6.0
3 01/01/2016 Pays de la Loire 12.0 6.0
4 02/01/2016 Pays de la Loire 13.0 9.0
5 03/01/2016 Pays de la Loire 14.0 7.0
PrecipitationMax PrecipitationMin
0 1.30 0.15
1 10.10 2.40
2 17.35 9.40
3 2.50 0.30
4 3.90 0.60
5 22.50 12.50
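To make the mapping easier to extend, the dict passed to replace can be built from a list of region names. merge_mapping below is a hypothetical helper, not a pandas function; it names the merged region by joining the sorted names with a space, which reproduces "Bourgogne Champagne Ardenne" for this data. A minimal sketch, assuming pandas and the sample rows pasted inline:

```python
import io

import pandas as pd

csv_text = """Date,Region,TemperatureMax,TemperatureMin,PrecipitationMax,PrecipitationMin
01/01/2016,Champagne Ardenne,12,6,2.5,0.3
02/01/2016,Champagne Ardenne,13,9,3.9,0.6
03/01/2016,Champagne Ardenne,14,7,22.5,12.5
01/01/2016,Bourgogne,9,5,0.1,0
02/01/2016,Bourgogne,11,8,16.3,4.2
03/01/2016,Bourgogne,10,5,12.2,6.3
01/01/2016,Pays de la Loire,12,6,2.5,0.3
02/01/2016,Pays de la Loire,13,9,3.9,0.6
03/01/2016,Pays de la Loire,14,7,22.5,12.5
"""
df = pd.read_csv(io.StringIO(csv_text))


def merge_mapping(regions):
    """Hypothetical helper: map each listed region to one combined name."""
    merged_name = ' '.join(sorted(regions))
    return dict.fromkeys(regions, merged_name)


d = merge_mapping(['Champagne Ardenne', 'Bourgogne'])
# Same replace + groupby steps as the answer, driven by the generated dict.
df['Region'] = df['Region'].replace(d)
df1 = df.groupby(['Date', 'Region'], as_index=False, sort=False).mean()
print(df1)
```

Several groups can be merged at once by combining mappings, for example {**merge_mapping(a), **merge_mapping(b)}, as long as no region appears in both lists.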