I have two data frames.
Df1 looks like this:
Pollutants.head()
Sensor_ID Sensor_Street_Name Sensor_Lat Sensor_Long Sensor_Type UOM Time_Instant
0 20020 Milano -via Carlo Pascal 45.478452 9.235016 Ammonia µg/m YYYY/MM/DD
1 17127 Milano - viale Marche 45.496067 9.193023 Benzene µg/m YYYY/MM/DD HH24:MI
2 17126 Milano -via Carlo Pascal 45.478452 9.235016 Benzene µg/m YYYY/MM/DD HH24:MI
3 6057 Milano - via Senato 45.470780 9.197180 Benzene µg/m YYYY/MM/DD HH24:MI
4 6062 Milano - P.zza Zavattari 45.476089 9.143509 Benzene µg/m YYYY/MM/DD HH24:MI
And this is DF2:
Mi_Pollution.head()
Sensor_ID Time_Instant Measurement
0 14121 2013-11-01 00:00:00 0.8
1 14121 2013-11-01 03:00:00 0.6
2 14121 2013-11-01 06:00:00 0.4
3 14121 2013-11-01 09:00:00 0.4
4 14121 2013-11-01 12:00:00 0
What I want to do is take the Sensor_Type column from Df1 and add it to DF2, matching rows on the Sensor_ID column, so that the desired output looks like this:
Sensor_ID Time_Instant Measurement Pollutants
0 20020 2015-11-01 00:00:00 0.3 Ammonia
1 20020 2015-11-01 03:00:00 0.5 Ammonia
2 20020 2015-11-01 06:00:00 2.3 Ammonia
3 20020 2013-11-01 09:00:00 0.4 Ammonia
4 20020 2013-11-01 12:00:00 0 Ammonia
Any suggestions? Thank you.
Answer 0 (score: 1)
You can use merge like this (I have modified the example data):
data1 = """
Sensor_ID Sensor_Street_Name Sensor_Lat Sensor_Long Sensor_Type UOM Time_Instant
14121 Milano-viaCarloPascal 45.478452 9.235016 Ammonia µg/m YYYY/MM/DD
17127 Milano-vialeMarche 45.496067 9.193023 Benzene µg/m YYYY/MM/DD_HH24:MI
17126 Milano-viaCarloPascal 45.478452 9.235016 Benzene µg/m YYYY/MM/DD_HH24:MI
6057 Milano-viaSenato 45.470780 9.197180 Benzene µg/m YYYY/MM/DD_HH24:MI
6062 Milano-P.zzaZavattari 45.476089 9.143509 Benzene µg/m YYYY/MM/DD_HH24:MI
"""
data2 = """
Sensor_ID,Time_Instant,Measurement
14121,2013-11-01 00:00:00,0.8
14121,2013-11-01 03:00:00,0.6
14121,2013-11-01 06:00:00,0.4
14121,2013-11-01 09:00:00,0.4
17127,2013-11-01 12:00:00,0
"""
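import io
import pandas as pd
# io.StringIO replaces pd.compat.StringIO, which is not available in current pandas versions
df1 = pd.read_csv(io.StringIO(data1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(data2), sep=',')
# A left merge keeps every row of df2 and attaches the matching sensor metadata from df1
s1 = pd.merge(df2, df1, how='left', on=['Sensor_ID'])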
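Then drop the unused columns from the data frame s1, and rename the Sensor_Type column to Pollutants and Time_Instant_x to Time_Instant. (Both inputs have a Time_Instant column, so merge suffixes them with _x and _y.)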
cols_to_delete = ['Sensor_Street_Name', 'Sensor_Lat','Sensor_Long','Time_Instant_y', 'UOM']
s1.drop(cols_to_delete, axis=1, inplace=True)
s1.rename(columns={'Time_Instant_x': 'Time_Instant', 'Sensor_Type': 'Pollutants'}, inplace=True)
The result for this example is:
Sensor_ID Time_Instant Measurement Pollutants
0 14121 2013-11-01 00:00:00 0.8 Ammonia
1 14121 2013-11-01 03:00:00 0.6 Ammonia
2 14121 2013-11-01 06:00:00 0.4 Ammonia
3 14121 2013-11-01 09:00:00 0.4 Ammonia
4 17127 2013-11-01 12:00:00 0.0 Benzene
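As a possible variation on the answer above, the lookup columns can be selected before merging, so there is nothing to drop afterwards and no _x/_y suffixes appear. This is only a minimal sketch: the frames `sensors` and `readings` and the helper name `lookup` are hypothetical stand-ins for Pollutants and Mi_Pollution, and it assumes each Sensor_ID occurs only once in the sensor table.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two frames in the question.
sensors = pd.DataFrame({
    "Sensor_ID": [14121, 17127],
    "Sensor_Type": ["Ammonia", "Benzene"],
    "Time_Instant": ["YYYY/MM/DD", "YYYY/MM/DD HH24:MI"],
})
readings = pd.DataFrame({
    "Sensor_ID": [14121, 14121, 17127],
    "Time_Instant": ["2013-11-01 00:00:00", "2013-11-01 03:00:00", "2013-11-01 12:00:00"],
    "Measurement": [0.8, 0.6, 0.0],
})

# Keep only the key and the column to attach, renamed up front.
lookup = sensors[["Sensor_ID", "Sensor_Type"]].rename(columns={"Sensor_Type": "Pollutants"})

# validate="many_to_one" makes pandas raise if a Sensor_ID is duplicated
# in the lookup, which would otherwise silently multiply the reading rows.
result = readings.merge(lookup, on="Sensor_ID", how="left", validate="many_to_one")
print(result)
```

With how="left", any reading whose Sensor_ID has no match in the sensor table is kept and gets NaN in the Pollutants column.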