I have two data frames.
DF1 (which I have just resampled):
Mi_pollution.head():
Sensor_ID Time_Instant Measurement
0 10273 2013-11-01 00:00:00 46
1 10273 2013-11-01 01:00:00 51
2 10273 2013-11-01 02:00:00 39
3 10273 2013-11-01 03:00:00 30
4 10273 2013-11-01 04:00:00 37
And DF2:
Pollutants.head():
Sensor_ID Sensor_Street_Name Sensor_Lat Sensor_Long Sensor_Type UOM Time_Instant
0 20020 Milano -via Carlo Pascal 45.478452 9.235016 Ammonia µg/m YYYY/MM/DD
1 17127 Milano - viale Marche 45.496067 9.193023 Benzene µg/m YYYY/MM/DD HH24:MI
2 17126 Milano -via Carlo Pascal 45.478452 9.235016 Benzene µg/m YYYY/MM/DD HH24:MI
3 6057 Milano - via Senato 45.470780 9.197180 Benzene µg/m YYYY/MM/DD HH24:MI
4 6062 Milano - P.zza Zavattari 45.476089 9.143509 Benzene µg/m YYYY/MM/DD HH24:MI
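What I want to do is add new columns to DF1, one per pollutant, and put each measurement in the column for the pollutant that its sensor records, for example: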
Sensor_ID Time_Instant Ammonia Benzene Nitrogene …...
0 20020 2013-12-01 00:00:00 4.8 NaN NaN
1 20020 2013-12-01 01:00:00 5.3 NaN NaN
2 20020 2013-12-01 02:00:00 3.0 NaN NaN
.
.
56 14330 2013-11-01 00:00:00 NaN 6.3 NaN
57 14330 2013-11-01 01:00:00 NaN 5.3 NaN
.
.
Any suggestions would be greatly appreciated. Thanks, everyone.
Answer 0 (score: 0)
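Assuming you want to join on Sensor_ID (in the small example you gave, the two data frames have no Sensor_IDs in common), you can merge the data frames on Sensor_ID (and perhaps on Time_Instant as well?).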
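Since the sample rows share no Sensor_ID, it may be worth checking the overlap before pivoting. The following is a minimal sketch, not the asker's real data: the two small frames are made up for illustration, and only Sensor_ID and Sensor_Type are taken from the lookup table so that its Time_Instant column (which holds a format string in the question) does not clash with the one in DF1. The `indicator=True` option of `merge` adds a `_merge` column saying where each row came from.

```python
import pandas as pd

# Hypothetical toy frames standing in for Mi_pollution (df1) and Pollutants (df2).
df1 = pd.DataFrame({
    "Sensor_ID": [10273, 10273, 20020],
    "Time_Instant": pd.to_datetime(
        ["2013-11-01 00:00", "2013-11-01 01:00", "2013-12-01 00:00"]
    ),
    "Measurement": [46.0, 51.0, 4.8],
})
df2 = pd.DataFrame({
    "Sensor_ID": [20020, 17127],
    "Sensor_Type": ["Ammonia", "Benzene"],
})

# Left merge keeps every measurement; indicator=True records the match status.
merged = df1.merge(
    df2[["Sensor_ID", "Sensor_Type"]],
    on="Sensor_ID",
    how="left",
    indicator=True,
)

# Rows tagged 'left_only' belong to sensors that are missing from df2.
unmatched_ids = sorted(
    merged.loc[merged["_merge"] == "left_only", "Sensor_ID"].unique().tolist()
)
print(unmatched_ids)
```

Any Sensor_ID listed here would end up with no pollutant label and would therefore drop out of a later pivot on Sensor_Type.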
You can then use pivot_table to turn the row values of Sensor_Type into column headers and fill the resulting cells with Measurement.
For example:
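# df1 is Mi_pollution, df2 is Pollutants
df3 = (
    df1.merge(df2, on='Sensor_ID', how='left')
       # 'Other columns' is a placeholder: list any further columns to keep as row labels
       .pivot_table(index=['Sensor_ID', 'Sensor_Street_Name', 'Other columns'],
                    values='Measurement',
                    columns='Sensor_Type')
       .reset_index()
)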
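To make the idea concrete, here is a small self-contained sketch on made-up data (the values and the choice of sensors are illustrative, not the asker's real measurements). It assumes each sensor records exactly one pollutant and uses Sensor_ID plus Time_Instant as the row labels, which matches the layout requested in the question. Note that pivot_table averages by default if several measurements share the same row labels and pollutant.

```python
import pandas as pd

# Hypothetical stand-in for Mi_pollution: hourly measurements per sensor.
df1 = pd.DataFrame({
    "Sensor_ID": [20020, 20020, 17127, 17127],
    "Time_Instant": pd.to_datetime([
        "2013-12-01 00:00", "2013-12-01 01:00",
        "2013-11-01 00:00", "2013-11-01 01:00",
    ]),
    "Measurement": [4.8, 5.3, 6.3, 5.3],
})

# Hypothetical stand-in for Pollutants: one row per sensor.
df2 = pd.DataFrame({
    "Sensor_ID": [20020, 17127],
    "Sensor_Street_Name": ["Milano -via Carlo Pascal", "Milano - viale Marche"],
    "Sensor_Type": ["Ammonia", "Benzene"],
})

df3 = (
    # Attach the pollutant name to every measurement.
    df1.merge(df2[["Sensor_ID", "Sensor_Type"]], on="Sensor_ID", how="left")
       # One column per pollutant, one row per sensor and timestamp.
       .pivot_table(index=["Sensor_ID", "Time_Instant"],
                    columns="Sensor_Type",
                    values="Measurement")
       .reset_index()
)
df3.columns.name = None  # drop the leftover 'Sensor_Type' axis label
print(df3)
```

Each row ends up with a value only in the column of the pollutant its sensor measures, and NaN in the others.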