How do I take a column from one DataFrame and add it to another DataFrame based on an index column?

Time: 2019-03-20 16:01:53

Tags: python-3.x pandas dataframe

I have two DataFrames.

Df1 looks like this:

Pollutants.head()

   Sensor_ID  Sensor_Street_Name        Sensor_Lat  Sensor_Long  Sensor_Type  UOM   Time_Instant
0  20020      Milano -via Carlo Pascal  45.478452   9.235016     Ammonia      µg/m  YYYY/MM/DD
1  17127      Milano - viale Marche     45.496067   9.193023     Benzene      µg/m  YYYY/MM/DD HH24:MI
2  17126      Milano -via Carlo Pascal  45.478452   9.235016     Benzene      µg/m  YYYY/MM/DD HH24:MI
3  6057       Milano - via Senato       45.470780   9.197180     Benzene      µg/m  YYYY/MM/DD HH24:MI
4  6062       Milano - P.zza Zavattari  45.476089   9.143509     Benzene      µg/m  YYYY/MM/DD HH24:MI

And I have Df2:

      Mi_Pollution.head()

      Sensor_ID      Time_Instant           Measurement
0     14121      2013-11-01 00:00:00        0.8
1     14121      2013-11-01 03:00:00        0.6
2     14121      2013-11-01 06:00:00        0.4
3     14121      2013-11-01 09:00:00        0.4
4     14121      2013-11-01 12:00:00         0

What I want to do is take the Sensor_Type column and add it to Df2 based on the Sensor_ID column, so that the desired output looks like this:

  Sensor_ID     Time_Instant        Measurement   Pollutants
0   20020    2015-11-01 00:00:00         0.3        Ammonia    
1   20020    2015-11-01 03:00:00         0.5        Ammonia
2   20020    2015-11-01 06:00:00         2.3        Ammonia
3   20020    2013-11-01 09:00:00         0.4        Ammonia
4   20020    2013-11-01 12:00:00          0         Ammonia

Any suggestions? Thank you.

1 answer:

Answer 0 (score: 1)

You can use merge like this (I have modified the sample data):

data1 = """
Sensor_ID     Sensor_Street_Name     Sensor_Lat     Sensor_Long    Sensor_Type   UOM      Time_Instant
14121   Milano-viaCarloPascal    45.478452      9.235016       Ammonia      µg/m      YYYY/MM/DD
17127   Milano-vialeMarche       45.496067      9.193023       Benzene      µg/m      YYYY/MM/DD_HH24:MI
17126   Milano-viaCarloPascal    45.478452      9.235016       Benzene      µg/m      YYYY/MM/DD_HH24:MI
6057    Milano-viaSenato         45.470780      9.197180       Benzene      µg/m      YYYY/MM/DD_HH24:MI
6062    Milano-P.zzaZavattari    45.476089      9.143509       Benzene      µg/m      YYYY/MM/DD_HH24:MI
   """
data2 = """
Sensor_ID,Time_Instant,Measurement
14121,2013-11-01 00:00:00,0.8        
14121,2013-11-01 03:00:00,0.6        
14121,2013-11-01 06:00:00,0.4        
14121,2013-11-01 09:00:00,0.4        
17127,2013-11-01 12:00:00,0
                """

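import io
import pandas as pd

# pd.compat.StringIO was removed in pandas 0.25; io.StringIO is the standard-library equivalent
df1 = pd.read_csv(io.StringIO(data1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(data2), sep=',')

# A left join keeps every row of df2 and pulls in the matching df1 columns by Sensor_ID
s1 = pd.merge(df2, df1, how='left', on=['Sensor_ID'])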
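
Both frames carry a Time_Instant column that is not part of the join key, so merge keeps both copies and appends its default suffixes `_x` (left frame) and `_y` (right frame). The sketch below illustrates this with two small, hypothetical stand-in frames (`left` and `right` are made-up names, not the real data):

```python
import pandas as pd

# Hypothetical miniature versions of df2 (left) and df1 (right)
left = pd.DataFrame({
    'Sensor_ID': [14121, 17127],
    'Time_Instant': ['2013-11-01 00:00:00', '2013-11-01 12:00:00'],
    'Measurement': [0.8, 0.0],
})
right = pd.DataFrame({
    'Sensor_ID': [14121, 17127],
    'Sensor_Type': ['Ammonia', 'Benzene'],
    'Time_Instant': ['YYYY/MM/DD', 'YYYY/MM/DD_HH24:MI'],
})

merged = pd.merge(left, right, how='left', on=['Sensor_ID'])

# The overlapping non-key column is duplicated with the default suffixes
print(list(merged.columns))
# ['Sensor_ID', 'Time_Instant_x', 'Measurement', 'Sensor_Type', 'Time_Instant_y']
```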

Then drop the unused columns from the DataFrame s1, and rename the Sensor_Type column to Pollutants and Time_Instant_x to Time_Instant:

cols_to_delete = ['Sensor_Street_Name', 'Sensor_Lat','Sensor_Long','Time_Instant_y', 'UOM']
s1.drop(cols_to_delete, axis=1, inplace=True)
s1.rename(columns={'Time_Instant_x': 'Time_Instant', 'Sensor_Type': 'Pollutants'}, inplace=True)

The result for this example is:

       Sensor_ID         Time_Instant  Measurement Pollutants
0      14121  2013-11-01 00:00:00          0.8    Ammonia
1      14121  2013-11-01 03:00:00          0.6    Ammonia
2      14121  2013-11-01 06:00:00          0.4    Ammonia
3      14121  2013-11-01 09:00:00          0.4    Ammonia
4      17127  2013-11-01 12:00:00          0.0    Benzene
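
As a possible variation on the approach above, the lookup frame can be cut down to just the key and the wanted column before merging, so no suffixed columns appear and nothing has to be dropped afterwards. This is only a sketch on made-up data: the frames and the sensor ID 99999 are hypothetical, and `validate='many_to_one'` assumes each Sensor_ID occurs at most once in the lookup frame (pandas raises a MergeError otherwise).

```python
import pandas as pd

# Hypothetical, trimmed-down stand-ins for Pollutants and Mi_Pollution
pollutants = pd.DataFrame({
    'Sensor_ID': [14121, 17127],
    'Sensor_Type': ['Ammonia', 'Benzene'],
    'UOM': ['µg/m', 'µg/m'],
})
mi_pollution = pd.DataFrame({
    'Sensor_ID': [14121, 14121, 17127, 99999],  # 99999 has no match on purpose
    'Time_Instant': ['2013-11-01 00:00:00', '2013-11-01 03:00:00',
                     '2013-11-01 12:00:00', '2013-11-01 15:00:00'],
    'Measurement': [0.8, 0.6, 0.0, 1.2],
})

# Keep only the join key and the column to bring over, renamed up front
lookup = pollutants[['Sensor_ID', 'Sensor_Type']].rename(
    columns={'Sensor_Type': 'Pollutants'})

# Left join: every measurement row is kept; unmatched sensors get NaN
result = mi_pollution.merge(lookup, how='left', on='Sensor_ID',
                            validate='many_to_one')
print(result)
```

Rows whose Sensor_ID is missing from the lookup frame end up with NaN in Pollutants, which is worth checking for, since the Sensor_ID values shown in the question's two head() outputs do not overlap.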