Merge one DataFrame based on column values without dropping any columns from the left

Time: 2018-05-16 10:44:22

Tags: python pandas join merge insert

I have the following two dataframes.
The first dataframe contains a bus timetable with the bus number, stop ID and stop name.

1. df_time:

     bus_nr   stop_id   stop_name
0      1         1          a
1      1         2          b
2      1         3          c
3      1         4          d
4      2         1          k
5      2         2          l
6      2         3          m
7      2         4          n
8      2         5          o

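The second dataframe contains some measurements of where the bus was, but some stops are missing. It contains bus_nr, the stop name, the trip ID and other information:

2. df_measure: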

     bus_nr   trip_id   stop_name   other
0      1         1          a         x
1      1         1          b         x
2      1         1          d         x
3      1         2          c         x
4      1         2          d         x
5      2         3          k         x
6      2         3          m         x
7      2         3          n         x

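Now I want to add the missing stops from the timetable to the measured stops, so that every trip in the measurements contains all scheduled stops: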

     bus_nr   trip_id   stop_id   stop_name   other
0      1         1         1          a         x
1      1         1         2          b         x
2      1         1         3          c         NaN
3      1         1         4          d         x
4      1         2         1          a         NaN
5      1         2         2          b         NaN
6      1         2         3          c         x
7      1         2         4          d         x
8      2         3         1          k         x
9      2         3         2          l         NaN
10     2         3         3          m         x
11     2         3         4          n         x
12     2         3         5          o         NaN

So, for each bus_nr, I want to take all the information from df_time and insert it into df_measure. Any ideas?

Code to create the dataframes:

import pandas as pd

df_time = pd.DataFrame()
df_time['bus_nr'] = [1, 1, 1, 1, 2, 2, 2, 2, 2]
df_time['stop_id'] = [1, 2, 3, 4, 1, 2, 3, 4, 5]
df_time['stop_name'] = ['a', 'b', 'c', 'd', 'k', 'l', 'm', 'n', 'o']

df_measure = pd.DataFrame()
df_measure['bus_nr'] = [1, 1, 1, 1, 1, 2, 2, 2]
df_measure['trip_id'] = [1, 1, 1, 2, 2, 3, 3, 3]
df_measure['stop_name'] = ['a', 'b', 'd', 'c', 'd', 'k', 'm', 'n']
df_measure['other'] = ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x']

Solution:

With Sagar Dawda's help, I found a working solution:
1. Create a dataframe with all combinations of bus_nr and trip_id that occur in df_measure

# Keep one row per (bus_nr, trip_id) pair
df_combi = df_measure[['bus_nr', 'trip_id']].drop_duplicates()

2. Use Sagar Dawda's solution

out = pd.merge_ordered(df_time, df_measure, right_by='trip_id', how='outer')
out = out.loc[:, ['bus_nr', 'trip_id', 'stop_id', 'stop_name', 'other']]

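3. Merge, so that only the bus_nr/trip_id combinations that were actually measured remain

# Inner merge on the shared columns bus_nr and trip_id
out = out.merge(df_combi, on=['bus_nr', 'trip_id'])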
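
The same result can probably also be reached with two plain merges. The following is only a sketch under the assumption that (bus_nr, trip_id, stop_name) identifies a measurement uniquely; the names full_schedule and result are mine and not part of the original solution. It first attaches the complete timetable of a bus to each measured trip, then left-merges the measurements onto it.

```python
import pandas as pd

# Data from the question.
df_time = pd.DataFrame({
    "bus_nr": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "stop_id": [1, 2, 3, 4, 1, 2, 3, 4, 5],
    "stop_name": list("abcdklmno"),
})
df_measure = pd.DataFrame({
    "bus_nr": [1, 1, 1, 1, 1, 2, 2, 2],
    "trip_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "stop_name": list("abdcdkmn"),
    "other": ["x"] * 8,
})

# Every (bus_nr, trip_id) pair that was actually measured.
df_combi = df_measure[["bus_nr", "trip_id"]].drop_duplicates()

# Attach the full timetable of the bus to each measured trip
# (full_schedule is a hypothetical helper name).
full_schedule = df_combi.merge(df_time, on="bus_nr")

# The left merge keeps every scheduled stop; stops without a
# measurement get NaN in 'other'.
result = (
    full_schedule
    .merge(df_measure, on=["bus_nr", "trip_id", "stop_name"], how="left")
    .sort_values(["bus_nr", "trip_id", "stop_id"])
    .reset_index(drop=True)
)
print(result)
```

The explicit sort_values is there so the row order does not depend on how the merges happen to order their output.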

2 answers:

Answer 0: (score: 1)

out = pd.merge_ordered(df_time, df_measure, right_by='trip_id', how='outer')
out = out.loc[:, ['bus_nr', 'trip_id', 'stop_id', 'stop_name', 'other']]
out.sort_values(['bus_nr', 'trip_id'], inplace=True)

out
# The output is shown in the HTML table below.



<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>bus_nr</th>
      <th>trip_id</th>
      <th>stop_id</th>
      <th>stop_name</th>
      <th>other</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>1</td>
      <td>1</td>
      <td>a</td>
      <td>x</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>1</td>
      <td>2</td>
      <td>b</td>
      <td>x</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>1</td>
      <td>3</td>
      <td>c</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1</td>
      <td>1</td>
      <td>4</td>
      <td>d</td>
      <td>x</td>
    </tr>
    <tr>
      <th>9</th>
      <td>1</td>
      <td>2</td>
      <td>1</td>
      <td>a</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>10</th>
      <td>1</td>
      <td>2</td>
      <td>2</td>
      <td>b</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>11</th>
      <td>1</td>
      <td>2</td>
      <td>3</td>
      <td>c</td>
      <td>x</td>
    </tr>
    <tr>
      <th>12</th>
      <td>1</td>
      <td>2</td>
      <td>4</td>
      <td>d</td>
      <td>x</td>
    </tr>
    <tr>
      <th>18</th>
      <td>1</td>
      <td>3</td>
      <td>1</td>
      <td>a</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>19</th>
      <td>1</td>
      <td>3</td>
      <td>2</td>
      <td>b</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>20</th>
      <td>1</td>
      <td>3</td>
      <td>3</td>
      <td>c</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>21</th>
      <td>1</td>
      <td>3</td>
      <td>4</td>
      <td>d</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>4</th>
      <td>2</td>
      <td>1</td>
      <td>1</td>
      <td>k</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>5</th>
      <td>2</td>
      <td>1</td>
      <td>2</td>
      <td>l</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>6</th>
      <td>2</td>
      <td>1</td>
      <td>3</td>
      <td>m</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>7</th>
      <td>2</td>
      <td>1</td>
      <td>4</td>
      <td>n</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>8</th>
      <td>2</td>
      <td>1</td>
      <td>5</td>
      <td>o</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>13</th>
      <td>2</td>
      <td>2</td>
      <td>1</td>
      <td>k</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>14</th>
      <td>2</td>
      <td>2</td>
      <td>2</td>
      <td>l</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>15</th>
      <td>2</td>
      <td>2</td>
      <td>3</td>
      <td>m</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>16</th>
      <td>2</td>
      <td>2</td>
      <td>4</td>
      <td>n</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>17</th>
      <td>2</td>
      <td>2</td>
      <td>5</td>
      <td>o</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>22</th>
      <td>2</td>
      <td>3</td>
      <td>1</td>
      <td>k</td>
      <td>x</td>
    </tr>
    <tr>
      <th>23</th>
      <td>2</td>
      <td>3</td>
      <td>2</td>
      <td>l</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>24</th>
      <td>2</td>
      <td>3</td>
      <td>3</td>
      <td>m</td>
      <td>x</td>
    </tr>
    <tr>
      <th>25</th>
      <td>2</td>
      <td>3</td>
      <td>4</td>
      <td>n</td>
      <td>x</td>
    </tr>
    <tr>
      <th>26</th>
      <td>2</td>
      <td>3</td>
      <td>5</td>
      <td>o</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>

Hope this helps.
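
The table above contains bus/trip pairs that never occur in df_measure, for example bus 1 on trip 3. Below is a minimal sketch of one way to drop such rows. It assumes a frame in which every timetable row is repeated for every trip_id; here that frame is rebuilt with a cross join (how="cross", which I believe needs pandas 1.2 or later) as a stand-in, not with merge_ordered, and without the 'other' column. The names trips, grid, valid_pairs and filtered are hypothetical.

```python
import pandas as pd

# Data from the question.
df_time = pd.DataFrame({
    "bus_nr": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "stop_id": [1, 2, 3, 4, 1, 2, 3, 4, 5],
    "stop_name": list("abcdklmno"),
})
df_measure = pd.DataFrame({
    "bus_nr": [1, 1, 1, 1, 1, 2, 2, 2],
    "trip_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "stop_name": list("abdcdkmn"),
    "other": ["x"] * 8,
})

# Stand-in for a frame that repeats the whole timetable for every trip_id
# (assumption: built with a cross join, not with merge_ordered).
trips = df_measure[["trip_id"]].drop_duplicates()
grid = df_time.merge(trips, how="cross")

# Keep only the (bus_nr, trip_id) pairs that occur in the measurements.
valid_pairs = df_measure[["bus_nr", "trip_id"]].drop_duplicates()
filtered = grid.merge(valid_pairs, on=["bus_nr", "trip_id"])

print(len(grid), len(filtered))
```

With the question's data the 27 rows of the grid shrink to the 13 rows that belong to measured trips.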

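Answer 1: (score: 0)

Assuming bus_nr and stop_name uniquely identify a row, you can simply merge on those columns:

# pd.merge takes the two frames as separate arguments, not as a list
df_measure = pd.merge(df_time, df_measure, on=['bus_nr', 'stop_name'])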
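
To see what this merge returns on the question's data, here is a small sketch comparing the default inner join with how="left". The variable names inner and left are mine. As far as I can tell, neither variant expands the timetable per trip, so on its own it does not give the 13-row result the question asks for.

```python
import pandas as pd

# Data from the question.
df_time = pd.DataFrame({
    "bus_nr": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "stop_id": [1, 2, 3, 4, 1, 2, 3, 4, 5],
    "stop_name": list("abcdklmno"),
})
df_measure = pd.DataFrame({
    "bus_nr": [1, 1, 1, 1, 1, 2, 2, 2],
    "trip_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "stop_name": list("abdcdkmn"),
    "other": ["x"] * 8,
})

# Default how="inner": only stops that were measured survive.
inner = pd.merge(df_time, df_measure, on=["bus_nr", "stop_name"])

# how="left": every timetable row is kept, but stops that were never
# measured on any trip have no trip_id to attach to.
left = pd.merge(df_time, df_measure, on=["bus_nr", "stop_name"], how="left")

print(len(inner), len(left))
print(left[left["trip_id"].isna()])
```

The inner join only adds stop_id to the eight measured rows, and the left join leaves stops l and o without a trip_id.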