I have the following two DataFrames:
The first DataFrame contains a bus timetable with the bus number, stop ID, and stop name.
1. df_time:
bus_nr stop_id stop_name
0 1 1 a
1 1 2 b
2 1 3 c
3 1 4 d
4 2 1 k
5 2 2 l
6 2 3 m
7 2 4 n
8 2 5 o
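The second DataFrame contains measurements of where each bus was, but some stops are missing. It contains bus_nr, the stop name, a trip ID, and other information:
2. df_measure: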
bus_nr trip_id stop_name other
0 1 1 a x
1 1 1 b x
2 1 1 d x
3 1 2 c x
4 1 2 d x
5 2 3 k x
6 2 3 m x
7 2 3 n x
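Now I want to add the missing stops from the timetable to the measured stops, so that every timetabled stop appears in the measurements for each trip: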
bus_nr trip_id stop_id stop_name other
0 1 1 1 a x
1 1 1 2 b x
2 1 1 3 c NaN
3 1 1 4 d x
4 1 2 1 a NaN
5 1 2 2 b NaN
6 1 2 3 c x
7 1 2 4 d x
8 2 3 1 k x
9 2 3 2 l NaN
10 2 3 3 m x
11 2 3 4 n x
12 2 3 5 o NaN
So, for each bus_nr, I want to take all the information from df_time and insert it into df_measure. Any ideas?
Code to create the DataFrames:
import pandas as pd
df_time = pd.DataFrame()
df_time['bus_nr'] = [1, 1, 1, 1, 2, 2, 2, 2, 2]
df_time['stop_id'] = [1, 2, 3, 4, 1, 2, 3, 4, 5]
df_time['stop_name'] = ['a', 'b', 'c', 'd', 'k', 'l', 'm', 'n', 'o']
df_measure = pd.DataFrame()
df_measure['bus_nr'] = [1, 1, 1, 1, 1, 2, 2, 2]
df_measure['trip_id'] = [1, 1, 1, 2, 2, 3, 3, 3]
df_measure['stop_name'] = ['a', 'b', 'd', 'c', 'd', 'k', 'm', 'n']
df_measure['other'] = ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x']
With Sagar Dawda's help, I found a working solution:
1. Create a DataFrame with all combinations of bus_nr and trip_id that occur in the measurements
df_combi = df_measure[['bus_nr', 'trip_id']].drop_duplicates()
2. Use Sagar Dawda's solution
out = pd.merge_ordered(df_time, df_measure, right_by='trip_id', how='outer')
out = out.loc[:, ['bus_nr', 'trip_id', 'stop_id', 'stop_name', 'other']]
3. Merge, which keeps only the bus_nr/trip_id combinations that were actually measured
out.merge(df_combi)
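For comparison, the same result can probably be reached with two plain merges and no merge_ordered. This is only a sketch, not the method from the answer: the names trips, schedule, and result are made up for illustration, and it assumes each (bus_nr, trip_id, stop_name) combination appears at most once in df_measure.

```python
import pandas as pd

df_time = pd.DataFrame({
    'bus_nr':    [1, 1, 1, 1, 2, 2, 2, 2, 2],
    'stop_id':   [1, 2, 3, 4, 1, 2, 3, 4, 5],
    'stop_name': ['a', 'b', 'c', 'd', 'k', 'l', 'm', 'n', 'o'],
})
df_measure = pd.DataFrame({
    'bus_nr':    [1, 1, 1, 1, 1, 2, 2, 2],
    'trip_id':   [1, 1, 1, 2, 2, 3, 3, 3],
    'stop_name': ['a', 'b', 'd', 'c', 'd', 'k', 'm', 'n'],
    'other':     ['x'] * 8,
})

# One row per (bus_nr, trip_id) pair that was actually measured.
trips = df_measure[['bus_nr', 'trip_id']].drop_duplicates()

# Repeat each bus's full timetable once per measured trip.
schedule = trips.merge(df_time, on='bus_nr', how='left')

# Attach the measurements; stops without a measurement keep NaN in 'other'.
result = (
    schedule
    .merge(df_measure, on=['bus_nr', 'trip_id', 'stop_name'], how='left')
    .sort_values(['bus_nr', 'trip_id', 'stop_id'])
    .reset_index(drop=True)
    [['bus_nr', 'trip_id', 'stop_id', 'stop_name', 'other']]
)
print(result)
```

Because the timetable is expanded only for trips that exist in df_measure, no extra bus/trip combinations are created, so no filtering step is needed afterwards.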
Answer 0: (score: 1)
out = pd.merge_ordered(df_time, df_measure, right_by='trip_id', how='outer')
out = out.loc[:, ['bus_nr', 'trip_id', 'stop_id', 'stop_name', 'other']]
out.sort_values(['bus_nr', 'trip_id'], inplace=True)
out
Output:
    bus_nr  trip_id  stop_id stop_name other
0        1        1        1         a     x
1        1        1        2         b     x
2        1        1        3         c   NaN
3        1        1        4         d     x
9        1        2        1         a   NaN
10       1        2        2         b   NaN
11       1        2        3         c     x
12       1        2        4         d     x
18       1        3        1         a   NaN
19       1        3        2         b   NaN
20       1        3        3         c   NaN
21       1        3        4         d   NaN
4        2        1        1         k   NaN
5        2        1        2         l   NaN
6        2        1        3         m   NaN
7        2        1        4         n   NaN
8        2        1        5         o   NaN
13       2        2        1         k   NaN
14       2        2        2         l   NaN
15       2        2        3         m   NaN
16       2        2        4         n   NaN
17       2        2        5         o   NaN
22       2        3        1         k     x
23       2        3        2         l   NaN
24       2        3        3         m     x
25       2        3        4         n     x
26       2        3        5         o   NaN
Hope this helps.
Answer 1: (score: 0)
Assuming bus_nr and stop_name uniquely identify a row, you can simply merge on those columns:
df_measure = pd.merge(df_time, df_measure, on=['bus_nr', 'stop_name'])
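To see what a merge on bus_nr and stop_name returns for the sample data, here is a small sketch. The names inner, left, and never_measured are illustrative only. Note that pd.merge takes the two frames as separate arguments, and that its default is how='inner'.

```python
import pandas as pd

df_time = pd.DataFrame({
    'bus_nr':    [1, 1, 1, 1, 2, 2, 2, 2, 2],
    'stop_id':   [1, 2, 3, 4, 1, 2, 3, 4, 5],
    'stop_name': ['a', 'b', 'c', 'd', 'k', 'l', 'm', 'n', 'o'],
})
df_measure = pd.DataFrame({
    'bus_nr':    [1, 1, 1, 1, 1, 2, 2, 2],
    'trip_id':   [1, 1, 1, 2, 2, 3, 3, 3],
    'stop_name': ['a', 'b', 'd', 'c', 'd', 'k', 'm', 'n'],
    'other':     ['x'] * 8,
})

# Default how='inner': only measured stops survive, now with stop_id attached.
inner = pd.merge(df_time, df_measure, on=['bus_nr', 'stop_name'])

# how='left' keeps every timetable stop; indicator=True adds a '_merge'
# column that marks stops with no matching measurement as 'left_only'.
left = pd.merge(df_time, df_measure, on=['bus_nr', 'stop_name'],
                how='left', indicator=True)
never_measured = left.loc[left['_merge'] == 'left_only', ['bus_nr', 'stop_name']]

print(len(inner), len(left))
print(never_measured)
```

Since trip_id is not part of the merge key, this only surfaces stops that were never measured on any trip (here l and o for bus 2). A stop missing from a single trip, such as c on trip 1, is not added, so on its own this does not appear to produce the per-trip table asked for in the question.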