Add extra rows from one dataframe to another when they match on a specific column's value

Date: 2020-08-22 15:31:36

Tags: python-3.x pandas dataframe row

I have a first dataframe to which I need to add rows from a second dataframe.

The first one looks roughly like this:

    QID    Questions    B   Answer1 Answer2 Answer3 F G H I J
0   3   a   4.0 a   a   a   a   e   g   i   l    
1   4   b   5.0 b   b   b   a   r   h   m   p
2   5   d   5.0 NaN e   d   b   u   e   i   z
3   6   e   5.0 d   h   r   b   c   z   i   3
...

The second one:

    QID    Questions    B   Answer1 Answer2 Answer3 F ...
0   1   a   4.0 a   a   a   a   
1   2   b   5.0 b   k   b   a   
2   2_1 z   5.0 b   k   b   a   
3   2_2 w   4.0 b   k   b   c   
4   3   d   5.0 NaN e   d   b   
5   4   e   5.0 d   h   r   b   
...

I want to get:

    QID    Questions    B   Answer1 Answer2 Answer3 F G H I J
0   3   a   4.0 a   a   a   a   e   g   i   l    
1   4   b   5.0 b   b   b   a   r   h   m   p
2   4_1 z   5.0 b   k   b   a   r   h   m   p
3   4_2 w   4.0 b   k   b   c   r   h   m   p
4   5   d   5.0 NaN e   d   b   u   e   i   z
5   6   e   5.0 d   h   r   b   c   z   i   3
...

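As you can see, both dataframes share question b, so in the new dataframe I added the rows that follow it in the second dataframe, the ones whose QID contains _.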

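In words: the first and second dataframes can contain the same text in the Questions column (call it t1 in the first and t2 in the second). For a pair (t1, t2) with t1 == t2, if the matching row of the second dataframe is followed by rows whose QID contains _, I want to insert those rows right after the matching row of the first dataframe.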
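The rule above can be sketched in pandas by giving every df2 row whose QID contains `_` the Questions value of the nearest non-`_` row above it. This is a minimal sketch, assuming that a `_` row always belongs to the closest preceding row without `_`; the frame is a trimmed copy of the second example table, and `is_child`, `block` and `parent_question` are illustrative names, not part of the question.

```python
import pandas as pd

# Trimmed copy of the second example table (only the columns the rule needs).
df2 = pd.DataFrame({
    "QID": ["1", "2", "2_1", "2_2", "3", "4"],
    "Questions": ["a", "b", "z", "w", "d", "e"],
})

# True for rows such as 2_1 and 2_2.
is_child = df2["QID"].astype(str).str.contains("_", regex=False)
# Each non-'_' row starts a new block; the '_' rows below it share its block number.
block = (~is_child).cumsum()
# Blank out the children's own questions, then spread the parent's question over its block.
parent_question = df2["Questions"].where(~is_child).groupby(block).transform("first")

# The '_' rows, each labelled with the question of the row they hang under.
children = df2[is_child].assign(parent_question=parent_question[is_child])
```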

I started with:

import pandas as pd
rows_to_add = pd.DataFrame()
for i, row1 in df.iterrows():
  for j, row2 in df2.iterrows():
    if row1['Questions'] == row2['Questions']:
      # here I want to test if the next row has _ in its QID
      # if so I add all the lines with the same QID before _ but with row1 QID
      k = 0
      for _, next_row_df2 in df2[j+1:].iterrows():
        if "_" in str(next_row_df2['QID']):
          next_row_df2['QID'] = str(row1['QID']) + '_' + str(k) # but I need to change the QID to get it right when inserting rows in df2
          rows_to_add += next_row_df2 
        else:
          break # exit this loop and add the lines to the dataframe
        k += 1
      df = pd.concat([df.iloc[:i], rows_to_add, df.iloc[i:]]).reset_index(drop=True)
      rows_to_add = pd.DataFrame()

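But (A) no rows are added, and (B) it is not efficient (someone said it's even horrible). Maybe there is a more efficient way to do this: iterate only over the df2 rows that contain _? Or use map-reduce?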
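A possible sketch of the more efficient approach asked about here: it works only on the df2 rows that contain `_` and uses a merge instead of nested `iterrows` loops. It rests on assumptions read off the example tables, not stated in the question: a `_` row belongs to the nearest non-`_` row above it; inserted rows keep df2's values for the columns df2 has and copy the remaining columns (G to J) from the matching df1 row; and the new QID is the df1 QID plus df2's suffix (2_1 becomes 4_1, as in the desired output, whereas the loop in the question counts from 0). The function name `insert_child_rows` and the helper columns `_parent_q`, `_suffix`, `_order`, `_pos` are hypothetical.

```python
import pandas as pd


def insert_child_rows(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: put df2's '_' rows right after the matching df1 row."""
    qid2 = df2["QID"].astype(str)
    is_child = qid2.str.contains("_", regex=False)
    # Each non-'_' row opens a block; the '_' rows below it share that block.
    block = (~is_child).cumsum()
    parent_q = df2["Questions"].where(~is_child).groupby(block).transform("first")

    children = df2.loc[is_child].copy()
    children["_parent_q"] = parent_q[is_child]
    children["_suffix"] = qid2[is_child].str.split("_", n=1).str[1]
    children["_order"] = range(1, len(children) + 1)  # keeps df2 order within a block

    # Position of each df1 row plus the columns df2 lacks (G..J in the example).
    extra = [c for c in df1.columns if c not in df2.columns]
    parents = df1.reset_index(drop=True)
    parents = parents.assign(_pos=range(len(parents)))[["Questions", "QID", "_pos"] + extra]
    parents = parents.rename(columns={"Questions": "_parent_q", "QID": "_parent_qid"})

    # Inner merge: only children whose parent question also appears in df1 are kept.
    kids = children.merge(parents, on="_parent_q", how="inner")
    kids["QID"] = kids["_parent_qid"].astype(str) + "_" + kids["_suffix"]

    base = df1.reset_index(drop=True).assign(_pos=range(len(df1)), _order=0)
    cols = list(df1.columns) + ["_pos", "_order"]
    out = pd.concat([base, kids[cols]], ignore_index=True)
    # Original rows have _order 0, so their children sort right after them.
    out = out.sort_values(["_pos", "_order"])
    return out.drop(columns=["_pos", "_order"]).reset_index(drop=True)


# Example data copied from the two tables in the question.
df1 = pd.DataFrame({
    "QID": [3, 4, 5, 6],
    "Questions": ["a", "b", "d", "e"],
    "B": [4.0, 5.0, 5.0, 5.0],
    "Answer1": ["a", "b", None, "d"],
    "Answer2": ["a", "b", "e", "h"],
    "Answer3": ["a", "b", "d", "r"],
    "F": ["a", "a", "b", "b"],
    "G": ["e", "r", "u", "c"],
    "H": ["g", "h", "e", "z"],
    "I": ["i", "m", "i", "i"],
    "J": ["l", "p", "z", "3"],
})
df2 = pd.DataFrame({
    "QID": ["1", "2", "2_1", "2_2", "3", "4"],
    "Questions": ["a", "b", "z", "w", "d", "e"],
    "B": [4.0, 5.0, 5.0, 4.0, 5.0, 5.0],
    "Answer1": ["a", "b", "b", "b", None, "d"],
    "Answer2": ["a", "k", "k", "k", "e", "h"],
    "Answer3": ["a", "b", "b", "b", "d", "r"],
    "F": ["a", "a", "a", "c", "b", "b"],
})

result = insert_child_rows(df1, df2)
```

Sorting on the helper columns places each block of new rows after its df1 row in one pass, instead of calling `pd.concat` repeatedly and shifting positions inside a loop.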

Now I'm trying:

for i, row1 in df.iterrows():
  for j, row2 in df2.iterrows():
    if row1['Questions'] == row2['Questions']:
      # here I want to test if the next row has _ in its QID
      # if so I add all the lines with the same QID before _ but with row1 QID
      k = 0
      L = []
      for _, next_row_df2 in df2[j+1:].iterrows():
        if "_" in str(next_row_df2['QID']):
          next_row_df2['QID'] = str(row1['QID']) + '_' + str(k) # but I need to change the QID to get it right when inserting rows in df2
          L.append(next_row_df2)
        else:
          break # exit this loop and add the lines to the dataframe
        k += 1
      if L:
        rows_to_add = pd.concat(L, ignore_index=True)
      df = pd.concat([df.iloc[:i], rows_to_add, df.iloc[i:]]).reset_index(drop=True)
      rows_to_add = pd.DataFrame()

But still no result.
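A few details in the loops above may explain the missing rows. `iterrows` yields each row as a Series, and `pd.concat` on a list of Series stacks their values end to end into one long Series instead of building rows, while `pd.DataFrame(L)` treats each Series as one row. In addition, `df.iloc[:i]` stops before row `i`, so the new rows would land before the matching row rather than after it, and because `df` is reassigned inside `for i, row1 in df.iterrows()`, later values of `i` still count positions in the original frame. A minimal demonstration of the first point on a hypothetical trimmed frame:

```python
import pandas as pd

# Hypothetical trimmed frame: one matching row followed by two '_' rows.
df2 = pd.DataFrame({"QID": ["2", "2_1", "2_2"], "Questions": ["b", "z", "w"]})

# Collect the '_' rows as Series, the same way the loop in the question does.
L = [row for _, row in df2.iloc[1:].iterrows()]

stacked = pd.concat(L, ignore_index=True)         # a single Series of 4 values
as_rows = pd.DataFrame(L).reset_index(drop=True)  # 2 rows x 2 columns
```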

0 answers (no answers yet).