Question

I have a first dataframe to which I need to add rows from a second dataframe.

The first one looks more or less like this:

   QID  Questions    B  Answer1  Answer2  Answer3  F  G  H  I  J
0    3          a  4.0        a        a        a  a  e  g  i  l
1    4          b  5.0        b        b        b  a  r  h  m  p
2    5          d  5.0      NaN        e        d  b  u  e  i  z
3    6          e  5.0        d        h        r  b  c  z  i  3
...

The second one:

   QID  Questions    B  Answer1  Answer2  Answer3  F ...
0    1          a  4.0        a        a        a  a
1    2          b  5.0        b        k        b  a
2  2_1          z  5.0        b        k        b  a
3  2_2          w  4.0        b        k        b  c
4    3          d  5.0      NaN        e        d  b
5    4          e  5.0        d        h        r  b
...

I would like to get:

   QID  Questions    B  Answer1  Answer2  Answer3  F  G  H  I  J
0    3          a  4.0        a        a        a  a  e  g  i  l
1    4          b  5.0        b        b        b  a  r  h  m  p
2  4_1          z  5.0        b        k        b  a  r  h  m  p
3  4_2          w  4.0        b        k        b  c  r  h  m  p
4    5          d  5.0      NaN        e        d  b  u  e  i  z
5    6          e  5.0        d        h        r  b  c  z  i  3
...

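As you can see, both dataframes share question b, so I added to the new dataframe the rows that follow it in the second dataframe, that is, the ones whose QID contains _.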

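Put literally: the first and second dataframes share the same texts "t1" and "t2" in the cells of the "Answers" column. For a given pair (t1, t2) with t1 == t2, when the matching row in the second dataframe is followed by rows whose QID contains _, I want to add those rows right after the matching row in the first dataframe.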

I started with:

rows_to_add = pd.DataFrame()
for i, row1 in df.iterrows():
  for j, row2 in df2.iterrows():
    if row1['Questions'] == row2['Questions']:
      # here I want to test whether the next row has _ in its QID
      # if so, I add all the rows with the same QID before the _, but using row1's QID
      k = 0
      for _, next_row_df2 in df2[j+1:].iterrows():
        if "_" in str(next_row_df2['QID']):
          next_row_df2['QID'] = str(row1['QID']) + '_' + str(k) # but I need to change the QID to get it right when inserting rows in df2
          rows_to_add += next_row_df2 
        else:
          break # exit this loop and add the lines to the dataframe
        k += 1
      df = pd.concat([df.iloc[:i], rows_to_add, df.iloc[i:]]).reset_index(drop=True)
      rows_to_add = pd.DataFrame()

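But (A) no rows are added, and (B) it is not efficient (some would say it's even horrible). Maybe I could do this more efficiently by iterating only over the df2 rows whose QID contains _? Or by using map-reduce?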
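One likely reason for (A), shown as a minimal sketch with made-up numeric rows: `+=` between a DataFrame and a Series performs element-wise addition rather than appending, so an empty DataFrame still has no rows afterwards. Collecting the Series in a list and building the frame once avoids this. The names `row_a`, `row_b`, `acc` and `collected` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical rows standing in for next_row_df2 (numeric to keep the demo simple)
row_a = pd.Series({"QID": 21, "B": 5.0})
row_b = pd.Series({"QID": 22, "B": 4.0})

# "+=" is arithmetic, not append: an empty frame has no rows to add to
acc = pd.DataFrame()
acc += row_a
print(len(acc))  # row count after "+="

# Collect the Series in a list, then build a frame with one row per Series
collected = pd.DataFrame([row_a, row_b]).reset_index(drop=True)
print(collected.shape)
```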

I am now trying:

for i, row1 in df.iterrows():
  for j, row2 in df2.iterrows():
    if row1['Questions'] == row2['Questions']:
      # here I want to test whether the next row has _ in its QID
      # if so, I add all the rows with the same QID before the _, but using row1's QID
      k = 0
      L = []
      for _, next_row_df2 in df2[j+1:].iterrows():
        if "_" in str(next_row_df2['QID']):
          next_row_df2['QID'] = str(row1['QID']) + '_' + str(k) # but I need to change the QID to get it right when inserting rows in df2
          L.append(next_row_df2)
        else:
          break # exit this loop and add the lines to the dataframe
        k += 1
      if L:
        rows_to_add = pd.concat(L, ignore_index=True)
      df = pd.concat([df.iloc[:i], rows_to_add, df.iloc[i:]]).reset_index(drop=True)
      rows_to_add = pd.DataFrame()

But still no result.
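A possible loop-free alternative, sketched under assumptions the question does not confirm: QIDs are strings, sub-rows in df2 are named `<parent QID>_<n>`, `Questions` values are unique within each frame, and the columns that only df has (G to J) are copied from the matching df row. The helper `insert_sub_rows` and the temporary columns `base`, `suffix`, `parent_question`, `parent_qid`, `pos` and `order` are hypothetical names. Note also that `pd.concat` applied to a list of Series stacks them into one long Series rather than a frame with one row per Series, which may explain why the second attempt still gives no result.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question
df = pd.DataFrame({
    "QID": ["3", "4", "5", "6"], "Questions": ["a", "b", "d", "e"],
    "B": [4.0, 5.0, 5.0, 5.0], "Answer1": ["a", "b", None, "d"],
    "Answer2": ["a", "b", "e", "h"], "Answer3": ["a", "b", "d", "r"],
    "F": ["a", "a", "b", "b"], "G": ["e", "r", "u", "c"],
    "H": ["g", "h", "e", "z"], "I": ["i", "m", "i", "i"],
    "J": ["l", "p", "z", "3"],
})
df2 = pd.DataFrame({
    "QID": ["1", "2", "2_1", "2_2", "3", "4"],
    "Questions": ["a", "b", "z", "w", "d", "e"],
    "B": [4.0, 5.0, 5.0, 4.0, 5.0, 5.0],
    "Answer1": ["a", "b", "b", "b", None, "d"],
    "Answer2": ["a", "k", "k", "k", "e", "h"],
    "Answer3": ["a", "b", "b", "b", "d", "r"],
    "F": ["a", "a", "a", "c", "b", "b"],
})


def insert_sub_rows(df, df2):
    """Insert df2 sub-rows (QID containing "_") after the df row whose
    Questions text equals that of the sub-rows' parent row in df2."""
    # Split each df2 QID at the first "_": "2_1" -> base "2", suffix "1"
    parts = df2["QID"].astype(str).str.partition("_")
    tagged = df2.assign(base=parts[0], suffix=parts[2])
    parents = tagged[tagged["suffix"] == ""]
    subs = tagged[tagged["suffix"] != ""]

    # Attach to each sub-row the question text of its parent row in df2
    question_of = parents.set_index("base")["Questions"]
    subs = subs.assign(parent_question=subs["base"].map(question_of))

    # For each df row: its QID, its question, its df-only columns, its position
    extra_cols = [c for c in df.columns if c not in df2.columns]
    anchor = (
        df[["QID", "Questions"] + extra_cols]
        .rename(columns={"QID": "parent_qid", "Questions": "parent_question"})
        .assign(pos=range(len(df)))
    )

    # Keep only sub-rows whose parent question also appears in df
    new_rows = subs.merge(anchor, on="parent_question", how="inner")
    new_rows["QID"] = new_rows["parent_qid"].astype(str) + "_" + new_rows["suffix"]
    new_rows["order"] = new_rows.groupby("pos").cumcount() + 1

    # Original rows get order 0 so they sort before their sub-rows
    base_rows = df.assign(pos=range(len(df)), order=0)
    keep = list(df.columns) + ["pos", "order"]
    return (
        pd.concat([base_rows, new_rows[keep]])
        .sort_values(["pos", "order"])
        .drop(columns=["pos", "order"])
        .reset_index(drop=True)
    )


result = insert_sub_rows(df, df2)
print(result["QID"].tolist())  # ['3', '4', '4_1', '4_2', '5', '6']
```

Sorting on a position column plus a per-parent counter replaces the repeated `pd.concat` inside the loop, so the frame is rebuilt only once.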

Add extra rows from one dataframe to another when they match on the values of a specific column

0 answers: