I have a first dataframe to which I must add rows from a second dataframe.
The first one looks more or less like this:
  QID Questions    B Answer1 Answer2 Answer3  F  G  H  I  J
0   3         a  4.0       a       a       a  a  e  g  i  l
1   4         b  5.0       b       b       b  a  r  h  m  p
2   5         d  5.0     NaN       e       d  b  u  e  i  z
3   6         e  5.0       d       h       r  b  c  z  i  3
...
The second one:
   QID Questions    B Answer1 Answer2 Answer3  F ...
0    1         a  4.0       a       a       a  a
1    2         b  5.0       b       k       b  a
2  2_1         z  5.0       b       k       b  a
3  2_2         w  4.0       b       k       b  c
4    3         d  5.0     NaN       e       d  b
5    4         e  5.0       d       h       r  b
...
I would like to get:
   QID Questions    B Answer1 Answer2 Answer3  F  G  H  I  J
0    3         a  4.0       a       a       a  a  e  g  i  l
1    4         b  5.0       b       b       b  a  r  h  m  p
2  4_1         z  5.0       b       k       b  a  r  h  m  p
3  4_2         w  4.0       b       k       b  c  r  h  m  p
4    5         d  5.0     NaN       e       d  b  u  e  i  z
5    6         e  5.0       d       h       r  b  c  z  i  3
...
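As you can see, the dataframes share the question b, so in the new dataframe I added the rows that follow it in the second dataframe, the ones with _ in their QID.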
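Literally, this means that the first and the second data table share the same "t1" and "t2" texts in cells of the "Answer" column. For a given combination (t1, t2) where t1 == t2, when the matching row in the second table has rows below it whose QID contains _, I want to add those rows right after the matching row in the first table.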
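To make the rule concrete, here is a minimal sketch of how the rows with _ and the main row they belong to could be identified in the second dataframe. It assumes QID is stored as a string and that sub-rows always sit directly below their main row; the names `is_sub`, `parent_question` and `sub_rows` are only illustrative and not part of my real code.

```python
import pandas as pd

# Reduced version of the second dataframe (only the columns needed here).
df2 = pd.DataFrame({
    "QID": ["1", "2", "2_1", "2_2", "3", "4"],
    "Questions": ["a", "b", "z", "w", "d", "e"],
})

# A sub-row is a row whose QID contains an underscore.
is_sub = df2["QID"].str.contains("_", regex=False)

# For every row, the Questions text of the nearest main row at or above it
# (assumption: sub-rows always directly follow their main row).
parent_question = df2["Questions"].where(~is_sub).ffill()

# Keep only the sub-rows, tagged with the question of their main row.
sub_rows = df2[is_sub].assign(parent_question=parent_question[is_sub])
print(sub_rows)
```

With the sample data, both 2_1 and 2_2 are attached to question b, which is the question shared with the first dataframe.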
I started with:
import pandas as pd

rows_to_add = pd.DataFrame()
for i, row1 in df.iterrows():
    for j, row2 in df2.iterrows():
        if row1['Questions'] == row2['Questions']:
            # here I want to test whether the next row has _ in its QID
            # if so, I add all the rows that share the QID prefix before _, but with row1's QID
            k = 0
            for _, next_row_df2 in df2[j+1:].iterrows():
                if "_" in str(next_row_df2['QID']):
                    next_row_df2['QID'] = str(row1['QID']) + '_' + str(k)  # I need to change the QID so it is right once the rows are inserted
                    rows_to_add += next_row_df2
                else:
                    break  # exit this loop and add the rows to the dataframe
                k += 1
            df = pd.concat([df.iloc[:i], rows_to_add, df.iloc[i:]]).reset_index(drop=True)
            rows_to_add = pd.DataFrame()
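But A. no rows are added, and B. it is not efficient (some say it's even horrible). Maybe I could do this more efficiently by iterating only over the df2 rows that contain _? Or by using map-reduce?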
I am now trying:
for i, row1 in df.iterrows():
    for j, row2 in df2.iterrows():
        if row1['Questions'] == row2['Questions']:
            # here I want to test whether the next row has _ in its QID
            # if so, I add all the rows that share the QID prefix before _, but with row1's QID
            k = 0
            L = []
            for _, next_row_df2 in df2[j+1:].iterrows():
                if "_" in str(next_row_df2['QID']):
                    next_row_df2['QID'] = str(row1['QID']) + '_' + str(k)  # I need to change the QID so it is right once the rows are inserted
                    L.append(next_row_df2)
                else:
                    break  # exit this loop and add the rows to the dataframe
                k += 1
            if L:
                rows_to_add = pd.concat(L, ignore_index=True)
                df = pd.concat([df.iloc[:i], rows_to_add, df.iloc[i:]]).reset_index(drop=True)
                rows_to_add = pd.DataFrame()
But still no result.
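As for the idea of working only on the df2 rows that contain _, one possible direction avoids the nested `iterrows` loops altogether. The sketch below is not a confirmed solution. It assumes that QID is stored as a string, that Questions is unique within each dataframe, and that the sub-rows keep their suffix from df2 (so 2_1 becomes 4_1). The helper columns `parent`, `suffix`, `parent_question`, `df_qid` and `pos` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "QID": ["3", "4", "5", "6"],
    "Questions": ["a", "b", "d", "e"],
    "B": [4.0, 5.0, 5.0, 5.0],
    "Answer1": ["a", "b", None, "d"],
    "Answer2": ["a", "b", "e", "h"],
    "Answer3": ["a", "b", "d", "r"],
    "F": ["a", "a", "b", "b"],
    "G": ["e", "r", "u", "c"],
    "H": ["g", "h", "e", "z"],
    "I": ["i", "m", "i", "i"],
    "J": ["l", "p", "z", "3"],
})
df2 = pd.DataFrame({
    "QID": ["1", "2", "2_1", "2_2", "3", "4"],
    "Questions": ["a", "b", "z", "w", "d", "e"],
    "B": [4.0, 5.0, 5.0, 4.0, 5.0, 5.0],
    "Answer1": ["a", "b", "b", "b", None, "d"],
    "Answer2": ["a", "k", "k", "k", "e", "h"],
    "Answer3": ["a", "b", "b", "b", "d", "r"],
    "F": ["a", "a", "a", "c", "b", "b"],
})

# Columns that exist only in df (G..J here); sub-rows inherit them from df.
extra_cols = [c for c in df.columns if c not in df2.columns]

# Split each df2 QID into the part before "_" and the part after it.
parts = df2["QID"].str.partition("_")
tagged = df2.assign(parent=parts[0], suffix=parts[2])
mains = tagged[tagged["suffix"] == ""]
subs = tagged[tagged["suffix"] != ""]

# Attach to each sub-row the Questions text of its main row in df2.
question_of = mains.set_index("QID")["Questions"]
subs = subs.assign(parent_question=subs["parent"].map(question_of))

# For each df row: its QID, its question, its extra columns and its position.
anchor = (df[["QID", "Questions"] + extra_cols]
          .rename(columns={"QID": "df_qid", "Questions": "parent_question"})
          .assign(pos=range(len(df))))

# Keep only sub-rows whose main question also exists in df, then rebuild QID.
new_rows = subs.merge(anchor, on="parent_question", how="inner")
new_rows["QID"] = new_rows["df_qid"] + "_" + new_rows["suffix"]
new_rows = new_rows[list(df.columns) + ["pos"]]

# Stable sort on pos puts every sub-row right after its df row.
result = (pd.concat([df.assign(pos=range(len(df))), new_rows], ignore_index=True)
          .sort_values("pos", kind="mergesort")
          .drop(columns="pos")
          .reset_index(drop=True))
print(result)
```

The only Python-level loop left is the list comprehension over column names. The matching itself is done by `merge`, so the cost no longer grows with the product of the two dataframe lengths as it does with the nested loops.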