I have two DataFrames, df_criterias and df_tofill.
df_criterias
  goto_emptycol1  goto_emptycol2 data1 data2
0    some value1  another value1     a  val1
1    some value2  another value2     b  val2
2    some value3  another value3     c  val3
3    some value4  another value4     d  val4
4    some value5  another value5     e  val5
5    some value6  another value6     f  val6
6    some value7  another value7     g  val7
df_tofill
  emptycol1 emptycol2 data1 data2
0                         f  val6
1                       nok   nok
2                       nok   nok
3                         a  val1
4                       nok   nok
5                         g  val7
6                         d  val4
expected_results
     emptycol1       emptycol2 data1 data2
0  some value6  another value6     f  val6
1                                nok   nok
2                                nok   nok
3  some value1  another value1     a  val1
4                                nok   nok
5  some value7  another value7     g  val7
6  some value4  another value4     d  val4
From both I created two lists of indexes (the rows of each df where the columns "data1" and "data2" match some criteria):
list_fill = [0,3,5,6] #from df_tofill
list_crt = [5,0,6,3] #from df_criterias
Here the element 5 at list_crt[0] corresponds to the element 0 at list_fill[0], and so on for each position.
To produce expected_results, I am trying this:
for i, icrt in enumerate(list_crt):
    # Get the values from the matching row of df_criterias
    val1 = df_criterias.loc[icrt, "goto_emptycol1"]
    val2 = df_criterias.loc[icrt, "goto_emptycol2"]
    # Set the values in the corresponding row of df_tofill
    df_tofill.loc[list_fill[i], "emptycol1"] = val1
    df_tofill.loc[list_fill[i], "emptycol2"] = val2
I am struggling to get the "expected_results" df. Is the algorithm correct?
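For reference, the same row-by-row copy can be sketched as a single .loc assignment. This is only an illustration under assumptions: the two frames are rebuilt here from the tables above, the empty cells are assumed to be empty strings, and both frames are assumed to carry the default 0..n-1 index. Converting the source rows with .to_numpy() makes the assignment positional, so row k of list_crt lands on row k of list_fill without index alignment.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (empty cells assumed to be "").
df_criterias = pd.DataFrame({
    "goto_emptycol1": [f"some value{i}" for i in range(1, 8)],
    "goto_emptycol2": [f"another value{i}" for i in range(1, 8)],
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "emptycol1": [""] * 7,
    "emptycol2": [""] * 7,
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

list_fill = [0, 3, 5, 6]  # row labels in df_tofill
list_crt = [5, 0, 6, 3]   # matching row labels in df_criterias

# Select the source rows in list_crt order, then drop the index with
# .to_numpy() so the values are written by position, not by label.
src = df_criterias.loc[list_crt, ["goto_emptycol1", "goto_emptycol2"]]
df_tofill.loc[list_fill, ["emptycol1", "emptycol2"]] = src.to_numpy()

print(df_tofill)
```

Rows that are not in list_fill keep their empty strings, which matches the layout of expected_results.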
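Update: I managed to get it working. .at gave me some strange errors, so I replaced it with .loc. I also needed to call .reset_index() before creating the lists of indexes.
The index lists are created with: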
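One likely reason the reset was needed (an assumption, since the original frames are not shown): enumerate() yields positions 0, 1, 2, ..., while .loc looks rows up by label. If a frame has been filtered or sliced, its labels no longer match those positions. A minimal sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame whose labels are not 0..n-1, e.g. after filtering.
df = pd.DataFrame({"data1": ["x", "f", "a"]}, index=[10, 20, 30])

# Positions produced by enumerate() are 0, 1, 2.
positions = [i for i, v in enumerate(df["data1"]) if v in ("f", "a")]

# None of those positions exist as labels, so df.loc[positions] would fail.
missing = [p for p in positions if p not in df.index]

# After resetting the index, labels and positions agree again.
df_reset = df.reset_index(drop=True)
picked = df_reset.loc[positions, "data1"].tolist()

print(positions, missing, picked)
```

Using drop=True discards the old labels instead of keeping them as a new column; whether that is appropriate depends on whether the old index carries information.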
def common_elements(crtlist, radlist):
    # crtlist holds all criteria, radlist holds all values to be checked.
    # Returns two lists of indexes: for every match, the index of the
    # criterion and the index of the value that starts with it.
    crtli_idx = []
    radli_idx = []
    for idx1, crt in enumerate(crtlist):
        for idx2, rad in enumerate(radlist):
            if rad.startswith(crt):
                crtli_idx.append(idx1)
                radli_idx.append(idx2)
    return crtli_idx, radli_idx
crtlist = ['1', '21', '444']
radlist = ['asda','aererv','1vrvssq','4447676767']
idxcrt, ixdrad = common_elements(crtlist, radlist)
print(idxcrt, ixdrad)
OUT:
[0, 2] [2, 3]
Answer 0 (score: 0)
One way is to align the indexes/columns, replace '' with np.nan in the target dataframe, and then assign one dataframe to the other via .loc.
import numpy as np

# Give both frames the same column names and the same (data1, data2) index
df_criterias = df_criterias.rename(columns={'goto_emptycol1': 'emptycol1',
                                            'goto_emptycol2': 'emptycol2'})\
                           .set_index(['data1', 'data2'])
df_tofill = df_tofill.replace('', np.nan)\
                     .set_index(['data1', 'data2'])
# Assign the matching rows; pandas aligns them on the shared index
df_tofill.loc[:] = df_criterias.loc[df_criterias.index.isin(df_tofill.index)]
df_tofill = df_tofill.reset_index()
#   data1 data2   emptycol1      emptycol2
# 0     f  val6  somevalue6  anothervalue6
# 1   nok   nok         NaN            NaN
# 2   nok   nok         NaN            NaN
# 3     a  val1  somevalue1  anothervalue1
# 4   nok   nok         NaN            NaN
# 5     g  val7  somevalue7  anothervalue7
# 6     d  val4  somevalue4  anothervalue4
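A left merge on the two key columns is another way to express the same lookup. This is a different technique from the .loc assignment above, sketched under assumptions: the frames are rebuilt from the question's tables, and each (data1, data2) pair is assumed to appear at most once in df_criterias so that the merge does not duplicate rows.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (empty cells assumed to be "").
df_criterias = pd.DataFrame({
    "goto_emptycol1": [f"some value{i}" for i in range(1, 8)],
    "goto_emptycol2": [f"another value{i}" for i in range(1, 8)],
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "emptycol1": [""] * 7,
    "emptycol2": [""] * 7,
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

# Rename the source columns so they land under the target names.
renamed = df_criterias.rename(columns={"goto_emptycol1": "emptycol1",
                                       "goto_emptycol2": "emptycol2"})

# Keep only the key columns of df_tofill, then pull in the values.
# how="left" keeps every row of df_tofill in its original order;
# rows without a match get NaN in the filled columns.
result = df_tofill[["data1", "data2"]].merge(
    renamed, on=["data1", "data2"], how="left"
)
result = result[["emptycol1", "emptycol2", "data1", "data2"]]

print(result)
```

Unmatched rows come back as NaN rather than empty strings; result.fillna("") would restore the blanks if that layout is preferred.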