How to get data from one df and place it in another df at cell level - pandas

Time: 2018-03-05 12:29:10

Tags: python python-3.x pandas

I have two DataFrames, df_criterias and df_tofill.

df_criterias

     goto_emptycol1     goto_emptycol2    data1     data2
0    some value1        another value1    a         val1
1    some value2        another value2    b         val2
2    some value3        another value3    c         val3
3    some value4        another value4    d         val4
4    some value5        another value5    e         val5
5    some value6        another value6    f         val6
6    some value7        another value7    g         val7

df_tofill

     emptycol1          emptycol2         data1     data2
0                                         f         val6
1                                         nok       nok
2                                         nok       nok
3                                         a         val1
4                                         nok       nok
5                                         g         val7
6                                         d         val4

expected_results

     emptycol1          emptycol2         data1     data2
0    some value6        another value6    f         val6
1                                         nok       nok
2                                         nok       nok
3    some value1        another value1    a         val1
4                                         nok       nok
5    some value7        another value7    g         val7
6    some value4        another value4    d         val4

From the two DataFrames I built two lists of row indexes (the rows where the "data1" and "data2" columns match according to some criteria):

list_fill = [0,3,5,6] #from df_tofill
list_crt = [5,0,6,3] #from df_criterias

Here element 5 at list_crt[0] is matched with element 0 at list_fill[0], and so on for the other positions.

To produce expected_results, I am trying this:

for i, icrt in enumerate(list_crt):
    # Get the values from the matching row of df_criterias
    val1 = df_criterias.loc[icrt, "goto_emptycol1"]
    val2 = df_criterias.loc[icrt, "goto_emptycol2"]
    # Set the values in the corresponding row of df_tofill
    df_tofill.loc[list_fill[i], "emptycol1"] = val1
    df_tofill.loc[list_fill[i], "emptycol2"] = val2

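I am struggling to get the "expected_results" df. Is the algorithm correct?

Update: I managed to get it working. .at gave me some strange errors, so I replaced it with .loc. Calling .reset_index() on the DataFrames is also needed before building the lists of indexes.

The index lists are created with: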
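The loop and the update note can also be condensed into a single label-based assignment. What follows is only a sketch under stated assumptions: the sample frames are rebuilt by hand from the tables in the question, and `reset_index(drop=True)` (the `drop=True` is an assumption, the question only mentions `.reset_index()`) gives both frames a default 0..n-1 index so that the positions stored in list_fill and list_crt are also valid `.loc` labels.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df_criterias = pd.DataFrame({
    "goto_emptycol1": [f"some value{i}" for i in range(1, 8)],
    "goto_emptycol2": [f"another value{i}" for i in range(1, 8)],
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "emptycol1": [""] * 7,
    "emptycol2": [""] * 7,
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

# Assumption: the lists hold row positions, so make labels equal positions.
df_criterias = df_criterias.reset_index(drop=True)
df_tofill = df_tofill.reset_index(drop=True)

list_fill = [0, 3, 5, 6]  # rows of df_tofill to fill
list_crt = [5, 0, 6, 3]   # matching rows of df_criterias, same order

# Pick the source rows in list_crt order.
src = df_criterias.loc[list_crt, ["goto_emptycol1", "goto_emptycol2"]]

# .to_numpy() discards the source labels, so the values are written
# row by row in list order instead of being aligned on index labels.
df_tofill.loc[list_fill, ["emptycol1", "emptycol2"]] = src.to_numpy()

print(df_tofill)
```

Without `.to_numpy()` pandas would try to align the right-hand side on its own row and column labels, which do not match the target here, so converting to a plain array keeps the pairing defined by the two lists.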

def common_elements(crtlist, radlist):
    # crtlist holds all criteria; radlist holds all strings to be checked.
    # Returns two lists of indexes (one per input list), with one entry for
    # every pair where an element of radlist starts with an element of crtlist.
    crtli_idx = []
    radli_idx = []
    for idx1, crt in enumerate(crtlist):
        for idx2, rad in enumerate(radlist):
            if rad.startswith(crt):
                crtli_idx.append(idx1)
                radli_idx.append(idx2)
    return crtli_idx, radli_idx


crtlist = ['1', '21', '444']
radlist = ['asda','aererv','1vrvssq','4447676767']
idxcrt, idxrad = common_elements(crtlist, radlist)
print(idxcrt, idxrad)
Output:
[0, 2] [2, 3]

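1 Answer:

Answer 0 (score: 0)

One approach is to align the indexes and columns, replace '' with np.nan in the target DataFrame, and then assign one DataFrame to the other through .loc.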

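import numpy as np

# Rename the source columns to match the target, then index both frames
# by the key columns so that the assignment aligns on (data1, data2).
df_criterias = df_criterias.rename(columns={'goto_emptycol1': 'emptycol1',
                                            'goto_emptycol2': 'emptycol2'})\
                           .set_index(['data1', 'data2'])

df_tofill = df_tofill.replace('', np.nan)\
                     .set_index(['data1', 'data2'])

# Keep only the criteria rows whose keys occur in df_tofill and assign them;
# keys without a match (here 'nok') end up as NaN.
df_tofill.loc[:] = df_criterias.loc[df_criterias.index.isin(df_tofill.index)]
df_tofill = df_tofill.reset_index()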

#   data1 data2   emptycol1      emptycol2
# 0     f  val6  somevalue6  anothervalue6
# 1   nok   nok         NaN            NaN
# 2   nok   nok         NaN            NaN
# 3     a  val1  somevalue1  anothervalue1
# 4   nok   nok         NaN            NaN
# 5     g  val7  somevalue7  anothervalue7
# 6     d  val4  somevalue4  anothervalue4
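For comparison, the same result can probably be reached with a left merge on the key columns. This is a different technique from the one in the answer, shown only as a sketch: the sample frames are rebuilt from the question, and it assumes that each (data1, data2) pair occurs at most once in df_criterias, which `validate="many_to_one"` checks.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df_criterias = pd.DataFrame({
    "goto_emptycol1": [f"some value{i}" for i in range(1, 8)],
    "goto_emptycol2": [f"another value{i}" for i in range(1, 8)],
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "emptycol1": [""] * 7,
    "emptycol2": [""] * 7,
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

# Give the source columns the names they should have in the result.
lookup = df_criterias.rename(columns={"goto_emptycol1": "emptycol1",
                                      "goto_emptycol2": "emptycol2"})

# Left merge: keeps every row of df_tofill in its original order and pulls
# in the matching values; rows without a match get NaN.
# validate raises if a key pair is duplicated in lookup (our assumption).
result = df_tofill[["data1", "data2"]].merge(
    lookup, on=["data1", "data2"], how="left", validate="many_to_one"
)

# Restore the column order of the question and show blanks instead of NaN.
result = result[["emptycol1", "emptycol2", "data1", "data2"]].fillna("")

print(result)
```

The merge avoids both the index lists and the set_index/reset_index round trip, at the cost of building a new DataFrame instead of filling df_tofill in place.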