Highlight specific columns in a pandas DataFrame based on other columns

Time: 2020-02-27 11:31:32

Tags: python pandas

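I'd like some opinions on how to approach this problem, because I think it is quite complicated. I have a pandas DataFrame with columns like these (the sample data is in the dictionary below):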


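I need to highlight the Name columns depending on the ID columns. For example, I only want to match names whose ID is >= 2. In this case, the expected output would be: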


I thought about building a dictionary keyed by Name_sample:

import pandas as pd

pd_data={'Name_sample':1,'ID':[1,2,3],'Type':[1.1,1.2,1.3],'Name':['Dog','cat','Dog.3'],
'Name_sample_1':1.1,'ID_1':[1,2,3],'Type_1':[1.1,1.2,1.3],'Name_1':['cat','cat.1','Dog.1'],
'Name_sample_2':1.2,'ID_2':[1,2,4],'Type_2':[1.1,1.2,1.2],'Name_2':['cat.7','cat.1','Dog.3']}
pd_data = pd.DataFrame(pd_data)

# one dict per column group; Name_sample is constant within a group, so the transposed
# frame has duplicate column labels and to_dict keeps only one row per group
type_1=pd_data.set_index('Name_sample')[['ID', 'Type', 'Name']].T.to_dict('dict')
type_2=pd_data.set_index('Name_sample_1')[['ID_1', 'Type_1', 'Name_1']].T.to_dict('dict')
type_3=pd_data.set_index('Name_sample_2')[['ID_2', 'Type_2', 'Name_2']].T.to_dict('dict')

# keep the entries whose ID is >= 2 (a later match overwrites an earlier one)
for first in type_1.keys():
    values_1=type_1[first]
    if values_1['ID']>=2:
        values_1_bigger=values_1
for second in type_2.keys():
    values_2=type_2[second]
    if values_2['ID_1']>=2:
        values_2_bigger = values_2
for third in type_3.keys():
    values_3=type_3[third]
    if values_3['ID_2']>=2:
        values_3_bigger = values_3

But now I don't know how to proceed... Could someone show me a workable solution? I just need some guidance. Thanks!
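One possible way to continue the dictionary idea (a sketch, not from the original post): key the dictionary by each group's Name_sample value, but store a small DataFrame with uniform `ID`/`Type`/`Name` columns per group, so every group can be filtered with `ID >= 2` in the same way. It assumes the `''`/`'_1'`/`'_2'` column suffixes of the sample data and that Name_sample is constant within a group; `split_groups` is a hypothetical helper name.

```python
import pandas as pd

# sample data from the question; suffixes '', '_1', '_2' mark the three groups
pd_data = pd.DataFrame({
    'Name_sample': 1, 'ID': [1, 2, 3], 'Type': [1.1, 1.2, 1.3], 'Name': ['Dog', 'cat', 'Dog.3'],
    'Name_sample_1': 1.1, 'ID_1': [1, 2, 3], 'Type_1': [1.1, 1.2, 1.3], 'Name_1': ['cat', 'cat.1', 'Dog.1'],
    'Name_sample_2': 1.2, 'ID_2': [1, 2, 4], 'Type_2': [1.1, 1.2, 1.2], 'Name_2': ['cat.7', 'cat.1', 'Dog.3'],
})

def split_groups(df, suffixes=('', '_1', '_2')):
    """Hypothetical helper: {Name_sample value: DataFrame with columns ID, Type, Name}."""
    groups = {}
    for s in suffixes:
        # assumption: Name_sample holds one constant value per group
        key = df[f'Name_sample{s}'].iloc[0].item()
        # rename the group's columns to the same three names for every group
        groups[key] = df[[f'ID{s}', f'Type{s}', f'Name{s}']].set_axis(['ID', 'Type', 'Name'], axis=1)
    return groups

groups = split_groups(pd_data)
# keep only the rows whose ID is >= 2 in each group
bigger = {key: g[g['ID'] >= 2] for key, g in groups.items()}
for key, g in bigger.items():
    print(key, g['Name'].tolist())
# 1 ['cat', 'Dog.3']
# 1.1 ['cat.1', 'Dog.1']
# 1.2 ['cat.1', 'Dog.3']
```

Because each filtered group keeps the original row labels, the surviving names can still be mapped back to their cells in pd_data when deciding what to highlight.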

1 answer:

Answer 0 (score: 0)

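Use the following (it keeps the names whose ID is >= 2 and highlights the ones that occur more than once across the Name columns):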

import pandas as pd

pd_data={'Name_sample':1,'ID':[1,2,3],'Type':[1.1,1.2,1.3],'Name':['Dog','cat','Dog.3'],
'Name_sample_1':1.1,'ID_1':[1,2,3],'Type_1':[1.1,1.2,1.3],'Name_1':['cat','cat.1','Dog.1'],
'Name_sample_2':1.2,'ID_2':[1,2,4],'Type_2':[1.1,1.2,1.2],'Name_2':['cat.7','cat.1','Dog.3']}


pd_data = pd.DataFrame(pd_data)
print(pd_data)
   Name_sample  ID  Type   Name  Name_sample_1  ID_1  Type_1 Name_1  \
0            1   1   1.1    Dog            1.1     1     1.1    cat   
1            1   2   1.2    cat            1.1     2     1.2  cat.1   
2            1   3   1.3  Dog.3            1.1     3     1.3  Dog.1   

   Name_sample_2  ID_2  Type_2 Name_2  
0            1.2     1     1.1  cat.7  
1            1.2     2     1.2  cat.1  
2            1.2     4     1.2  Dog.3  

def color(x):
    c1 = 'background-color: yellow'
    c = ''
    #get ID columns
    ids = x[['ID','ID_1','ID_2']]
    #reshape to one Series and flag IDs greater than or equal to 2
    m = ids.stack(dropna=False).ge(2)
    #stack the Name columns (same count and order as the ID columns), filter by the mask
    names = x[['Name','Name_1','Name_2']].stack(dropna=False)[m.values]
    #final mask: True only for names that occur more than once among the filtered names
    mask = (names.duplicated(keep=False)
                 .unstack(fill_value=False)
                 .reindex(index=x.index, columns=x.columns, fill_value=False))
    #DataFrame with the same index and columns as the original, filled with empty strings
    df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
    #set the highlight style wherever the mask is True
    return df1.mask(mask, c1)

pd_data.style.apply(color, axis=None).to_excel('df.xlsx', engine='openpyxl', index=False)
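The color function above relies on the stacked ID and Name columns lining up position by position. A sketch of an alternative (my own variant, not the answerer's) builds the same boolean mask group by group instead, so each Name column is explicitly paired with its ID column. It assumes the same `''`/`'_1'`/`'_2'` suffix convention; `highlight_mask` and `color_by_groups` are hypothetical names.

```python
import pandas as pd

pd_data = pd.DataFrame({
    'Name_sample': 1, 'ID': [1, 2, 3], 'Type': [1.1, 1.2, 1.3], 'Name': ['Dog', 'cat', 'Dog.3'],
    'Name_sample_1': 1.1, 'ID_1': [1, 2, 3], 'Type_1': [1.1, 1.2, 1.3], 'Name_1': ['cat', 'cat.1', 'Dog.1'],
    'Name_sample_2': 1.2, 'ID_2': [1, 2, 4], 'Type_2': [1.1, 1.2, 1.2], 'Name_2': ['cat.7', 'cat.1', 'Dog.3'],
})

def highlight_mask(df, suffixes=('', '_1', '_2'), min_id=2):
    """Boolean frame shaped like df: True on Name cells whose paired ID >= min_id
    and whose value also appears in another such cell."""
    cells = []  # (row label, column name, name value) for every eligible Name cell
    for s in suffixes:
        keep = df[f'ID{s}'] >= min_id
        for row, value in df.loc[keep, f'Name{s}'].items():
            cells.append((row, f'Name{s}', value))
    # keep=False marks every occurrence of a repeated name, not only the later ones
    dup = pd.Series([value for _, _, value in cells], dtype=object).duplicated(keep=False)
    mask = pd.DataFrame(False, index=df.index, columns=df.columns)
    for (row, col, _), is_dup in zip(cells, dup):
        if is_dup:
            mask.loc[row, col] = True
    return mask

def color_by_groups(df):
    """Same output shape as color(): CSS strings, empty where nothing is highlighted."""
    styles = pd.DataFrame('', index=df.index, columns=df.columns)
    return styles.mask(highlight_mask(df), 'background-color: yellow')

mask = highlight_mask(pd_data)
print(sorted((r, c) for r in mask.index for c in mask.columns if mask.at[r, c]))
# [(1, 'Name_1'), (1, 'Name_2'), (2, 'Name'), (2, 'Name_2')]
```

To use it, pass color_by_groups to pd_data.style.apply(..., axis=None) in place of color; the Excel export line stays the same. On the sample data it marks the same four cells as the stack-based version.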