I would like some input on how to approach this problem, because I find it quite complicated. I have a pandas DataFrame with these columns:
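I need to highlight values in the "Name" columns based on the corresponding "ID" columns. For example, I only want to match names whose ID is >= 2. In this case, the expected output would be: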
I thought about defining dictionaries keyed by Name_sample:
import pandas as pd

pd_data = pd.DataFrame({'Name_sample':1,'ID':[1,2,3],'Type':[1.1,1.2,1.3],'Name':['Dog','cat','Dog.3'],
                        'Name_sample_1':1.1,'ID_1':[1,2,3],'Type_1':[1.1,1.2,1.3],'Name_1':['cat','cat.1','Dog.1'],
                        'Name_sample_2':1.2,'ID_2':[1,2,4],'Type_2':[1.1,1.2,1.2],'Name_2':['cat.7','cat.1','Dog.3']})
type_1 = pd_data.set_index('Name_sample')[['ID', 'Type', 'Name']].T.to_dict('dict')
type_2 = pd_data.set_index('Name_sample_1')[['ID_1', 'Type_1', 'Name_1']].T.to_dict('dict')
type_3 = pd_data.set_index('Name_sample_2')[['ID_2', 'Type_2', 'Name_2']].T.to_dict('dict')
for first in type_1.keys():
    values_1 = type_1[first]
    if values_1['ID'] >= 2:
        values_1_bigger = values_1
for second in type_2.keys():
    values_2 = type_2[second]
    if values_2['ID_1'] >= 2:
        values_2_bigger = values_2
for third in type_3.keys():
    values_3 = type_3[third]
    if values_3['ID_2'] >= 2:
        values_3_bigger = values_3
But now I don't know how to proceed... Could someone show me a workable approach? I just need some guidance. Thanks!
Answer 0 (score: 0)
Use:
import pandas as pd

pd_data={'Name_sample':1,'ID':[1,2,3],'Type':[1.1,1.2,1.3],'Name':['Dog','cat','Dog.3'],
'Name_sample_1':1.1,'ID_1':[1,2,3],'Type_1':[1.1,1.2,1.3],'Name_1':['cat','cat.1','Dog.1'],
'Name_sample_2':1.2,'ID_2':[1,2,4],'Type_2':[1.1,1.2,1.2],'Name_2':['cat.7','cat.1','Dog.3']}
pd_data = pd.DataFrame(pd_data)
print (pd_data)
   Name_sample  ID  Type   Name  Name_sample_1  ID_1  Type_1  Name_1  \
0            1   1   1.1    Dog            1.1     1     1.1     cat
1            1   2   1.2    cat            1.1     2     1.2   cat.1
2            1   3   1.3  Dog.3            1.1     3     1.3   Dog.1

   Name_sample_2  ID_2  Type_2  Name_2
0            1.2     1     1.1   cat.7
1            1.2     2     1.2   cat.1
2            1.2     4     1.2   Dog.3
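The idea behind the function below is to pair each Name column with its ID column (Name/ID, Name_1/ID_1, Name_2/ID_2), keep only the names whose ID is at least 2, and flag the names that occur more than once among those. A minimal sketch of that intermediate long table, using only the ID and Name columns of the sample data; it uses NumPy's row-major ravel() instead of stack(), and the column names row, col, id, name and highlight are illustrative, not part of the answer.

```python
import pandas as pd

# Same sample data as above, ID and Name columns only
df = pd.DataFrame({
    'ID': [1, 2, 3], 'Name': ['Dog', 'cat', 'Dog.3'],
    'ID_1': [1, 2, 3], 'Name_1': ['cat', 'cat.1', 'Dog.1'],
    'ID_2': [1, 2, 4], 'Name_2': ['cat.7', 'cat.1', 'Dog.3'],
})

id_cols = ['ID', 'ID_1', 'ID_2']
name_cols = ['Name', 'Name_1', 'Name_2']

# One row per (original row, Name column) pair. ravel() is row-major, so the
# order is row 0's three pairs, then row 1's, then row 2's (the same order
# stack() produces in the answer).
long = pd.DataFrame({
    'row': [r for r in df.index for _ in name_cols],
    'col': name_cols * len(df),
    'id': df[id_cols].to_numpy().ravel(),
    'name': df[name_cols].to_numpy().ravel(),
})

# Keep pairs whose ID is >= 2, then flag names that appear more than once
kept = long[long['id'].ge(2)].copy()
kept['highlight'] = kept['name'].duplicated(keep=False)
print(kept)
```

For this data only cat.1 (row 1, in Name_1 and Name_2) and Dog.3 (row 2, in Name and Name_2) are flagged, which are the cells the color function colors yellow.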
def color(x):
    c1 = 'background-color: yellow'
    c = ''
    # get the ID columns
    ids = x[['ID', 'ID_1', 'ID_2']]
    # reshape to one long Series and flag IDs greater than or equal to 2
    m = ids.stack(dropna=False).ge(2)
    # get the Name columns (same order as the ID columns), filter by the mask
    names = x[['Name', 'Name_1', 'Name_2']].stack(dropna=False)[m.values]
    # final mask: True only for names that occur more than once
    mask = (names.duplicated(keep=False)
                 .unstack()
                 .reindex(index=x.index, columns=x.columns, fill_value=False))
    # DataFrame with the same index and columns as the original, filled with empty strings
    df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
    # set the yellow style wherever the boolean mask is True
    return df1.mask(mask, c1)
pd_data.style.apply(color, axis=None).to_excel('df.xlsx', engine='openpyxl', index=False)
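The answer hard-codes the lists of ID and Name columns. If the number of sample blocks can vary, a hypothetical helper highlight_mask could derive them from the column names instead, assuming every Name<suffix> column (Name, Name_1, ...) has a matching ID<suffix> column (ID, ID_1, ...). This is a sketch under that naming assumption; it returns the same kind of boolean mask that color builds, so the logic can be checked without the Styler or openpyxl.

```python
import pandas as pd


def highlight_mask(df, min_id=2):
    """Return a boolean DataFrame shaped like df, True for Name cells to highlight.

    Hypothetical helper: assumes each 'Name<suffix>' column has a matching
    'ID<suffix>' column (Name/ID, Name_1/ID_1, ...).
    """
    # Name, Name_1, Name_2, ... but not Name_sample, Name_sample_1, ...
    name_cols = list(df.filter(regex=r'^Name(_\d+)?$').columns)
    id_cols = ['ID' + c[len('Name'):] for c in name_cols]

    # Replace with NaN every name whose paired ID is below the threshold
    keep = df[id_cols].ge(min_id).to_numpy()
    names = df[name_cols].where(keep)

    # Names that still occur more than once across all Name columns
    # (value_counts drops NaN by default)
    counts = pd.Series(names.to_numpy().ravel()).value_counts()
    repeated = list(counts[counts > 1].index)

    flag = names.isin(repeated)
    # Columns other than the Name columns are never highlighted
    return flag.reindex(index=df.index, columns=df.columns, fill_value=False)


pd_data = pd.DataFrame({
    'Name_sample': 1, 'ID': [1, 2, 3], 'Type': [1.1, 1.2, 1.3], 'Name': ['Dog', 'cat', 'Dog.3'],
    'Name_sample_1': 1.1, 'ID_1': [1, 2, 3], 'Type_1': [1.1, 1.2, 1.3], 'Name_1': ['cat', 'cat.1', 'Dog.1'],
    'Name_sample_2': 1.2, 'ID_2': [1, 2, 4], 'Type_2': [1.1, 1.2, 1.2], 'Name_2': ['cat.7', 'cat.1', 'Dog.3'],
})

mask = highlight_mask(pd_data)
print(mask[['Name', 'Name_1', 'Name_2']])
```

To use it with the Styler, the body of color could become return pd.DataFrame('', index=x.index, columns=x.columns).mask(highlight_mask(x), 'background-color: yellow'). The where/value_counts route also avoids passing dropna to stack(), whose keyword behaviour has been changing across recent pandas releases.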