I have two files, as follows:
File 1:
A1 A2 description
1 10 apo_descriptorX_0001
4 52 apo_descriptorY_0001
30 1 apo_descriptorZ_0001
20 10 apo_descriptorX_0002
1 30 apo_descriptorX_0003
2 4 apo_descriptorY_0002
File 2:
A1 A2 description
1 10 holo_descriptorX_0001
4 52 holo_descriptorY_0001
30 1 holo_descriptorZ_0001
20 10 holo_descriptorX_0002
1 30 holo_descriptorX_0003
2 4 holo_descriptorY_0002
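I want to plot the frequency of the values A1 and A2 for each descriptor type. So every A1 value of descriptor X should appear in the same frequency plot, regardless of its trailing number (0001, 0002, etc.).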
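To make the goal concrete, here is a minimal sketch of the grouping step using only the standard library. It assumes every description follows the pattern `<state>_<descriptor type>_<number>`; the names `APO_TEXT`, `DESCRIPTION_RE` and `frequencies_by_type` are made up for this illustration and are not part of the original files.

```python
import re
from collections import Counter, defaultdict

# Sample rows copied from file 1 above (whitespace-separated, header first).
APO_TEXT = """\
A1 A2 description
1 10 apo_descriptorX_0001
4 52 apo_descriptorY_0001
30 1 apo_descriptorZ_0001
20 10 apo_descriptorX_0002
1 30 apo_descriptorX_0003
2 4 apo_descriptorY_0002
"""

# Assumed naming scheme: <state>_<descriptor type>_<trailing number>
DESCRIPTION_RE = re.compile(r"^(apo|holo)_(.+)_(\d+)$")


def frequencies_by_type(text, column):
    """Count how often each value of `column` occurs per descriptor type."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    col = header.index(column)
    desc = header.index("description")
    counts = defaultdict(Counter)
    for line in lines[1:]:
        fields = line.split()
        match = DESCRIPTION_RE.match(fields[desc])
        if match is None:
            continue  # skip rows that do not follow the assumed naming scheme
        descriptor_type = match.group(2)  # e.g. "descriptorX", number dropped
        counts[descriptor_type][int(fields[col])] += 1
    return counts


a1_counts = frequencies_by_type(APO_TEXT, "A1")
print(dict(a1_counts["descriptorX"]))
```

For descriptor X this pools the A1 values from runs 0001, 0002 and 0003 into one counter, which is the data a per-descriptor frequency plot would be drawn from.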
How a friend and I solved it:
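import matplotlib.pyplot as plt
import seaborn as sns

# apo_data and holo_data are the pandas DataFrames read from file 1 and file 2.
# Define "names" as the part of the description you want to compare:
# here, characters 5 up to the last 5, i.e. without the "holo_" prefix and the "_0001" suffix.
names = set(i[5:-5] for i in holo_data['description'])
for i in names:
    apo_i = "apo_" + i
    holo_i = "holo_" + i
    fig1, ax1 = plt.subplots(1, figsize=(10, 5))
    # Plot the same column for both datasets; use 'A2' in both calls for the A2 plot.
    # Note: sns.distplot is deprecated since seaborn 0.11; sns.histplot(..., kde=True) is its replacement.
    sns.distplot(apo_data[apo_data['description'].str.contains(apo_i)]['A1'], ax=ax1, label='Apo')
    sns.distplot(holo_data[holo_data['description'].str.contains(holo_i)]['A1'], ax=ax1, label='Holo')
    ax1.legend()
    plt.title(i)
    ax1.set_ylabel('y', fontsize=12)
    ax1.set_xlabel('x', fontsize=20)
    plt.show()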
;)
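One caveat with `str.contains` is that it matches substrings, so a name like `apo_descriptorX` would also pick up a hypothetical `apo_descriptorX2_0001`. A possible alternative, sketched below under the assumption that every description looks like `<state>_<descriptor>_<number>`, is to split the description into columns with `str.extract` and group on those. The helper names `load` and `add_descriptor_type` and the column names `state`, `descriptor` and `run` are invented for this example.

```python
import io

import pandas as pd

# Sample rows copied from the two files above.
APO_TEXT = """\
A1 A2 description
1 10 apo_descriptorX_0001
4 52 apo_descriptorY_0001
30 1 apo_descriptorZ_0001
20 10 apo_descriptorX_0002
1 30 apo_descriptorX_0003
2 4 apo_descriptorY_0002
"""
HOLO_TEXT = APO_TEXT.replace("apo_", "holo_")


def load(text):
    """Read a whitespace-separated table into a DataFrame."""
    return pd.read_csv(io.StringIO(text), sep=r"\s+")


def add_descriptor_type(df):
    """Split 'description' into state, descriptor and run columns."""
    parts = df["description"].str.extract(
        r"^(?P<state>apo|holo)_(?P<descriptor>.+)_(?P<run>\d+)$"
    )
    return df.join(parts)


combined = pd.concat(
    [add_descriptor_type(load(APO_TEXT)), add_descriptor_type(load(HOLO_TEXT))],
    ignore_index=True,
)

# One group per (descriptor, state); each holds the values to plot together.
for (descriptor, state), group in combined.groupby(["descriptor", "state"]):
    print(descriptor, state, group["A1"].tolist())
```

Each descriptor's rows could then be passed to a plotting call such as `sns.histplot(data=rows, x="A1", hue="state")` to overlay the apo and holo distributions in one figure.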