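I have a matrix that counts the number of links between each pair of subjects. It is computed from a DataFrame I created, using this code: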
from collections import Counter
import pandas as pd

new_df = df[['GrantRefNumber','Subject']]
a = ['Psychology','Education','Social policy','Sociology','Pol. sci. & internat. studies','Development studies','Social anthropology','Area Studies','Science and Technology Studies','Law & legal studies','Economics','Management & business studies','Human Geography','Environmental planning','Demography','Social work','Tools, technologies & methods','Linguistics','History']
final_df = new_df[new_df['Subject'].isin(a)]
ctrs = {location: Counter(gp.GrantRefNumber) for location, gp in final_df.groupby('Subject')}
ctrs = list(ctrs.items())
overlaps = [(loc1, loc2, sum(min(ctr1[k], ctr2[k]) for k in ctr1))
            for i, (loc1, ctr1) in enumerate(ctrs, start=1)
            for (loc2, ctr2) in ctrs[i:] if loc1 != loc2]
overlaps += [(l2, l1, c) for l1, l2, c in overlaps]
df2 = pd.DataFrame(overlaps, columns=['Loc1', 'Loc2', 'Count'])
df2 = df2.set_index(['Loc1', 'Loc2'])
df2 = df2.unstack().fillna(0).astype(int)
The matrix looks like this (it is very large, so I took a screenshot of only part of it):
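Later in the code I convert the matrix into a chord diagram, but I would like a way to filter the data (or move it into a new DataFrame) so that only the top 20 highest numbers in the matrix are kept and everything else is set to 0. The 20 should be any number, so that I can change it later through a variable.
Is there a simple way to do this?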
Answer 0 (score: 1)
df.sort_values(by='AreaStudies',ascending=False).head(20)
Answer 1 (score: 1)
Consider this sample DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4]})
print(df)
   B  C  D  E
0  4  7  1  5
1  5  8  3  3
2  4  9  5  6
3  5  4  7  9
4  5  2  1  2
5  4  3  0  4
You can first collect the top unique values, then use DataFrame.where with an isin condition to replace everything else with 0:
# np.unique returns the unique values already sorted, so the last 3 are the largest
a = np.unique(df.values.ravel())[-3:]
print(a)
[7 8 9]
df = df.where(df.isin(a), 0)
print(df)
   B  C  D  E
0  0  7  0  0
1  0  8  0  0
2  0  9  0  0
3  0  0  7  9
4  0  0  0  0
5  0  0  0  0
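To make the count configurable, as the question asks, the same idea could be wrapped in a small helper. This is only a sketch: the function name `keep_top_n` and its parameter `n` are hypothetical and not part of the original answer. Unlike the snippet above, which keeps the 3 largest unique values, this version keeps the `n` largest cells. Cells tied with the n-th largest value are kept too, so more than `n` cells can survive.

```python
import numpy as np
import pandas as pd

def keep_top_n(matrix: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: keep the n largest cells, set all others to 0.

    Cells tied with the n-th largest value are also kept.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    values = matrix.to_numpy().ravel()
    n = min(n, values.size)            # cannot keep more cells than exist
    threshold = np.sort(values)[-n]    # value of the n-th largest cell
    return matrix.where(matrix >= threshold, 0)

# Same sample data as above
demo = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                     'C': [7, 8, 9, 4, 2, 3],
                     'D': [1, 3, 5, 7, 1, 0],
                     'E': [5, 3, 6, 9, 2, 4]})

top_n = 3                              # change this single variable later
top3 = keep_top_n(demo, top_n)
print(top3)
```

Note that the matrix in the question is symmetric (every pair is stored as both Loc1/Loc2 and Loc2/Loc1), so each link occupies two cells. To keep the top 20 links there, `n` would presumably need to be 40.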