Python matrix - limit the matrix to the top 20 values

Asked: 2017-09-12 09:04:38

Tags: python pandas

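I have a matrix that counts the number of links between pairs of subjects. It is built from a DataFrame (DF) I created, using this code: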

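from collections import Counter

import pandas as pd

# df is the existing DataFrame of grants; keep only the two columns needed
new_df = df[['GrantRefNumber','Subject']]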

a = ['Psychology','Education','Social policy','Sociology','Pol. sci. & internat. studies','Development studies','Social anthropology','Area Studies','Science and Technology Studies','Law & legal studies','Economics','Management & business studies','Human Geography','Environmental planning','Demography','Social work','Tools, technologies & methods','Linguistics','History']
final_df = new_df[new_df['Subject'].isin(a)]

ctrs = {location: Counter(gp.GrantRefNumber) for location, gp in final_df.groupby('Subject')}

ctrs = list(ctrs.items())
overlaps = [(loc1, loc2, sum(min(ctr1[k], ctr2[k]) for k in ctr1))
    for i, (loc1, ctr1) in enumerate(ctrs, start=1)
    for (loc2, ctr2) in ctrs[i:] if loc1 != loc2]
overlaps += [(l2, l1, c) for l1, l2, c in overlaps]

df2 = pd.DataFrame(overlaps, columns=['Loc1', 'Loc2', 'Count'])
df2 = df2.set_index(['Loc1', 'Loc2'])
df2 = df2.unstack().fillna(0).astype(int)
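To make the counting step easier to follow, here is a minimal sketch on made-up data. The `toy` frame, its grant references, and the helper name `shared_grants` are assumptions for illustration, not part of the original code. It relies on the fact that `Counter & Counter` keeps the minimum count per key, which is the same quantity as `sum(min(ctr1[k], ctr2[k]) for k in ctr1)` above.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy data: each row links one grant to one subject.
toy = pd.DataFrame({
    'GrantRefNumber': ['G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G3'],
    'Subject': ['Law', 'Economics', 'Law', 'History', 'Law', 'Economics', 'History'],
})

# One Counter of grant references per subject, as in the question.
ctrs = {subject: Counter(gp.GrantRefNumber) for subject, gp in toy.groupby('Subject')}


def shared_grants(ctr1, ctr2):
    # Multiset intersection: for every grant, keep the smaller of the two counts,
    # then add the counts up.
    return sum((ctr1 & ctr2).values())


subjects = sorted(ctrs)
# Square matrix of pairwise counts; the diagonal is left at 0,
# matching the question's code, which skips loc1 == loc2.
matrix = pd.DataFrame(
    [[0 if s1 == s2 else shared_grants(ctrs[s1], ctrs[s2]) for s2 in subjects]
     for s1 in subjects],
    index=subjects,
    columns=subjects,
)
print(matrix)
```

Building the square frame directly gives single-level column labels, whereas the `set_index(...).unstack()` route in the question produces a column MultiIndex with `Count` as the top level.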

The matrix looks like this (it is very large, so this is a picture of only part of it):

[image: partial screenshot of the matrix]

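Later in my code I turn the matrix into a chord diagram, but I would like a way to filter the data (or move it to a new DF) so that only the top 20 highest numbers in the matrix are shown, with 0 filled in for everything else. The 20 should be any number, so that I can change it later through a variable.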

Is there a simple way to do this?

2 Answers:

Answer 0: (score: 1)

df.sort_values(by='AreaStudies',ascending=False).head(20)
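As a rough illustration of what this one-liner does, the sketch below runs the same two calls on a small made-up matrix. The frame `m`, its subject names, and the variable `top_n` are assumptions. Note that `sort_values` followed by `head` ranks whole rows by a single column and keeps the first N rows; it does not set any cell to 0.

```python
import pandas as pd

# Hypothetical square count matrix; the subject names are placeholders.
m = pd.DataFrame(
    [[0, 5, 2], [5, 0, 9], [2, 9, 0]],
    index=['Law', 'Economics', 'History'],
    columns=['Law', 'Economics', 'History'],
)

top_n = 2  # number of rows to keep; change this to 20 for the real data

# Rank rows by their value in the 'Economics' column and keep the first top_n.
top_rows = m.sort_values(by='Economics', ascending=False).head(top_n)
print(top_rows)
```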

Answer 1: (score: 1)

You can use the following sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4]})

print (df)
   B  C  D  E
0  4  7  1  5
1  5  8  3  3
2  4  9  5  6
3  5  4  7  9
4  5  2  1  2
5  4  3  0  4

First get the top unique values, then use DataFrame.where with an isin condition to replace every other value with 0:

a = np.sort(np.unique(df.values.ravel()))[-3:]
print (a)
[7 8 9]


df = df.where(df.isin(a), 0)
print (df)
   B  C  D  E
0  0  7  0  0
1  0  8  0  0
2  0  9  0  0
3  0  0  7  9
4  0  0  0  0
5  0  0  0  0
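Since the question asks for the cutoff to be controlled by a variable, the two steps above could be wrapped in a small function. This is only a sketch: the helper name `keep_top_unique` and the variable `N` are assumptions, and the logic is the same top-unique-values approach shown in this answer.

```python
import numpy as np
import pandas as pd


def keep_top_unique(frame, n):
    # Hypothetical helper: keep cells whose value is among the n largest
    # unique values in the frame and set every other cell to 0.
    # np.unique flattens its input and returns the values sorted ascending.
    top_values = np.unique(frame.to_numpy())[-n:]
    return frame.where(frame.isin(top_values), 0)


df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4]})

N = 3  # change this single variable to 20 for the real matrix
filtered = keep_top_unique(df, N)
print(filtered)
```

Because the cutoff is based on unique values, more than N cells can survive when values repeat: with N = 3 the sample above keeps five cells (7, 8, 9, 7, 9). In a symmetric matrix like the one in the question, each pair's count also appears twice.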