Find the minimum values and their (row, column) label pairs in a distance-matrix DataFrame

Time: 2019-10-04 09:37:25

Tags: python pandas dataframe distance-matrix

I have a DataFrame holding a distance_matrix computed with scipy.

How can I extract, for each row, the minimum value (excluding 0.00) together with the (row, column) labels associated with that value?

For example:

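For the first row, the min would be [0.012885, 'king', 'boy'].

For the second row, the min would be [2.826742, 'wise', 'bananas'].

Here is the code that builds the DataFrame; this is what I have tried so far (I still need to append the associated labels):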

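import pandas as pd
from scipy.spatial import distance_matrix
...
# Pairwise distances between the word vectors, labelled by word on both axes
df = pd.DataFrame(distance_matrix(w2v_df[['x1', 'x2']],
                                  w2v_df[['x1', 'x2']]),
                  index=w2v_df['word'],
                  columns=w2v_df['word'])
print(type(df))
print(df)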
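Because w2v_df is elided above, here is a small reproducible sketch of the same construction. The three words and their coordinates are hypothetical stand-ins, not the asker's data; the point is only that scipy.spatial.distance_matrix plus the word labels yields a symmetric DataFrame with zeros on the diagonal.

```python
import pandas as pd
from scipy.spatial import distance_matrix

# Hypothetical stand-in for w2v_df: three words with made-up 2-D coordinates.
w2v_df = pd.DataFrame({
    "word": ["king", "boy", "queen"],
    "x1": [0.0, 0.0, 3.0],
    "x2": [0.0, 1.0, 4.0],
})

coords = w2v_df[["x1", "x2"]]
# Pairwise Euclidean distances, labelled by word on both axes.
df = pd.DataFrame(
    distance_matrix(coords, coords),
    index=w2v_df["word"],
    columns=w2v_df["word"],
)
print(df)
```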

Output:

<class 'pandas.core.frame.DataFrame'>
word            king       wise      queen  ...       kind        man        boy
word                                        ...                                 
king        0.000000   7.917140  10.963772  ...   5.811759   3.180582   0.012885
wise        7.917140   0.000000   6.642557  ...  10.990575   9.957878   7.908536
queen      10.963772   6.642557   0.000000  ...  10.347096  11.126121  10.951130
trees       9.954951   3.937842   2.917539  ...  10.940161  10.948519   9.943392
lab         7.437203  11.811392  10.148030  ...   1.716404   4.612150   7.429358
prince      3.180829   9.958469  11.126762  ...   2.897802   0.000654   3.177194
monkeys    10.007491   3.958035   2.926149  ...  10.995299  11.004550   9.995942
girl        5.820748   5.026462   5.153798  ...   6.336225   6.244742   5.808014
woman      10.663214   8.143587   2.350959  ...   8.843283  10.155728  10.650332
princess    5.204497   5.744348   5.894201  ...   5.439997   5.356606   5.191617
cat         3.033364   5.678351  10.397241  ...   8.359144   6.077646   3.031699
dinosaurs   5.745362   6.422390   5.683175  ...   5.075057   5.442950   5.732531
person      9.421978  10.901532   7.192433  ...   5.081030   7.477618   9.410744
bananas     5.238502   2.826742   8.147972  ...   9.239873   7.668165   5.231329
partner     7.752175  10.135952   7.572307  ...   3.468261   5.742199   7.741316
rat         8.830544   8.633246   4.739600  ...   6.113317   7.734904   8.818027
kind        5.811759  10.990575  10.347096  ...   0.000000   2.897668   5.804801
man         3.180582   9.957878  11.126121  ...   2.897668   0.000000   3.176944
boy         0.012885   7.908536  10.951130  ...   5.804801   3.176944   0.000000

[19 rows x 19 columns]

1 answer:

Answer 0 (score: 1)

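Note that the distance matrix is symmetric. So for each word you can simply sort by that word's column, e.g. df.sort_values(by='king'), and take the second row with .iloc[1]; the first row is the word's zero distance to itself. Alternatively, you can just use the min function and store the results in a list.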
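A minimal sketch of the sort-based idea, assuming the only zero in each column is the word's distance to itself. The matrix values and the helper name nearest_by_sort are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical 3x3 symmetric distance matrix (values made up for illustration).
words = ["king", "boy", "queen"]
df = pd.DataFrame(
    [[0.0, 1.0, 5.0],
     [1.0, 0.0, 4.2],
     [5.0, 4.2, 0.0]],
    index=words,
    columns=words,
)

def nearest_by_sort(df, word):
    """Return [distance, word, nearest_word] by sorting one column.

    Assumes the only 0.0 in the column is the word's distance to itself,
    so after an ascending sort that entry comes first and the nearest
    neighbour sits at position 1.
    """
    ordered = df.sort_values(by=word)[word]
    return [float(ordered.iloc[1]), word, ordered.index[1]]

print(nearest_by_sort(df, "king"))
```

Sorting a whole column just to read its second entry does more work than a single min(), which is why the min-based approach scales better for many words.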

I did the following, and it works fine for a small DataFrame like yours:

# Replace the zero self-distances with a large sentinel so min() skips them
df = df.replace(0, 99999)  # or: df.replace(0, 99999, inplace=True)
# Get the minimum for one example, e.g. 'king'
min_king = df['king'].min()
print([min_king, 'king', df[df['king'] == min_king].index.values[0]])

Then run that block in a loop over every column to get the result for all words.
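The loop the answer describes might look like the sketch below, run on a hypothetical 3x3 matrix. It also shows a vectorized alternative, which is a swapped-in technique rather than the answer's own: DataFrame.mask turns the zeros into NaN, which min and idxmin skip by default, so no sentinel value (and no assumption that every distance is below 99999) is needed.

```python
import pandas as pd

# Hypothetical symmetric distance matrix standing in for the question's df.
words = ["king", "boy", "queen"]
df = pd.DataFrame(
    [[0.0, 1.0, 5.0],
     [1.0, 0.0, 4.2],
     [5.0, 4.2, 0.0]],
    index=words,
    columns=words,
)

# Loop version of the answer: replace zeros with a sentinel, then take
# each column's minimum and the first row label that holds it.
masked = df.replace(0, 99999)
results = []
for word in masked.columns:
    min_val = masked[word].min()
    partner = masked[masked[word] == min_val].index[0]
    results.append([float(min_val), word, partner])

# Vectorized alternative: treat zeros as missing, so min/idxmin skip them.
no_zero = df.mask(df == 0)
summary = pd.DataFrame({
    "min": no_zero.min(axis=1),         # smallest non-zero distance per row
    "partner": no_zero.idxmin(axis=1),  # column label where it occurs
})

print(results)
print(summary)
```

Both versions assume a zero only ever appears on the diagonal; if two different words could have a distance of exactly 0, masking every zero would also hide that genuine pair.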