Finding the most negative pairs in a matrix with pandas

Time: 2016-07-31 12:35:48

Tags: python pandas dataframe covariance

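I obtained a covariance matrix in pandas using dataframe.cov(), and now I want to find all of the most negative pairs. For example, v1v3 would be the most negative pair if matrix[v1v3] were the smallest entry; v1v3 is then excluded in the next selection round, and so on, so that the pairs come out in ascending order. I want to find all pairs with this property.

This is my matrix: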

            V1           V2           V3           V4           V5
V1  471.189543   404.059694   -59.847099   415.214121  -571.672083
V2  404.059694  4768.251838  3740.331544   557.050042  1750.542728
V3  -59.847099  3740.331544  6563.713527  -782.229345  3378.108799
V4  415.214121   557.050042  -782.229345  1949.914430  -582.306412
V5 -571.672083  1750.542728  3378.108799  -582.306412  3355.819315

2 answers:

Answer 0: (score: 2)

import numpy as np
import pandas as pd
df = pd.DataFrame({'V1': [471.18954309999998, 404.05969449999998, -59.847098750000001, 415.21412069999997, -571.67208340000002], 'V2': [404.05969449999998, 4768.2518380000001, 3740.3315439999997, 557.05004150000002, 1750.5427280000001], 'V3': [-59.847098750000001, 3740.3315439999997, 6563.7135269999999, -782.22934510000005, 3378.1087990000001], 'V4': [415.21412069999997, 557.05004150000002, -782.22934510000005, 1949.91443, -582.30641220000007], 'V5': [-571.67208340000002, 1750.5427280000001, 3378.1087990000001, -582.30641220000007, 3355.8193149999997]}, index=['V1', 'V2', 'V3', 'V4', 'V5'])

result = df.stack()
result.name = 'cov'
result = result.reset_index()
result = result.loc[result['level_0'] < result['level_1']]
result = result.sort_values(by='cov')
print(result)

Output

   level_0 level_1          cov
13      V3      V4  -782.229345
19      V4      V5  -582.306412
4       V1      V5  -571.672083
2       V1      V3   -59.847099
1       V1      V2   404.059694
3       V1      V4   415.214121
8       V2      V4   557.050042
9       V2      V5  1750.542728
14      V3      V5  3378.108799
7       V2      V3  3740.331544

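The rows are sorted by covariance in ascending order, so the most negative pairs come first. The filter level_0 < level_1 keeps each symmetric pair only once and drops the diagonal.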
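If only the negative pairs are of interest, the sorted result can be filtered on the sign of the covariance. The following is a minimal sketch that assumes the same matrix as in the question; the variable names `labels` and `negative` are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

labels = ['V1', 'V2', 'V3', 'V4', 'V5']
df = pd.DataFrame(
    [[ 471.189543,  404.059694,  -59.847099,  415.214121, -571.672083],
     [ 404.059694, 4768.251838, 3740.331544,  557.050042, 1750.542728],
     [ -59.847099, 3740.331544, 6563.713527, -782.229345, 3378.108799],
     [ 415.214121,  557.050042, -782.229345, 1949.914430, -582.306412],
     [-571.672083, 1750.542728, 3378.108799, -582.306412, 3355.819315]],
    index=labels, columns=labels)

# Long format: one row per (row label, column label, covariance)
result = df.stack().rename('cov').reset_index()

# Keep each unordered pair once and drop the diagonal
result = result[result['level_0'] < result['level_1']]

# Keep only negative covariances, most negative first
negative = (result[result['cov'] < 0]
            .sort_values(by='cov')
            .reset_index(drop=True))
print(negative)
```

For this matrix the filter leaves four pairs, starting with V3/V4.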

Answer 1: (score: 0)

Consider the dataframe df:

import numpy as np
import pandas as pd

rows = pd.Index(['V1', 'V2', 'V3', 'V4', 'V5'], name='rows')
cols = pd.Index(['V1', 'V2', 'V3', 'V4', 'V5'], name='cols')
df = pd.DataFrame(
    [[ 471.189543,  404.059694,  -59.847099,  415.214121, -571.672083],
     [ 404.059694, 4768.251838, 3740.331544,  557.050042, 1750.542728],
     [ -59.847099, 3740.331544, 6563.713527, -782.229345, 3378.108799],
     [ 415.214121,  557.050042, -782.229345, 1949.91443,  -582.306412],
     [-571.672083, 1750.542728, 3378.108799, -582.306412, 3355.819315]],
    rows, cols
)

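Use numpy to find the solution more efficiently.

Find the row and column of the minimum value:

# argmin() returns a position in the flattened (row-major) array
pos = df.values.argmin()
rpos = rows[pos // len(cols)]
cpos = cols[pos % len(cols)]
print(rpos, cpos, df.loc[rpos, cpos])

V3 V4 -782.229345

Sort the entire dataframe:

# index pairs strictly above the diagonal, so each pair appears once
ti = np.triu_indices(len(rows), 1, len(cols))
argsorted = df.values[ti[0], ti[1]].argsort()
# list() so that .loc receives a list of tuples on Python 3
tups = list(zip(rows[ti[0][argsorted]], cols[ti[1][argsorted]]))

df.stack().loc[tups].rename('cov').reset_index()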
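The integer division and modulo used above to turn the flat argmin position into a row and a column can also be written with np.unravel_index, which takes the array shape explicitly. This is a small sketch assuming the same matrix; the names `labels`, `values`, `r` and `c` are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

labels = ['V1', 'V2', 'V3', 'V4', 'V5']
df = pd.DataFrame(
    [[ 471.189543,  404.059694,  -59.847099,  415.214121, -571.672083],
     [ 404.059694, 4768.251838, 3740.331544,  557.050042, 1750.542728],
     [ -59.847099, 3740.331544, 6563.713527, -782.229345, 3378.108799],
     [ 415.214121,  557.050042, -782.229345, 1949.914430, -582.306412],
     [-571.672083, 1750.542728, 3378.108799, -582.306412, 3355.819315]],
    index=labels, columns=labels)

values = df.to_numpy()

# Convert the flat argmin position into (row, column) integer indices
r, c = np.unravel_index(values.argmin(), values.shape)

# Map the integer indices back to the labels
rpos, cpos = df.index[r], df.columns[c]
print(rpos, cpos, values[r, c])
```

Because the matrix is symmetric, the minimum occurs twice; argmin returns the first occurrence in row-major order, which here is the V3/V4 entry.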
