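I computed a covariance matrix in pandas with dataframe.cov(), and now I want to find the most negative pairs. For example, if v1 and v3 are the most negative pair because matrix[v1, v3] is the smallest entry, that pair is selected and both v1 and v3 are excluded from the next selection round. I want to find all pairs with this property, listed in ascending order of covariance.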
Here is my matrix:
            V1          V2          V3          V4          V5
V1  471.189543  404.059694  -59.847099  415.214121 -571.672083
V2  404.059694 4768.251838 3740.331544  557.050042 1750.542728
V3  -59.847099 3740.331544 6563.713527 -782.229345 3378.108799
V4  415.214121  557.050042 -782.229345 1949.914430 -582.306412
V5 -571.672083 1750.542728 3378.108799 -582.306412 3355.819315
Answer 0 (score: 2)
import numpy as np
import pandas as pd
df = pd.DataFrame({'V1': [471.18954309999998, 404.05969449999998, -59.847098750000001, 415.21412069999997, -571.67208340000002], 'V2': [404.05969449999998, 4768.2518380000001, 3740.3315439999997, 557.05004150000002, 1750.5427280000001], 'V3': [-59.847098750000001, 3740.3315439999997, 6563.7135269999999, -782.22934510000005, 3378.1087990000001], 'V4': [415.21412069999997, 557.05004150000002, -782.22934510000005, 1949.91443, -582.30641220000007], 'V5': [-571.67208340000002, 1750.5427280000001, 3378.1087990000001, -582.30641220000007, 3355.8193149999997]}, index=['V1', 'V2', 'V3', 'V4', 'V5'])
result = df.stack()
result.name = 'cov'
result = result.reset_index()
result = result.loc[result['level_0'] < result['level_1']]
result = result.sort_values(by='cov')
print(result)
Output:
   level_0 level_1          cov
13      V3      V4  -782.229345
19      V4      V5  -582.306412
4       V1      V5  -571.672083
2       V1      V3   -59.847099
1       V1      V2   404.059694
3       V1      V4   415.214121
8       V2      V4   557.050042
9       V2      V5  1750.542728
14      V3      V5  3378.108799
7       V2      V3  3740.331544
The rows are sorted by covariance in ascending order.
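The question also asks that once a pair is selected, both of its variables be excluded from later rounds. The sorted table does not do that by itself, so here is a minimal sketch of one way to add it, assuming a simple greedy pass over the sorted pairs. The helper name greedy_pairs and the column names a and b are hypothetical, and the matrix values are the rounded ones shown in the question.

```python
import pandas as pd

# Covariance values copied from the question (rounded as displayed there).
labels = ['V1', 'V2', 'V3', 'V4', 'V5']
cov = pd.DataFrame(
    [[471.189543, 404.059694, -59.847099, 415.214121, -571.672083],
     [404.059694, 4768.251838, 3740.331544, 557.050042, 1750.542728],
     [-59.847099, 3740.331544, 6563.713527, -782.229345, 3378.108799],
     [415.214121, 557.050042, -782.229345, 1949.914430, -582.306412],
     [-571.672083, 1750.542728, 3378.108799, -582.306412, 3355.819315]],
    index=labels, columns=labels)


def greedy_pairs(cov):
    """Hypothetical helper: pick disjoint pairs, smallest covariance first."""
    pairs = cov.stack().rename('cov').reset_index()
    pairs.columns = ['a', 'b', 'cov']
    # Keep each unordered pair once and drop the diagonal.
    pairs = pairs[pairs['a'] < pairs['b']].sort_values('cov')
    used, chosen = set(), []
    for a, b, value in pairs.itertuples(index=False):
        # Skip any pair that reuses a variable chosen in an earlier round.
        if a not in used and b not in used:
            chosen.append((a, b, value))
            used.update((a, b))
    return chosen


selected = greedy_pairs(cov)
print(selected)
```

With this data the sketch selects V3/V4 first and then V1/V5, leaving V2 unpaired. It does not stop at zero, so if only negative covariances should count, an extra check on value would be needed.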
Answer 1 (score: 0)
Consider the dataframe df:
import numpy as np
import pandas as pd
rows = pd.Index(['V1', 'V2', 'V3', 'V4', 'V5'], name='rows')
cols = pd.Index(['V1', 'V2', 'V3', 'V4', 'V5'], name='cols')
df = pd.DataFrame(
[[ 471.189543, 404.059694, -59.847099, 415.214121, -571.672083,],
[ 404.059694, 4768.251838, 3740.331544, 557.050042, 1750.542728,],
[ -59.847099, 3740.331544, 6563.713527, -782.229345, 3378.108799,],
[ 415.214121, 557.050042, -782.229345, 1949.91443, -582.306412,],
[ -571.672083, 1750.542728, 3378.108799, -582.306412, 3355.819315,]],
rows, cols
)
Using numpy:
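# Position of the smallest value in the flattened array
pos = df.values.argmin()
# Convert the flat position back to row and column labels
rpos = rows[pos // len(cols)]
cpos = cols[pos % len(cols)]
print(rpos, cpos, df.loc[rpos, cpos])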
V3 V4 -782.229345
The code above is a more efficient way to find the row and column of the minimum value.
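The integer division and modulo used to turn the flat argmin position into a row and a column can also be written with np.unravel_index, which takes the array shape directly. A small sketch under that assumption, reusing the rounded values from the question:

```python
import numpy as np
import pandas as pd

labels = ['V1', 'V2', 'V3', 'V4', 'V5']
df = pd.DataFrame(
    [[471.189543, 404.059694, -59.847099, 415.214121, -571.672083],
     [404.059694, 4768.251838, 3740.331544, 557.050042, 1750.542728],
     [-59.847099, 3740.331544, 6563.713527, -782.229345, 3378.108799],
     [415.214121, 557.050042, -782.229345, 1949.914430, -582.306412],
     [-571.672083, 1750.542728, 3378.108799, -582.306412, 3355.819315]],
    index=labels, columns=labels)

values = df.to_numpy()
# argmin returns a position in the flattened array;
# unravel_index maps it back to a (row, column) pair for the given shape.
r, c = np.unravel_index(values.argmin(), values.shape)
print(df.index[r], df.columns[c], values[r, c])
```

Because the matrix is symmetric, the minimum appears twice; argmin reports the first occurrence in row-major order, which is the V3 row and V4 column here.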
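# Indices of the upper triangle, excluding the diagonal
ti = np.triu_indices(len(rows), 1, len(cols))
# Order those positions by their covariance value
argsorted = df.values[ti[0], ti[1]].argsort()
# list() is needed on Python 3, where zip returns an iterator
tups = list(zip(rows[ti[0][argsorted]], cols[ti[1][argsorted]]))
df.stack().loc[tups].rename('cov').reset_index()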
The code above sorts all pairs in the dataframe by covariance.
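The same ordering of the upper-triangle pairs can probably be reached without building the label tuples by hand. The following is a sketch of one alternative, not the answer's own method: it masks everything except the strict upper triangle, stacks what is left, and sorts. The variable names mask and pairs are illustrative.

```python
import numpy as np
import pandas as pd

labels = ['V1', 'V2', 'V3', 'V4', 'V5']
df = pd.DataFrame(
    [[471.189543, 404.059694, -59.847099, 415.214121, -571.672083],
     [404.059694, 4768.251838, 3740.331544, 557.050042, 1750.542728],
     [-59.847099, 3740.331544, 6563.713527, -782.229345, 3378.108799],
     [415.214121, 557.050042, -782.229345, 1949.914430, -582.306412],
     [-571.672083, 1750.542728, 3378.108799, -582.306412, 3355.819315]],
    index=labels, columns=labels)

# True only above the diagonal, so each unordered pair is kept once.
mask = np.triu(np.ones(df.shape, dtype=bool), k=1)
# where() turns masked-out cells into NaN; the explicit dropna() removes them
# regardless of how the installed pandas version's stack() treats NaN.
pairs = df.where(mask).stack().dropna().sort_values().rename('cov')
print(pairs.reset_index())
```

The result is a Series indexed by (row, column) label pairs, so pairs.index[0] gives the pair with the smallest covariance.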