From the following pandas DataFrame (actually a distance matrix):
      foo   foo   bar   bar   spam  spam
foo   0.00  0.35  0.83  0.84  0.90  0.89
foo   0.35  0.00  0.86  0.85  0.92  0.91
bar   0.83  0.86  0.00  0.25  0.88  0.87
bar   0.84  0.85  0.25  0.00  0.82  0.86
spam  0.90  0.92  0.88  0.82  0.00  0.50
spam  0.89  0.91  0.87  0.86  0.50  0.00
I am trying to create lists for all combinations derived from ['foo','bar','spam'], so that I get one list of unique values per combination.
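I tried df.get_values and iterrows without success, and the answers to "How to get a value from a cell of a data frame?" and "pandas: how to get scalar value on a cell using conditional indexing" did not help either.
Is there a way to achieve this? Any help would be appreciated.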
Answer 0 (score: 2)
IIUC:
In [93]: from itertools import combinations
In [94]: s = pd.Series(df.values[np.triu_indices(len(df), 1)],
    ...:               index=pd.MultiIndex.from_tuples(tuple(combinations(df.index, 2))))
    ...:
In [95]: s
Out[95]:
foo   foo     0.35
      bar     0.83
      bar     0.84
      spam    0.90
      spam    0.89
      bar     0.86
      bar     0.85
      spam    0.92
      spam    0.91
bar   bar     0.25
      spam    0.88
      spam    0.87
      spam    0.82
      spam    0.86
spam  spam    0.50
dtype: float64
As a DataFrame:
In [96]: s.reset_index(name='dist')
Out[96]:
   level_0 level_1  dist
0      foo     foo  0.35
1      foo     bar  0.83
2      foo     bar  0.84
3      foo    spam  0.90
4      foo    spam  0.89
5      foo     bar  0.86
6      foo     bar  0.85
7      foo    spam  0.92
8      foo    spam  0.91
9      bar     bar  0.25
10     bar    spam  0.88
11     bar    spam  0.87
12     bar    spam  0.82
13     bar    spam  0.86
14    spam    spam  0.50
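For anyone who wants to reproduce this outside the session above, here is a minimal self-contained sketch. It assumes the distance matrix from the question is rebuilt by hand (the names `labels`, `data`, `rows`, `cols` and `same_order` are mine, not from the question). It also checks the property the Series construction relies on: the row-major order of `np.triu_indices(n, 1)` lines up with the order in which `combinations` yields pairs.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Assumed reconstruction of the question's distance matrix.
labels = ['foo', 'foo', 'bar', 'bar', 'spam', 'spam']
data = [
    [0.00, 0.35, 0.83, 0.84, 0.90, 0.89],
    [0.35, 0.00, 0.86, 0.85, 0.92, 0.91],
    [0.83, 0.86, 0.00, 0.25, 0.88, 0.87],
    [0.84, 0.85, 0.25, 0.00, 0.82, 0.86],
    [0.90, 0.92, 0.88, 0.82, 0.00, 0.50],
    [0.89, 0.91, 0.87, 0.86, 0.50, 0.00],
]
df = pd.DataFrame(data, index=labels, columns=labels)

n = len(df)
# Positions strictly above the diagonal, in row-major order.
rows, cols = np.triu_indices(n, 1)

# combinations() over positions yields (0, 1), (0, 2), ..., (4, 5),
# which is the same order that triu_indices produces.
position_pairs = list(combinations(range(n), 2))
same_order = position_pairs == list(zip(rows.tolist(), cols.tolist()))

# Because the orders agree, each value is paired with the right labels.
pairs = list(combinations(df.index, 2))
s = pd.Series(df.values[rows, cols], index=pd.MultiIndex.from_tuples(pairs))

print(same_order, len(s))
```

Taking only the upper triangle with offset 1 is what drops the zero diagonal and the mirrored duplicates of a symmetric matrix.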
Answer 1 (score: 2)
Let's take MaxU's solution a step further (credit to him for the solution):
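from itertools import combinations
import numpy as np
import pandas as pd
s = pd.Series(df.values[np.triu_indices(len(df), 1)],
              index=pd.MultiIndex.from_tuples(tuple(combinations(df.index, 2))))
df_s = s.to_frame()
# Flatten each (row, column) label pair into a single key such as 'foo_bar'
df_s.index = df_s.index.map('_'.join)
# Collect the distances for each key into a list
df_out = df_s.groupby(level=0)[0].apply(lambda x: x.tolist())
df_out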
Output:
bar_bar                          [0.25]
bar_spam       [0.88, 0.87, 0.82, 0.86]
foo_bar        [0.83, 0.84, 0.86, 0.85]
foo_foo                          [0.35]
foo_spam        [0.9, 0.89, 0.92, 0.91]
spam_spam                         [0.5]
Name: 0, dtype: object
Finally, print:
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for i, v in df_out.items():
    print(str(i) + ' = ' + str(v))
Output:
bar_bar = [0.25]
bar_spam = [0.88, 0.87, 0.82, 0.86]
foo_bar = [0.83, 0.84, 0.86, 0.85]
foo_foo = [0.35]
foo_spam = [0.9, 0.89, 0.92, 0.91]
spam_spam = [0.5]
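As a possible variation, the intermediate frame and the joined string index can be skipped by grouping the Series on both index levels at once. This is only a sketch under the assumption that the distance matrix is rebuilt as in the question; the names `labels`, `data` and `result` are mine. Passing `sort=False` keeps the pairs in order of first appearance instead of alphabetical order.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Assumed reconstruction of the question's distance matrix.
labels = ['foo', 'foo', 'bar', 'bar', 'spam', 'spam']
data = [
    [0.00, 0.35, 0.83, 0.84, 0.90, 0.89],
    [0.35, 0.00, 0.86, 0.85, 0.92, 0.91],
    [0.83, 0.86, 0.00, 0.25, 0.88, 0.87],
    [0.84, 0.85, 0.25, 0.00, 0.82, 0.86],
    [0.90, 0.92, 0.88, 0.82, 0.00, 0.50],
    [0.89, 0.91, 0.87, 0.86, 0.50, 0.00],
]
df = pd.DataFrame(data, index=labels, columns=labels)

# Upper-triangle values indexed by their (row label, column label) pair.
s = pd.Series(df.values[np.triu_indices(len(df), 1)],
              index=pd.MultiIndex.from_tuples(list(combinations(df.index, 2))))

# Group on both index levels; each key is a (row label, column label) tuple.
result = {
    '_'.join(key): group.tolist()
    for key, group in s.groupby(level=[0, 1], sort=False)
}

for name, values in result.items():
    print(f'{name} = {values}')
```

A plain dict is convenient here because the goal in the question is a set of named lists rather than another pandas object.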