How to create lists from determined cell values of a Pandas DataFrame?

Time: 2017-05-20 18:55:49

Tags: python list pandas dataframe

Given the following Pandas DataFrame (actually a distance matrix):

       foo   foo   bar   bar  spam  spam
foo   0.00  0.35  0.83  0.84  0.90  0.89
foo   0.35  0.00  0.86  0.85  0.92  0.91
bar   0.83  0.86  0.00  0.25  0.88  0.87
bar   0.84  0.85  0.25  0.00  0.82  0.86
spam  0.90  0.92  0.88  0.82  0.00  0.50
spam  0.89  0.91  0.87  0.86  0.50  0.00

I am trying to create lists from all combinations of the labels ['foo','bar','spam'], so that each combination of labels gets its own list of values.

I tried df.get_values and iterrows without success, and these answers did not help either: "How to get a value from a cell of a data frame?" and "pandas: how to get scalar value on a cell using conditional indexing".

Is there an efficient way to do this? Any help would be greatly appreciated.
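To try the answers below, the distance matrix from the question can be rebuilt as a DataFrame. This is a minimal sketch that assumes the same labels (duplicates included) are used for both the index and the columns; `df.index.unique()` then recovers the unique labels in order of first appearance.

```python
import numpy as np
import pandas as pd

# Each label appears twice, as in the matrix above (assumed for index and columns)
labels = ['foo', 'foo', 'bar', 'bar', 'spam', 'spam']

# Symmetric distance matrix copied from the question
data = np.array([
    [0.00, 0.35, 0.83, 0.84, 0.90, 0.89],
    [0.35, 0.00, 0.86, 0.85, 0.92, 0.91],
    [0.83, 0.86, 0.00, 0.25, 0.88, 0.87],
    [0.84, 0.85, 0.25, 0.00, 0.82, 0.86],
    [0.90, 0.92, 0.88, 0.82, 0.00, 0.50],
    [0.89, 0.91, 0.87, 0.86, 0.50, 0.00],
])

df = pd.DataFrame(data, index=labels, columns=labels)

# Unique labels, keeping the order in which they first appear
unique_labels = df.index.unique().tolist()
print(unique_labels)  # ['foo', 'bar', 'spam']
```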

2 answers:

Answer 0 (score: 2)

IIUC (if I understand correctly; assumes numpy is imported as np and pandas as pd):

In [93]: from itertools import combinations

In [94]: s = pd.Series(df.values[np.triu_indices(len(df), 1)],
    ...:               index=pd.MultiIndex.from_tuples(tuple(combinations(df.index, 2))))
    ...:

In [95]: s
Out[95]:
foo   foo     0.35
      bar     0.83
      bar     0.84
      spam    0.90
      spam    0.89
      bar     0.86
      bar     0.85
      spam    0.92
      spam    0.91
bar   bar     0.25
      spam    0.88
      spam    0.87
      spam    0.82
      spam    0.86
spam  spam    0.50
dtype: float64
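Why the values and the label pairs line up: `np.triu_indices(n, 1)` returns the positions of the cells strictly above the diagonal in row-major order, and `itertools.combinations` over the same positions yields the same (i, j) pairs in the same order. A small check, rebuilding the question's matrix under the assumption that the labels are as shown:

```python
from itertools import combinations

import numpy as np
import pandas as pd

labels = ['foo', 'foo', 'bar', 'bar', 'spam', 'spam']
data = np.array([
    [0.00, 0.35, 0.83, 0.84, 0.90, 0.89],
    [0.35, 0.00, 0.86, 0.85, 0.92, 0.91],
    [0.83, 0.86, 0.00, 0.25, 0.88, 0.87],
    [0.84, 0.85, 0.25, 0.00, 0.82, 0.86],
    [0.90, 0.92, 0.88, 0.82, 0.00, 0.50],
    [0.89, 0.91, 0.87, 0.86, 0.50, 0.00],
])
df = pd.DataFrame(data, index=labels, columns=labels)

n = len(df)
# Row/column positions of every cell above the diagonal (k=1), row-major order
rows, cols = np.triu_indices(n, k=1)

# combinations() over positions produces exactly the same pairs, same order
position_pairs = list(combinations(range(n), 2))
assert position_pairs == list(zip(rows.tolist(), cols.tolist()))

values = df.values[rows, cols]          # same as df.values[np.triu_indices(n, 1)]
label_pairs = list(combinations(df.index, 2))
print(len(values), len(label_pairs))    # 15 15
print(label_pairs[0], float(values[0])) # ('foo', 'foo') 0.35
```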

As a DataFrame:

In [96]: s.reset_index(name='dist')
Out[96]:
   level_0 level_1  dist
0      foo     foo  0.35
1      foo     bar  0.83
2      foo     bar  0.84
3      foo    spam  0.90
4      foo    spam  0.89
5      foo     bar  0.86
6      foo     bar  0.85
7      foo    spam  0.92
8      foo    spam  0.91
9      bar     bar  0.25
10     bar    spam  0.88
11     bar    spam  0.87
12     bar    spam  0.82
13     bar    spam  0.86
14    spam    spam  0.50
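The `level_0` / `level_1` column names come from the unnamed MultiIndex levels. Naming the levels when the index is built gives more readable columns after `reset_index`; in this sketch the names `'a'` and `'b'` are arbitrary, hypothetical choices.

```python
from itertools import combinations

import numpy as np
import pandas as pd

labels = ['foo', 'foo', 'bar', 'bar', 'spam', 'spam']
data = np.array([
    [0.00, 0.35, 0.83, 0.84, 0.90, 0.89],
    [0.35, 0.00, 0.86, 0.85, 0.92, 0.91],
    [0.83, 0.86, 0.00, 0.25, 0.88, 0.87],
    [0.84, 0.85, 0.25, 0.00, 0.82, 0.86],
    [0.90, 0.92, 0.88, 0.82, 0.00, 0.50],
    [0.89, 0.91, 0.87, 0.86, 0.50, 0.00],
])
df = pd.DataFrame(data, index=labels, columns=labels)

iu = np.triu_indices(len(df), k=1)
# Hypothetical level names 'a' and 'b' replace the default level_0/level_1
idx = pd.MultiIndex.from_tuples(list(combinations(df.index, 2)), names=['a', 'b'])
s = pd.Series(df.values[iu], index=idx)

# The named levels become columns 'a' and 'b'; the values go into 'dist'
out = s.reset_index(name='dist')
print(out.columns.tolist())  # ['a', 'b', 'dist']
```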

Answer 1 (score: 2)

Let's take MaxU's solution one step further (credit to him for it):

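import numpy as np
import pandas as pd
from itertools import combinations

s = pd.Series(df.values[np.triu_indices(len(df), 1)],
              index=pd.MultiIndex.from_tuples(tuple(combinations(df.index, 2))))

# The unnamed Series becomes a one-column frame whose column is labelled 0
df_s = s.to_frame()

# Collapse each (row, col) label pair into a single key such as 'foo_bar'
df_s.index = df_s.index.map('_'.join)

# Group the repeated keys and collect their distances into lists
df_out = df_s.groupby(level=0)[0].apply(lambda x: x.tolist())
print(df_out)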

Output:

bar_bar                        [0.25]
bar_spam     [0.88, 0.87, 0.82, 0.86]
foo_bar      [0.83, 0.84, 0.86, 0.85]
foo_foo                        [0.35]
foo_spam      [0.9, 0.89, 0.92, 0.91]
spam_spam                       [0.5]
Name: 0, dtype: object

Finally, print each pair:

for i, v in df_out.items():
    print(str(i) + ' = ' + str(v))

Output:

bar_bar = [0.25]
bar_spam = [0.88, 0.87, 0.82, 0.86]
foo_bar = [0.83, 0.84, 0.86, 0.85]
foo_foo = [0.35]
foo_spam = [0.9, 0.89, 0.92, 0.91]
spam_spam = [0.5]
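The same per-pair lists can also be built without a MultiIndex or `groupby`, using a plain dictionary. This swaps in a different technique (`collections.defaultdict`) that is not part of either answer; it is a sketch assuming `df` is built as in the question, and it walks the upper triangle once, keying each distance by its "row_col" label pair.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np
import pandas as pd

labels = ['foo', 'foo', 'bar', 'bar', 'spam', 'spam']
data = np.array([
    [0.00, 0.35, 0.83, 0.84, 0.90, 0.89],
    [0.35, 0.00, 0.86, 0.85, 0.92, 0.91],
    [0.83, 0.86, 0.00, 0.25, 0.88, 0.87],
    [0.84, 0.85, 0.25, 0.00, 0.82, 0.86],
    [0.90, 0.92, 0.88, 0.82, 0.00, 0.50],
    [0.89, 0.91, 0.87, 0.86, 0.50, 0.00],
])
df = pd.DataFrame(data, index=labels, columns=labels)

groups = defaultdict(list)
# Visit each cell above the diagonal once, by position (safe with duplicate labels)
for i, j in combinations(range(len(df)), 2):
    key = f"{df.index[i]}_{df.index[j]}"
    groups[key].append(float(df.iat[i, j]))

# Sorted keys reproduce the order shown in the output above
for key in sorted(groups):
    print(key, '=', groups[key])
```

With the question's matrix, this prints the same six lines as the final output of Answer 1.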