Question

Suppose I have a pandas DataFrame like this (in reproducible form):

import pandas as pd

df = pd.DataFrame([['Apple', 'Orange', 'Peach'],
                   ['Apple', 'Lemon', 'Lime'],
                   ['Starfruit', 'Apple', 'Orange']],
                  columns=['Fruit_1', 'Fruit_2', 'Fruit_3'])

I want to generate an edge list that contains every pair of fruits appearing in the same row:

Apple, Orange
Apple, Peach
Orange, Peach
Apple, Lemon
Apple, Lime
Lemon, Lime
Starfruit, Apple
Starfruit, Orange
Apple, Orange

How can I do this in Python?

Answer 1

I don't know pandas, but you can use itertools.combinations on each row:

itertools.combinations(row, 2)

This creates an iterator, which you only need to convert into a list of pairs.
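A minimal illustration on a single row, written here as a plain Python list (the first row of the example data). combinations yields 2-tuples in the order the items appear in the row, and list() materializes the iterator:

```python
from itertools import combinations

# First row of the example data as a plain list of strings
row = ['Apple', 'Orange', 'Peach']

pairs_iter = combinations(row, 2)  # lazy iterator over 2-element tuples
pairs = list(pairs_iter)           # materialize it as a list of pairs
print(pairs)
# [('Apple', 'Orange'), ('Apple', 'Peach'), ('Orange', 'Peach')]
```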

Once you have collected these per-row lists into one list, you can join them with a flattening list comprehension:

[pair for row in collected_rows for pair in row]
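A self-contained sketch of both steps, assuming the rows are available as a nested list of strings (with pandas, df.values.tolist() returns such a list for this DataFrame). The names rows and edges are illustrative only:

```python
from itertools import combinations

# Rows of the example data; with pandas, df.values.tolist() gives the same nested list
rows = [['Apple', 'Orange', 'Peach'],
        ['Apple', 'Lemon', 'Lime'],
        ['Starfruit', 'Apple', 'Orange']]

# Step 1: one list of 2-tuples per row
collected_rows = [list(combinations(row, 2)) for row in rows]

# Step 2: flatten the per-row lists into a single edge list
edges = [pair for row in collected_rows for pair in row]

for a, b in edges:
    print(f'{a}, {b}')
```

This prints the nine "A, B" lines from the question in the same order.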

or use the NumPy approach, which is usually much faster:

data[:, np.c_[np.tril_indices(data.shape[1], -1)]]
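The answer does not define data; the sketch below assumes it is the DataFrame's values as a 2-D NumPy array (e.g. df.to_numpy()), written out literally so the block runs on its own. np.c_ turns the two index arrays from tril_indices into an (n_pairs, 2) array of column pairs, and fancy indexing applies every pair to every row at once:

```python
import numpy as np

# Assumption: data = df.to_numpy(); spelled out here so the example is self-contained
data = np.array([['Apple', 'Orange', 'Peach'],
                 ['Apple', 'Lemon', 'Lime'],
                 ['Starfruit', 'Apple', 'Orange']], dtype=object)

# Column-index pairs strictly below the diagonal
idx = np.c_[np.tril_indices(data.shape[1], -1)]
print(idx.tolist())               # [[1, 0], [2, 0], [2, 1]]

# Result has shape (rows, pairs, 2): one block of pairs per row
per_row_pairs = data[:, idx]
print(per_row_pairs.shape)        # (3, 3, 2)
print(per_row_pairs[0].tolist())  # [['Orange', 'Apple'], ['Peach', 'Apple'], ['Peach', 'Orange']]
```

Note that each pair comes out with its endpoints reversed relative to the row order.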

If you want a single flat list:

data[:, np.c_[np.triu_indices(data.shape[1], 1)]].reshape(-1,2)
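A runnable version of the flat variant, again assuming data holds the DataFrame's values (e.g. df.to_numpy()). reshape(-1, 2) collapses the (rows, pairs, 2) result into one row per edge, in the same order as the list in the question:

```python
import numpy as np

# Assumption: data = df.to_numpy(); spelled out here so the example is self-contained
data = np.array([['Apple', 'Orange', 'Peach'],
                 ['Apple', 'Lemon', 'Lime'],
                 ['Starfruit', 'Apple', 'Orange']], dtype=object)

# Column-index pairs strictly above the diagonal: [[0, 1], [0, 2], [1, 2]]
idx = np.c_[np.triu_indices(data.shape[1], 1)]

# (rows, pairs, 2) -> (rows * pairs, 2): one flat edge list
edges = data[:, idx].reshape(-1, 2)

for a, b in edges:
    print(f'{a}, {b}')
```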

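Note that triu_indices lists the vertices of each pair in order, while tril_indices lists them in reverse. These functions are normally used to get the indices of the upper or lower triangle of a matrix.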
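To see the difference directly, the sketch below prints both index sets for 4 columns (a width chosen only so the ordering effect becomes visible). Flipping each lower-triangle pair gives the same set of edges as the upper triangle, but once there are more than 3 columns the edges no longer come out in the same order:

```python
import numpy as np

n = 4  # number of columns (vertices per row); illustrative choice
upper = np.c_[np.triu_indices(n, 1)]   # pairs (i, j) with i < j
lower = np.c_[np.tril_indices(n, -1)]  # pairs (i, j) with i > j

print(upper.tolist())  # [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]
print(lower.tolist())  # [[1, 0], [2, 0], [2, 1], [3, 0], [3, 1], [3, 2]]

# Reverse each lower pair: same set of edges as `upper`, different order
flipped = lower[:, ::-1]
print(flipped.tolist())  # [[0, 1], [0, 2], [1, 2], [0, 3], [1, 3], [2, 3]]
```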

Answer 2

Here is a pandas solution:

In [118]: from itertools import combinations

In [119]: df.apply(lambda x: list(combinations(x, 2)), axis=1, result_type='expand').stack().reset_index(drop=True).apply(', '.join)
Out[119]:
0        Apple, Orange
1         Apple, Peach
2        Orange, Peach
3         Apple, Lemon
4          Apple, Lime
5          Lemon, Lime
6     Starfruit, Apple
7    Starfruit, Orange
8        Apple, Orange
dtype: object
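A variant of the same idea that swaps stack() for Series.explode (available in pandas 0.25 and later). It does not depend on the number of pairs per row matching the number of columns. It also shows building a two-column edge-list DataFrame; the column names source and target are illustrative, not required by any library:

```python
from itertools import combinations
import pandas as pd

df = pd.DataFrame([['Apple', 'Orange', 'Peach'],
                   ['Apple', 'Lemon', 'Lime'],
                   ['Starfruit', 'Apple', 'Orange']],
                  columns=['Fruit_1', 'Fruit_2', 'Fruit_3'])

# A Series holding one list of 2-tuples per row (list results are not expanded by default)
pairs_per_row = df.apply(lambda row: list(combinations(row, 2)), axis=1)

# explode() puts each tuple on its own row; drop the repeated row labels
edges = pairs_per_row.explode().reset_index(drop=True)

# Same "A, B" strings as in the output above
edge_strings = edges.apply(', '.join)
print(edge_strings)

# Or as a two-column edge-list DataFrame
edge_df = pd.DataFrame(edges.tolist(), columns=['source', 'target'])
print(edge_df)
```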

Generate an edge list from a pandas DataFrame
