Question

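Let's say I have a list of values,

lst = ['orange', 'apple', 'banana', 'grape', 'lemon']

I also have a pandas DataFrame df of the form:

Source  Destination  Weight
orange  apple        0.4
banana  orange       0.67
grape   lemon        0.1
grape   banana       0.5

The rows are a subset of all pairwise combinations of the values in lst. Note that each combination appears at most once.

What I want is a new DataFrame, new_df, in which the remaining combinations are filled in with the value 0. In this example, new_df would hold the four rows above plus the six pairs that are missing from df, each with Weight 0.

The order does not make any difference.

Is there a fast way of doing this?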

Answer 1

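I create an array of all the combinations.
Then I do the same for the combinations that already exist.
I use np.isin (np.in1d in older NumPy versions) to find the ones that don't exist yet.
Then I append a new DataFrame containing the combinations that are still missing.

from itertools import combinations

import numpy as np
import pandas as pd

# Every unordered pair from lst, stored as sets so that order is ignored
comb = np.array([set(x) for x in combinations(lst, 2)])
# The pairs already present in df, also as sets
exst = df[['Source', 'Destination']].apply(set, axis=1).values
# Keep only the pairs that are not in df yet
new = comb[~np.isin(comb, exst)]

d1 = pd.DataFrame(
    [list(x) for x in new],
    columns=['Source', 'Destination']
).assign(Weight=0.)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df, d1], ignore_index=True)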

   Source Destination  Weight
0  orange       apple    0.40
1  banana      orange    0.67
2   grape       lemon    0.10
3   grape      banana    0.50
4   grape      orange    0.00
5  orange       lemon    0.00
6   apple      banana    0.00
7   grape       apple    0.00
8   apple       lemon    0.00
9  banana       lemon    0.00
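
The same idea can be sketched without NumPy, using frozenset keys and ordinary Python set membership. This is a minimal variant under stated assumptions, not the answerer's code: the sample df is rebuilt from the question, and the names existing, missing and filler are illustrative only.

```python
from itertools import combinations

import pandas as pd

lst = ['orange', 'apple', 'banana', 'grape', 'lemon']

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Source': ['orange', 'banana', 'grape', 'grape'],
    'Destination': ['apple', 'orange', 'lemon', 'banana'],
    'Weight': [0.4, 0.67, 0.1, 0.5],
})

# Unordered pairs already present; frozenset ignores Source/Destination order.
existing = {frozenset(pair) for pair in zip(df['Source'], df['Destination'])}

# Pairs from lst that are not in df yet, in combinations() order.
missing = [pair for pair in combinations(lst, 2)
           if frozenset(pair) not in existing]

# Give the missing pairs a weight of 0 and stack them under the original rows.
filler = pd.DataFrame(missing, columns=['Source', 'Destination']).assign(Weight=0.0)
new_df = pd.concat([df, filler], ignore_index=True)
print(new_df)
```

Because the lookup is a hash-set membership test, the cost grows with the number of combinations rather than with combinations times existing rows.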

Answer 2

Step 1: Convert the Source/Destination pairs of the source DataFrame to frozensets

In [350]: df = df.assign(Combinations=df.apply(lambda x: frozenset(x[:-1]), axis=1)).loc[:, ['Combinations', 'Weight']]

Step 2: Generate all possible combinations from lst (import itertools first)

In [352]: new_df = pd.DataFrame(list(itertools.combinations(lst, 2)), columns=['Source', 'Destination'])

Step 3: Merge the combinations

In [358]: new_df = new_df.iloc[:, :2].apply(lambda x: frozenset(x), axis=1)\
                        .to_frame().rename(columns={0 : "Combinations"})\
                        .merge(df, how='outer').fillna(0)

Step 4: Restore the original structure

In [365]: new_df.apply(lambda x: pd.Series(list(x['Combinations'])), axis=1)\
                .rename(columns={0 : 'Source', 1 : 'Destination'})\
                .join(new_df['Weight'])
Out[365]: 
   Source Destination  Weight
0  orange       apple    0.40
1  orange      banana    0.67
2   grape      orange    0.00
3  orange       lemon    0.00
4   apple      banana    0.00
5   grape       apple    0.00
6   apple       lemon    0.00
7   grape      banana    0.50
8   lemon      banana    0.00
9   grape       lemon    0.10
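
The merge idea in the four steps above can also be condensed into one pass. The sketch below swaps the frozenset key for alphabetically sorted pairs, so that a plain two-column merge matches (a, b) with (b, a). It is an assumption-laden variant rather than the answerer's code: the sample df is rebuilt from the question, and all_pairs, known and merged are illustrative names. Every row comes out with Source before Destination alphabetically, which the question says is acceptable.

```python
from itertools import combinations

import pandas as pd

lst = ['orange', 'apple', 'banana', 'grape', 'lemon']

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Source': ['orange', 'banana', 'grape', 'grape'],
    'Destination': ['apple', 'orange', 'lemon', 'banana'],
    'Weight': [0.4, 0.67, 0.1, 0.5],
})

cols = ['Source', 'Destination']

# All unordered pairs from lst, each sorted alphabetically within the row.
all_pairs = pd.DataFrame([sorted(p) for p in combinations(lst, 2)], columns=cols)

# The known pairs, normalized the same way, with their weights attached.
known = pd.DataFrame(
    [sorted(p) for p in zip(df['Source'], df['Destination'])],
    columns=cols,
)
known['Weight'] = df['Weight'].to_numpy()

# A left merge keeps every combination; pairs absent from df get NaN, then 0.
merged = all_pairs.merge(known, on=cols, how='left')
merged['Weight'] = merged['Weight'].fillna(0.0)
print(merged)
```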

How to efficiently fill an incomplete pandas DataFrame consisting of pairwise combinations of values in a list?
