我有一个熊猫数据框,看起来像这样:
colour points
0 red 1
1 yellow 10
2 black -3
然后我正在尝试执行以下算法:
combos = []
points = []
for i1 in range(len(df)):
for i2 in range(len(df)):
colour_main = df['colour'].values[i1]
colour_secondary = df['colour'].values[i2]
combo = colour_main + "_" + colour_secondary
point1 = df['points'].values[i1]
point2 = df['points'].values[i2]
new_points = point1 + point2
combos.append(combo)
points.append(new_points)
df_new = pd.DataFrame({'colours': combos,
'points': points})
print(df_new)
我想获得所有组合和总和:
示例:
red_yellow = 1 + (-10) = -9
red_black = 1 + ( +3) = 4
black_red = -3 + ( -1) = -4
我当前得到的输出:
colours points
0 red_red 2
1 red_yellow 11
2 red_black -2
3 yellow_red 11
4 yellow_yellow 20
5 yellow_black 7
6 black_red -2
7 black_yellow 7
8 blac_kblack -6
我正在寻找的输出:
red_yellow -9
red_black 4
yellow_red 9
yellow_black 13
black_red -4
black_yellow -13
我不知道如何将我的逻辑应用于此代码,我敢打赌,有一种更简单的方法无需进行两个循环即可获取所有组合,但是目前,这是我唯一想到的事情。
我想:
答案 0 :(得分:4)
这是一些替代方案的timeit
比较。
| method | ms per loop |
|--------------------+-------------|
| alt2 | 2.36 |
| using_concat | 3.26 |
| using_double_merge | 22.4 |
| orig | 22.6 |
| alt | 45.8 |
timeit
结果是使用IPython
生成的:
In [138]: df = make_df(20)
In [143]: %timeit alt2(df)
100 loops, best of 3: 2.36 ms per loop
In [140]: %timeit orig(df)
10 loops, best of 3: 22.6 ms per loop
In [142]: %timeit alt(df)
10 loops, best of 3: 45.8 ms per loop
In [169]: %timeit using_double_merge(df)
10 loops, best of 3: 22.4 ms per loop
In [170]: %timeit using_concat(df)
100 loops, best of 3: 3.26 ms per loop
import numpy as np
import pandas as pd
def alt(df):
df['const'] = 1
result = pd.merge(df, df, on='const', how='outer')
result = result.loc[(result['colour_x'] != result['colour_y'])]
result['color'] = result['colour_x'] + '_' + result['colour_y']
result['points'] = result['points_x'] - result['points_y']
result = result[['color', 'points']]
return result
def alt2(df):
points = np.add.outer(df['points'], -df['points'])
color = pd.MultiIndex.from_product([df['colour'], df['colour']])
mask = color.labels[0] != color.labels[1]
color = color.map('_'.join)
result = pd.DataFrame({'points':points.ravel(), 'color':color})
result = result.loc[mask]
return result
def orig(df):
combos = []
points = []
for i1 in range(len(df)):
for i2 in range(len(df)):
colour_main = df['colour'].iloc[i1]
colour_secondary = df['colour'].iloc[i2]
if colour_main != colour_secondary:
combo = colour_main + "_" + colour_secondary
point1 = df['points'].values[i1]
point2 = df['points'].values[i2]
new_points = point1 - point2
combos.append(combo)
points.append(new_points)
return pd.DataFrame({'color':combos, 'points':points})
def using_concat(df):
"""https://stackoverflow.com/a/51641085/190597 (RafaelC)"""
d = df.set_index('colour').to_dict()['points']
s = pd.Series(list(itertools.combinations(df.colour, 2)))
s = pd.concat([s, s.transform(lambda k: k[::-1])])
v = s.map(lambda k: d[k[0]] - d[k[1]])
df2 = pd.DataFrame({'comb': s.str.get(0)+'_' + s.str.get(1), 'values': v})
return df2
def using_double_merge(df):
"""https://stackoverflow.com/a/51641007/190597 (sacul)"""
new = (df.reindex(pd.MultiIndex.from_product([df.colour, df.colour]))
.reset_index()
.drop(['colour', 'points'], 1)
.merge(df.set_index('colour'), left_on='level_0', right_index=True)
.merge(df.set_index('colour'), left_on='level_1', right_index=True))
new['points_y'] *= -1
new['sum'] = new.sum(axis=1)
new = new[new.level_0 != new.level_1].drop(['points_x', 'points_y'], 1)
new['colours'] = new[['level_0', 'level_1']].apply(lambda x: '_'.join(x),1)
return new[['colours', 'sum']]
def make_df(N):
df = pd.DataFrame({'colour': np.arange(N),
'points': np.random.randint(10, size=N)})
df['colour'] = df['colour'].astype(str)
return df
alt2
中的主要思想是使用np.add_outer
来构造一个加法表
在df['points']
中:
In [149]: points = np.add.outer(df['points'], -df['points'])
In [151]: points
Out[151]:
array([[ 0, -9, 4],
[ 9, 0, 13],
[ -4, -13, 0]])
ravel
用于使数组成为一维数组:
In [152]: points.ravel()
Out[152]: array([ 0, -9, 4, 9, 0, 13, -4, -13, 0])
和颜色组合是通过pd.MultiIndex.from_product
生成的:
In [153]: color = pd.MultiIndex.from_product([df['colour'], df['colour']])
In [155]: color = color.map('_'.join)
In [156]: color
Out[156]:
Index(['red_red', 'red_yellow', 'red_black', 'yellow_red', 'yellow_yellow',
'yellow_black', 'black_red', 'black_yellow', 'black_black'],
dtype='object')
将生成一个掩码以删除重复项:
mask = color.labels[0] != color.labels[1]
然后从这些部分生成result
:
result = pd.DataFrame({'points':points.ravel(), 'color':color})
result = result.loc[mask]
original answer, here中解释了alt
背后的想法。
答案 1 :(得分:2)
这有点麻烦,但是可以为您提供所需的输出:
new = (df.reindex(pd.MultiIndex.from_product([df.colour, df.colour]))
.reset_index()
.drop(['colour', 'points'], 1)
.merge(df.set_index('colour'), left_on='level_0', right_index=True)
.merge(df.set_index('colour'), left_on='level_1', right_index=True))
new['points_x'] *= -1
new['sum'] = new.sum(axis=1)
new = new[new.level_0 != new.level_1].drop(['points_x', 'points_y'], 1)
new['colours'] = new[['level_0', 'level_1']].apply(lambda x: '_'.join(x),1)
>>> new
level_0 level_1 sum colours
3 yellow red -9 yellow_red
6 black red 4 black_red
1 red yellow 9 red_yellow
7 black yellow 13 black_yellow
2 red black -4 red_black
5 yellow black -13 yellow_black
答案 2 :(得分:2)
d = df.set_index('colour').to_dict()['points']
s = pd.Series(list(itertools.combinations(df.colour, 2)))
s = pd.concat([s, s.transform(lambda k: k[::-1])])
v = s.map(lambda k: d[k[0]] - d[k[1]])
df2= pd.DataFrame({'comb': s.str.get(0)+'_' + s.str.get(1), 'values': v})
comb values
0 red_yellow -9
1 red_black 4
2 yellow_black 13
0 yellow_red 9
1 black_red -4
2 black_yellow -13
答案 3 :(得分:1)
您必须在代码中更改此行
new_points = point1 + point2
对此
new_points = point1 - point2