I have a DataFrame with two columns:
positions = pd.DataFrame({"pos" : [1, 2, 3, 4, 5], "mcap" : [1, 4, 3, 2, 5]}, index = ["a", "b", "c", "d", "e"])
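For each index value, I need to count the points that lie to its upper right in the 2D plane. In other words, for each row I need to count the rows that are strictly greater than the current row in both pos and mcap.
So the answer for the example above is: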
pd.Series([4, 1, 1, 1, 0], index = ["a", "b", "c", "d", "e"])
I know how to do this with a loop, but once the DataFrame gets large this will take a lot of computation time, so I am looking for a more Pythonic way to do it.
Edit: a simple loop solution.
answer = pd.Series(np.zeros(len(positions)), index = ["a", "b", "c", "d", "e"])
for asset in ["a", "b", "c", "d", "e"]:
    better_by_signal = positions[positions["pos"] > positions["pos"].loc[asset]].index
    better_by_cap = positions[positions["mcap"] > positions["mcap"].loc[asset]].index
    idx_intersection = better_by_signal.intersection(better_by_cap)
    answer[asset] = len(idx_intersection)
Answer 0 (score: 1)
You can use numpy broadcasting to find all pairs with a positive difference on both the x-axis (pos) and the y-axis (mcap):
import numpy as np
import pandas as pd
positions = pd.DataFrame({"pos" : [1, 2, 3, 4, 5], "mcap" : [1, 4, 3, 2, 5]}, index = ["a", "b", "c", "d", "e"])
arrx = np.asarray([positions.pos])
arry = np.asarray([positions.mcap])
positions["count"] = ((arrx - arrx.T > 0) & (arry - arry.T > 0)).sum(axis = 1)
print(positions)
Sample output
   pos  mcap  count
a    1     1      4
b    2     4      1
c    3     3      1
d    4     2      1
e    5     5      0
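The broadcasting step can also be wrapped in a small reusable function. The sketch below is only one way to write it: the name `count_upper_right` and its `x`/`y` parameters are hypothetical, and it uses `[:, None]` indexing instead of `.T` on a wrapped array.

```python
import numpy as np
import pandas as pd


def count_upper_right(df, x="pos", y="mcap"):
    """Count, for each row, the rows strictly greater in both columns.

    Hypothetical helper; the name and parameters are not from the answer.
    """
    xs = df[x].to_numpy()
    ys = df[y].to_numpy()
    # Element [i, j] is True when row j is strictly greater than row i.
    x_greater = xs[None, :] > xs[:, None]
    y_greater = ys[None, :] > ys[:, None]
    # Both conditions must hold; summing along axis 1 counts the j's per i.
    return pd.Series((x_greater & y_greater).sum(axis=1), index=df.index)


positions = pd.DataFrame({"pos": [1, 2, 3, 4, 5], "mcap": [1, 4, 3, 2, 5]},
                         index=["a", "b", "c", "d", "e"])
counts = count_upper_right(positions)
print(counts.tolist())  # [4, 1, 1, 1, 0]
```

Note that this builds n x n boolean arrays, so memory grows quadratically with the number of rows.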
Answer 1 (score: 0)
Using map instead of iterating over the index, this should work:
import pandas as pd
import numpy as np
positions = pd.DataFrame({"pos" : [1, 2, 3, 4, 5], "mcap" : [1, 4, 3, 2, 5]}, index = ["a", "b", "c", "d", "e"])
answer = pd.Series(np.zeros(len(positions)), index = ["a", "b", "c", "d", "e"])
def set_pos(asset):
    better_by_signal = positions[positions["pos"] > positions["pos"].loc[asset]].index
    better_by_cap = positions[positions["mcap"] > positions["mcap"].loc[asset]].index
    idx_intersection = better_by_signal.intersection(better_by_cap)
    return len(idx_intersection)
# In Python 3, map returns a lazy iterator, so materialise it as a list first.
len_intersection = list(map(set_pos, answer.index.tolist()))
final_answer = pd.Series(len_intersection, index = answer.index.tolist())
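Keep in mind that map still calls the Python function once per row, and each call scans the whole DataFrame, so the amount of work is the same as in the explicit loop. As a minimal sketch, the per-row function can at least be simplified to a single boolean mask instead of two index lookups and an intersection; the helper name `count_better` is hypothetical.

```python
import pandas as pd

positions = pd.DataFrame({"pos": [1, 2, 3, 4, 5], "mcap": [1, 4, 3, 2, 5]},
                         index=["a", "b", "c", "d", "e"])


def count_better(asset):
    """Rows that beat `asset` in both columns (hypothetical helper name)."""
    row = positions.loc[asset]
    # One combined mask: strictly greater pos AND strictly greater mcap.
    better = (positions["pos"] > row["pos"]) & (positions["mcap"] > row["mcap"])
    return int(better.sum())


final_answer = pd.Series([count_better(a) for a in positions.index],
                         index=positions.index)
print(final_answer.tolist())  # [4, 1, 1, 1, 0]
```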
Answer 2 (score: 0)
You can use a list comprehension instead of the for loop, like this:
import pandas as pd
import numpy as np
positions = pd.DataFrame({"pos": [1, 2, 3, 4, 5],
                          "mcap": [1, 4, 3, 2, 5]},
                         index=["a", "b", "c", "d", "e"])
# gives you a list:
answer = [sum(np.sum((positions - positions.iloc[i] > 0).values, axis=1) ==
              2) for i in range(len(positions))]
# convert list to a `pd.Series`:
answer = pd.Series(answer, index=positions.index)
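The `== 2` check relies on the DataFrame having exactly two columns. A possible variant, sketched here under the assumption that every column should be compared, uses `DataFrame.gt` with `all(axis=1)` so the column count is not hard-coded:

```python
import pandas as pd

positions = pd.DataFrame({"pos": [1, 2, 3, 4, 5],
                          "mcap": [1, 4, 3, 2, 5]},
                         index=["a", "b", "c", "d", "e"])

# For row i, positions.gt(...) marks the cells strictly greater than row i;
# all(axis=1) keeps only the rows that are greater in every column.
answer = pd.Series(
    [int(positions.gt(positions.iloc[i]).all(axis=1).sum())
     for i in range(len(positions))],
    index=positions.index,
)
print(answer.tolist())  # [4, 1, 1, 1, 0]
```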
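Answer 3 (score: 0)
You can use a convolution. A convolution does something like this (more information here):
it passes over the matrix, multiplies a filter (or pad) with the elements of the matrix, and then, in this case, sums the products.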
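To make the sliding-window idea concrete, here is a minimal sketch on a made-up 3x3 matrix with a 2x2 window of ones. It uses numpy's `sliding_window_view` (available in NumPy 1.20 and later) rather than a signal-processing library, and it computes a correlation, meaning the window is not flipped; for an all-ones window that gives the same result as a convolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative data only; not taken from the question.
matrix = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [0, 1, 0]])
window = np.ones((2, 2), dtype=int)

# Every 2x2 patch of the matrix; the result has shape (2, 2, 2, 2).
patches = sliding_window_view(matrix, window.shape)
# Multiply each patch by the window element-wise and add up the products.
sums = (patches * window).sum(axis=(2, 3))
print(sums)
# [[2 2]
#  [2 3]]
```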
For this problem, let's first add a new element f to the DataFrame so that at least one row contains more than one element.
>> positions
   pos  mcap
a    1     1
b    2     4
c    3     3
d    4     2
e    5     5
f    3     2
The positions can also be viewed as a grid:
df = pd.crosstab(positions['pos'], positions['mcap'],
                 values=positions.index, aggfunc=sum)
df
mcap    1    2    3    4    5
pos
1       a  NaN  NaN  NaN  NaN
2     NaN  NaN  NaN    b  NaN
3     NaN    f    c  NaN  NaN
4     NaN    d  NaN  NaN  NaN
5     NaN  NaN  NaN  NaN    e
df_ones = df.notnull() * 1
mcap  1  2  3  4  5
pos
1     1  0  0  0  0
2     0  0  0  1  0
3     0  1  1  0  0
4     0  1  0  0  0
5     0  0  0  0  1
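We can create a window that slides over df_ones and sums the elements under the window. This is called a "convolution" (or correlation).
Now let's create a window that leaves out the top-left element (so that the point itself is not counted) and convolve it with our df_ones to get the result: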
pad = np.ones_like(df.values)
pad[0, 0] = 0
pad
array([[0, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]], dtype=object)
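from scipy import signal
# Correlate with the window, keep the bottom-right block (same shape as df),
# and zero out the grid cells that contain no point.
corr = signal.correlate(df_ones.values, pad, mode='full')[-df.shape[0]:, -df.shape[1]:]
counts = (corr * df_ones).unstack().replace(0, np.nan).dropna(
    ).reset_index().rename(columns={0: 'count'})
   mcap  pos  count
0     1    1    5.0
1     2    3    3.0
2     2    4    1.0
3     3    3    1.0
4     4    2    1.0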
positions.reset_index().merge(counts, how='left').fillna(0
    ).sort_values('pos').set_index('index')
       pos  mcap  count
index
a        1     1    5.0
b        2     4    1.0
c        3     3    1.0
f        3     2    3.0
d        4     2    1.0
e        5     5    0.0
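Note that in the output above f gets a count of 3: the window only leaves out the point itself, so c (same pos) and d (same mcap) are counted even though they are not strictly greater in both coordinates. If ties must be excluded, as the question asks, one option is sketched below. It swaps the correlation for 2D suffix sums with numpy cumsum, which is a different technique from the answer's; zeroing the whole first row and first column of pad should have the same effect. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

positions = pd.DataFrame({"pos": [1, 2, 3, 4, 5, 3],
                          "mcap": [1, 4, 3, 2, 5, 2]},
                         index=["a", "b", "c", "d", "e", "f"])

# Grid of point counts: rows are sorted pos values, columns sorted mcap values.
grid = pd.crosstab(positions["pos"], positions["mcap"])
g = grid.to_numpy()

# suffix[i, j] = number of points with grid row >= i and grid column >= j.
suffix = g[::-1, ::-1].cumsum(axis=0).cumsum(axis=1)[::-1, ::-1]

# Shift by one cell in both directions so only row > i and column > j remain.
strict = np.zeros_like(suffix)
strict[:-1, :-1] = suffix[1:, 1:]

# Look up the grid cell of every point.
rows = grid.index.get_indexer(positions["pos"])
cols = grid.columns.get_indexer(positions["mcap"])
strict_counts = pd.Series(strict[rows, cols], index=positions.index)
print(strict_counts.tolist())  # [5, 1, 1, 1, 0, 1]
```

With strict comparison, f now has a count of 1 (only e lies strictly to its upper right).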
Everything in one function:
import numpy as np
import pandas as pd
from scipy import signal

def count_upper(positions):
    # Grid with one cell per (pos, mcap) pair; occupied cells hold the row label.
    df = pd.crosstab(positions['pos'], positions['mcap'],
                     values=positions.index, aggfunc=sum)
    df_ones = df.notnull() * 1
    pad = np.ones_like(df.values)
    pad[0, 0] = 0  # do not count the point itself
    counts = ((signal.correlate(df_ones.values, pad,
                                mode='full')[-df.shape[0]:,
                                             -df.shape[1]:]) * df_ones)
    counts = counts.unstack().replace(0, np.nan).dropna(
        ).reset_index().rename(columns={0: 'count'})
    result = positions.reset_index().merge(counts, how='left')
    result = result.fillna(0).sort_values('pos').set_index('index')
    return result
For your example, the result matches your expected output:
positions = pd.DataFrame({"pos" : [1, 2, 3, 4, 5],
                          "mcap" : [1, 4, 3, 2, 5]},
                         index = ["a", "b", "c", "d", "e"])
>> count_upper(positions)
       pos  mcap  count
index
a        1     1    4.0
b        2     4    1.0
c        3     3    1.0
d        4     2    1.0
e        5     5    0.0