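I am looking for an algorithm to rank pairs of stocks from a pool by the difference in their PE ratios, i.e. PE of stock 1 minus PE of stock 2.
For example, given a pool of 40 stocks, I want to rank unique pairs by smallest PE difference. In total there will be 20 unique pairs. E.g. if MSFT appears in pair 1 (the pair with the smallest PE difference that involves MSFT), then MSFT should not appear again in any later pair.
What is the right algorithm for this?
So far I have computed the PE difference for every pair and sorted them in ascending order. What should I do next?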
Answer 0 (score: 1)
Here is an approach using itertools.combinations(), isin() and drop():
import pandas as pd
import itertools as it

df = pd.DataFrame({'Stock': ['Apple', 'Broadcomm', 'Citi', 'D&G', 'Elixir', 'Foxtrot'],
                   'PE': [3.8, 3.9, 5.6, 6.8, 0.5, 3.9]})
print(df)
assert len(df) % 2 == 0

m = df.set_index('Stock')
ranking = pd.DataFrame(columns=['StockA', 'StockB', 'minPE', 'deltaPE'],
                       data=[(a, b, min(m.PE[a], m.PE[b]), abs(m.PE[a] - m.PE[b]))
                             for a, b in it.combinations(m.index, 2)])
ranking.sort_values(['deltaPE', 'minPE'], inplace=True)
print(ranking)

# ranking is sorted from best to worst.
# Start with first line, eliminate other lines that belong to either one of
# this line's stocks (but not both), then proceed to next line and repeat.
for i in range(len(df) // 2):
    a = ranking.iloc[i].StockA
    b = ranking.iloc[i].StockB
    contenders = ranking[ranking.StockA.isin([a, b]) ^ ranking.StockB.isin([a, b])]
    ranking.drop(contenders.index, inplace=True)
print(ranking)
Output:
    PE      Stock
0  3.8      Apple
1  3.9  Broadcomm
2  5.6       Citi
3  6.8        D&G
4  0.5     Elixir
5  3.9    Foxtrot

# ---- Ranking after sorting:
       StockA     StockB  minPE  deltaPE
8   Broadcomm    Foxtrot    3.9      0.0
0       Apple  Broadcomm    3.8      0.1
4       Apple    Foxtrot    3.8      0.1
9        Citi        D&G    5.6      1.2
5   Broadcomm       Citi    3.9      1.7
11       Citi    Foxtrot    3.9      1.7
1       Apple       Citi    3.8      1.8
6   Broadcomm        D&G    3.9      2.9
13        D&G    Foxtrot    3.9      2.9
2       Apple        D&G    3.8      3.0
3       Apple     Elixir    0.5      3.3
7   Broadcomm     Elixir    0.5      3.4
14     Elixir    Foxtrot    0.5      3.4
10       Citi     Elixir    0.5      5.1
12        D&G     Elixir    0.5      6.3

# ---- Ranking after dropping rows:
       StockA     StockB  minPE  deltaPE
8   Broadcomm    Foxtrot    3.9      0.0
9        Citi        D&G    5.6      1.2
3       Apple     Elixir    0.5      3.3
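The selection rule described in the code comments above can also be sketched without pandas. This is only a minimal illustration under stated assumptions: the function name `greedy_pairs` and the dict-based input are hypothetical and not part of the answer. It sorts all candidate pairs by (deltaPE, minPE) and keeps a pair only if neither stock has been used yet.

```python
from itertools import combinations


def greedy_pairs(pe):
    """Greedily pair stocks by smallest absolute PE difference.

    `pe` maps stock name -> PE ratio (hypothetical input format).
    Ties on the difference are broken by the lower PE of the pair,
    mirroring the sort on ['deltaPE', 'minPE'].
    """
    candidates = sorted(
        combinations(pe, 2),
        key=lambda ab: (abs(pe[ab[0]] - pe[ab[1]]), min(pe[ab[0]], pe[ab[1]])),
    )
    used = set()   # stocks already assigned to a pair
    pairs = []
    for a, b in candidates:
        if a not in used and b not in used:
            pairs.append((a, b, abs(pe[a] - pe[b])))
            used.update((a, b))
    return pairs


pe = {'Apple': 3.8, 'Broadcomm': 3.9, 'Citi': 5.6,
      'D&G': 6.8, 'Elixir': 0.5, 'Foxtrot': 3.9}
pairs = greedy_pairs(pe)
for a, b, delta in pairs:
    print(a, b, round(delta, 1))
```

On the sample data this selects the same three pairs as the output above. Note that it follows the greedy rule from the question (best remaining pair first); it does not try to minimize the total difference across all pairs.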
Answer 1 (score: 1)
A pandas-based solution.
First, compute the pairings:
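import numpy as np
import pandas as pd
df = pd.DataFrame({'Stock': ['Apple', 'Broadcomm', 'Citi', 'D&G', 'Samsung', 'Elite'],
                   'PE': pd.Series([1.5, 3.9, 5.6, 6.8, 6, 6])})
df.set_index('Stock', inplace=True)
df.sort_values('PE', inplace=True)
# Use plain NumPy arrays: ufunc.outer may not be supported on a pandas Series.
pe = df.PE.to_numpy()
crosstable = pd.DataFrame(np.add.outer(pe, -pe), df.index, df.index)
v = crosstable.mask(np.triu(np.ones((len(df), len(df)), bool)))  # keep valid comparisons (below the diagonal)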
Then v is:
Stock      Apple  Broadcomm  Citi  Samsung  Elite  D&G
Stock
Apple        NaN        NaN   NaN      NaN    NaN  NaN
Broadcomm    2.4        NaN   NaN      NaN    NaN  NaN
Citi         4.1        1.7   NaN      NaN    NaN  NaN
Samsung      4.5        2.1   0.4      NaN    NaN  NaN
Elite        4.5        2.1   0.4      0.0    NaN  NaN
D&G          5.3        2.9   1.2      0.8    0.8  NaN
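As an aside, the same lower-triangle differences can be built with NumPy broadcasting and `np.tril_indices`, which avoids the NaN mask altogether. This is a sketch under assumptions, not the answer's method: the variable names are illustrative, and the PE values are entered already sorted so that the ordering matches the table above.

```python
import numpy as np
import pandas as pd

# PE values sorted ascending, as in the table above (illustrative data)
pe = pd.Series([1.5, 3.9, 5.6, 6.0, 6.0, 6.8],
               index=['Apple', 'Broadcomm', 'Citi', 'Samsung', 'Elite', 'D&G'])

values = pe.to_numpy()
# diff[i, j] = PE[i] - PE[j] for every ordered pair, via broadcasting
diff = values[:, None] - values[None, :]

# Indices strictly below the diagonal: each unordered pair exactly once
rows, cols = np.tril_indices(len(values), k=-1)

pairs = pd.Series(
    diff[rows, cols],
    index=pd.MultiIndex.from_arrays([pe.index[rows], pe.index[cols]]),
)
pairs = pairs.sort_values(kind='stable')
print(pairs)
```

Because the input is sorted ascending, every kept difference is non-negative, so no `abs()` is needed.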
Then the ranking:
w=v.stack()
w.sort_values(inplace=True)
w is:
Stock      Stock
Elite      Samsung      0.0
Samsung    Citi         0.4
Elite      Citi         0.4
D&G        Samsung      0.8
           Elite        0.8
           Citi         1.2
Citi       Broadcomm    1.7
Samsung    Broadcomm    2.1
Elite      Broadcomm    2.1
Broadcomm  Apple        2.4
D&G        Broadcomm    2.9
Citi       Apple        4.1
Samsung    Apple        4.5
Elite      Apple        4.5
D&G        Apple        5.3
And extract the best pairs:
i = 0
s = set(df.index)          # stocks not yet paired
top = []
while s:
    x, y = w.index[i]
    if x in s and y in s:  # both stocks still available
        top.append((x, y))
        s -= {x, y}
    i += 1
The result, w[top], is:
Stock      Stock
Elite      Samsung    0.0
D&G        Citi       1.2
Broadcomm  Apple      2.4