Minimum difference between 2 objects in a DataFrame

Time: 2016-04-17 13:43:52

Tags: python algorithm pandas


I am looking for an algorithm to rank pairs of stocks in a pool by the difference in their PE ratios, i.e. PE(stock 1) - PE(stock 2).

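For example, given a pool of 40 stocks, I want to rank unique stock pairs by smallest PE difference, giving 20 unique pairs in total. If MSFT appears in pair 1 together with the stock whose PE is closest to MSFT's, then MSFT should not appear again in any subsequent pair.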

What is the right algorithm for doing this?

So far I have computed the PE difference for every pair and sorted the pairs in ascending order. What should I do next?
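The procedure described in the question (rank all pairs by PE difference, then repeatedly take the best remaining pair whose two stocks are both still unused) is a greedy pairing. A minimal plain-Python sketch, assuming the PE ratios come as a dict of stock name to PE; the function name `greedy_pairs` and the stock labels S1-S4 with their PE values are hypothetical, for illustration only:

```python
from itertools import combinations


def greedy_pairs(pe):
    """Greedy pairing by smallest absolute PE difference.

    pe: dict mapping stock name -> PE ratio (hypothetical input format).
    Returns a list of (stock_a, stock_b, abs_diff), best pair first.
    """
    # Rank every unordered pair of stocks by absolute PE difference.
    ranked = sorted(combinations(pe, 2),
                    key=lambda p: abs(pe[p[0]] - pe[p[1]]))
    unused = set(pe)
    pairs = []
    for a, b in ranked:
        # Skip any pair where either stock has already been paired.
        if a in unused and b in unused:
            pairs.append((a, b, abs(pe[a] - pe[b])))
            unused -= {a, b}
    return pairs


# Illustrative labels and PE values, not real market data.
for a, b, d in greedy_pairs({'S1': 30.0, 'S2': 28.0, 'S3': 10.0, 'S4': 11.5}):
    print(a, b, round(d, 2))
```

With an even number of stocks every stock ends up in exactly one pair; with an odd number, one stock is left over.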

2 answers:

Answer 0 (score: 1)

Here is an approach using itertools.combinations(), isin() and drop():

import pandas as pd
import itertools as it

df = pd.DataFrame({'Stock' : ['Apple', 'Broadcomm', 'Citi', 'D&G', 'Elixir', 'Foxtrot'],
                   'PE'    : [3.8, 3.9, 5.6, 6.8, 0.5, 3.9]})
print(df)

assert len(df) % 2 == 0
m = df.set_index('Stock')
ranking = pd.DataFrame(columns=['StockA', 'StockB', 'minPE', 'deltaPE'],
                       data=[(a, b, min(m.PE[a], m.PE[b]), abs(m.PE[a] - m.PE[b]))
                             for a, b in it.combinations(m.index, 2)])
ranking.sort_values(['deltaPE', 'minPE'], inplace=True)
print(ranking)

# ranking is sorted from best to worst.
# Start with first line, eliminate other lines that belong to either one of
# this line's stocks (but not both), then proceed to next line and repeat.
for i in range(len(df) // 2):
    a = ranking.iloc[i].StockA
    b = ranking.iloc[i].StockB
    contenders = ranking[ranking.StockA.isin([a, b]) ^ ranking.StockB.isin([a, b])]
    ranking.drop(contenders.index, inplace=True)

print(ranking)

Output:

    PE      Stock
0  3.8      Apple
1  3.9  Broadcomm
2  5.6       Citi
3  6.8        D&G
4  0.5     Elixir
5  3.9    Foxtrot

# ---- Ranking after sorting:
       StockA     StockB  minPE  deltaPE
8   Broadcomm    Foxtrot    3.9      0.0
0       Apple  Broadcomm    3.8      0.1
4       Apple    Foxtrot    3.8      0.1
9        Citi        D&G    5.6      1.2
5   Broadcomm       Citi    3.9      1.7
11       Citi    Foxtrot    3.9      1.7
1       Apple       Citi    3.8      1.8
6   Broadcomm        D&G    3.9      2.9
13        D&G    Foxtrot    3.9      2.9
2       Apple        D&G    3.8      3.0
3       Apple     Elixir    0.5      3.3
7   Broadcomm     Elixir    0.5      3.4
14     Elixir    Foxtrot    0.5      3.4
10       Citi     Elixir    0.5      5.1
12        D&G     Elixir    0.5      6.3

# ---- Ranking after dropping rows:
      StockA   StockB  minPE  deltaPE
8  Broadcomm  Foxtrot    3.9      0.0
9       Citi      D&G    5.6      1.2
3      Apple   Elixir    0.5      3.3
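The key step in the loop is the `^` (XOR) of the two `isin()` masks: it is True only for rows where exactly one side belongs to the chosen pair, so the chosen row itself (both sides match) survives, as do rows that share no stock with it. A minimal sketch of that one step on a tiny, hypothetical ranking table (stocks A-D, assumed already sorted best-first):

```python
import pandas as pd

# Tiny hypothetical ranking table, assumed already sorted best-first.
ranking = pd.DataFrame({'StockA': ['A', 'A', 'B', 'C'],
                        'StockB': ['B', 'C', 'C', 'D']})
a, b = 'A', 'B'  # the pair chosen in this round

in_a = ranking.StockA.isin([a, b])
in_b = ranking.StockB.isin([a, b])
# Exactly one side matches: the row reuses a stock from the chosen pair.
contenders = ranking[in_a ^ in_b]
print(contenders)  # rows A-C and B-C

ranking = ranking.drop(contenders.index)
print(ranking)     # the chosen row A-B and the unrelated row C-D remain
```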

Answer 1 (score: 1)

A pandas-based solution:

First, build all the pairwise comparisons:

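import numpy as np
import pandas as pd

df = pd.DataFrame({'Stock': ['Apple', 'Broadcomm', 'Citi', 'D&G', 'Samsung', 'Elite'],
                   'PE': [1.5, 3.9, 5.6, 6.8, 6, 6]})
df.set_index('Stock', inplace=True)
df.sort_values('PE', inplace=True)
pe = df.PE.to_numpy()  # plain array: recent pandas versions do not support ufunc.outer on a Series
crosstable = pd.DataFrame(np.add.outer(pe, -pe), df.index, df.index)  # PE[row] - PE[col]
v = crosstable.mask(np.triu(np.ones((len(df), len(df)), bool)))  # keep valid comparisons (strict lower triangle)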

Then v is:

Stock      Apple  Broadcomm  Citi  Samsung  Elite  D&G
Stock                                                 
Apple        NaN        NaN   NaN      NaN    NaN  NaN
Broadcomm    2.4        NaN   NaN      NaN    NaN  NaN
Citi         4.1        1.7   NaN      NaN    NaN  NaN
Samsung      4.5        2.1   0.4      NaN    NaN  NaN
Elite        4.5        2.1   0.4      0.0    NaN  NaN
D&G          5.3        2.9   1.2      0.8    0.8  NaN
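Masking the upper triangle (diagonal included) keeps each unordered pair exactly once and drops self-comparisons, so n stocks leave n(n-1)/2 valid cells; because the stocks are sorted by PE first, every kept difference is non-negative. A minimal check of that, assuming the same six stocks and using `np.subtract.outer`, which computes the same PE[row] - PE[col] table as `np.add.outer(pe, -pe)`:

```python
import numpy as np
import pandas as pd

# Same six stocks as in the answer, already sorted by PE.
pe = pd.Series([1.5, 3.9, 5.6, 6.0, 6.0, 6.8],
               index=['Apple', 'Broadcomm', 'Citi', 'Samsung', 'Elite', 'D&G'])
vals = pe.to_numpy()
n = len(vals)

diff = np.subtract.outer(vals, vals)           # diff[i, j] = PE[i] - PE[j]
upper = np.triu(np.ones((n, n), dtype=bool))   # diagonal and everything above it
v = pd.DataFrame(diff, index=pe.index, columns=pe.index).mask(upper)

print(v.count().sum())  # 15 valid pairs for 6 stocks
```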

Then the ranking:

w = v.stack().dropna()  # one entry per pair; drop the masked (NaN) cells explicitly
w.sort_values(inplace=True)

w is:

Stock      Stock    
Elite      Samsung      0.0
Samsung    Citi         0.4
Elite      Citi         0.4
D&G        Samsung      0.8
           Elite        0.8
           Citi         1.2
Citi       Broadcomm    1.7
Samsung    Broadcomm    2.1
Elite      Broadcomm    2.1
Broadcomm  Apple        2.4
D&G        Broadcomm    2.9
Citi       Apple        4.1
Samsung    Apple        4.5
Elite      Apple        4.5
D&G        Apple        5.3
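The mask-then-stack steps amount to listing the strict lower triangle of the difference table. As an alternative (not the answer's method), the same ranked Series can be built directly from `np.tril_indices`, without materialising the masked DataFrame. A minimal sketch, assuming the same six stocks sorted by PE; `kind='stable'` keeps tied pairs in a predictable order:

```python
import numpy as np
import pandas as pd

pe = pd.Series([1.5, 3.9, 5.6, 6.0, 6.0, 6.8],
               index=['Apple', 'Broadcomm', 'Citi', 'Samsung', 'Elite', 'D&G'])
vals = pe.to_numpy()

# Row/column positions of the strict lower triangle (row > col).
rows, cols = np.tril_indices(len(vals), k=-1)
w = pd.Series(vals[rows] - vals[cols],
              index=pd.MultiIndex.from_arrays([pe.index[rows], pe.index[cols]]))
w = w.sort_values(kind='stable')
print(w.head(3))
```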

And extract the best pairs:

s = set(df.index)  # stocks not yet paired
top = []           # chosen pairs, best first
for x, y in w.index:
    # keep a pair only if neither stock has been used yet
    if x in s and y in s:
        top.append((x, y))
        s -= {x, y}

The result is:

w[top]

Stock      Stock  
Elite      Samsung    0.0
D&G        Citi       1.2
Broadcomm  Apple      2.4
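The steps of this answer can be collected into one reusable function. A minimal sketch, assuming the input is a Series of PE ratios indexed by stock name; the function name `best_pairs` is hypothetical, and stable sorts are used so that ties are resolved in the original order:

```python
import numpy as np
import pandas as pd


def best_pairs(pe):
    """Greedy pairing following the steps of this answer.

    pe: Series of PE ratios indexed by stock name (assumed input format).
    Returns the PE differences of the chosen pairs, best pair first.
    """
    pe = pe.sort_values(kind='stable')
    vals = pe.to_numpy()
    n = len(vals)
    # Pairwise differences PE[row] - PE[col], strict lower triangle only.
    cross = pd.DataFrame(np.subtract.outer(vals, vals), pe.index, pe.index)
    v = cross.mask(np.triu(np.ones((n, n), dtype=bool)))
    w = v.stack().dropna().sort_values(kind='stable')
    # Walk the ranking and keep pairs whose stocks are both unused.
    unused, top = set(pe.index), []
    for x, y in w.index:
        if x in unused and y in unused:
            top.append((x, y))
            unused -= {x, y}
    return w.loc[top]


pe = pd.Series({'Apple': 1.5, 'Broadcomm': 3.9, 'Citi': 5.6,
                'D&G': 6.8, 'Samsung': 6.0, 'Elite': 6.0})
print(best_pairs(pe))
```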