Pandas pairwise arithmetic, similar to rolling().corr()

Time: 2019-11-29 14:51:04

Tags: python pandas dataframe rolling-computation pairwise

I have a dataframe that looks like this:

fsym                            EOS       BTC       BNB
time                                                   
2018-11-30 00:00:00+00:00 -0.051903 -0.069088 -0.058162
2018-12-01 00:00:00+00:00  0.026936  0.044739  0.040303
2018-12-02 00:00:00+00:00 -0.034843 -0.012935 -0.005900
2018-12-03 00:00:00+00:00 -0.108108 -0.070375 -0.028180
2018-12-04 00:00:00+00:00 -0.048583  0.019509  0.131986

I can compute the pairwise correlations between columns simply with:

pt = pt.rolling(3).corr()

which produces:

fsym                                EOS       BTC       BNB
time                      fsym                              
2018-11-30 00:00:00+00:00 EOS        NaN       NaN       NaN
                          BTC        NaN       NaN       NaN
                          BNB        NaN       NaN       NaN
2018-12-01 00:00:00+00:00 EOS        NaN       NaN       NaN
                          BTC        NaN       NaN       NaN
                          BNB        NaN       NaN       NaN
2018-12-02 00:00:00+00:00 EOS   1.000000  0.952709  0.938688
                          BTC   0.952709  1.000000  0.999066
                          BNB   0.938688  0.999066  1.000000
2018-12-03 00:00:00+00:00 EOS   1.000000  0.998738  0.969385
                          BTC   0.998738  1.000000  0.980492
                          BNB   0.969385  0.980492  1.000000
...
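
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the values printed above and runs the same call. How the original frame was built is not shown, so the construction (a daily UTC `date_range` and a dict of lists) is an assumption; only `pt` and the `rolling(3).corr()` call come from the question.

```python
import pandas as pd

# Values copied from the printed frame; the construction itself is assumed.
idx = pd.date_range("2018-11-30", periods=5, freq="D", tz="UTC", name="time")
pt = pd.DataFrame(
    {
        "EOS": [-0.051903, 0.026936, -0.034843, -0.108108, -0.048583],
        "BTC": [-0.069088, 0.044739, -0.012935, -0.070375, 0.019509],
        "BNB": [-0.058162, 0.040303, -0.005900, -0.028180, 0.131986],
    },
    index=idx,
)
pt.columns.name = "fsym"

# 3-row rolling window; each window yields a full column-by-column
# correlation matrix, stacked under a (time, fsym) MultiIndex.
corr = pt.rolling(3).corr()
print(corr)
```

The first two timestamps are all NaN because a window of 3 rows is not yet full.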

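How can I compute the pairwise differences between the dataframe's columns in a similar way? I suppose this would be equivalent to using a rolling window of 1.

Edit: As pointed out in the comments, the example above is not actually a column-wise correlation, which I had not noticed.

2 answers:

Answer 0: (score: 1)

If you want 9 columns: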

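import numpy as np
import pandas as pd

# test data
df = pd.DataFrame(np.arange(12).reshape(-1, 3), columns=list('abc'))

s = df.values
new_cols = pd.MultiIndex.from_product([df.columns, df.columns])

# Broadcast to shape (rows, ncols, ncols); entry [n, i, j] is s[n, j] - s[n, i],
# then flatten the last two axes into one column per ordered pair.
pd.DataFrame((s[:, None, :] - s[:, :, None]).reshape(len(df), -1),
             index=df.index,
             columns=new_cols)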
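
To see what the snippet above yields on its own test data, here is a small self-contained check. The variable name `out` is introduced here for illustration and is not part of the answer. One thing it makes visible is the sign convention: with the operands in this order, column `(x, y)` holds `y - x`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1, 3), columns=list("abc"))
s = df.values
new_cols = pd.MultiIndex.from_product([df.columns, df.columns])

# s[:, None, :] has shape (n, 1, 3) and s[:, :, None] has shape (n, 3, 1);
# broadcasting gives diff[n, i, j] = s[n, j] - s[n, i].
out = pd.DataFrame(
    (s[:, None, :] - s[:, :, None]).reshape(len(df), -1),
    index=df.index,
    columns=new_cols,
)

# Each row of the test data is (k, k+1, k+2), so every row gives the same values.
print(out[("a", "b")].tolist())  # [1, 1, 1, 1]   i.e. b - a
print(out[("c", "a")].tolist())  # [-2, -2, -2, -2]   i.e. a - c
```

If `x - y` is wanted under column `(x, y)` instead, swapping the two operands of the subtraction should do it.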

Answer 1: (score: 0)

The following function comes close to a solution:

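import numpy as np
import pandas as pd

def columnwise_difference(df):
    a = df.values
    # Column index pairs (i, j) with i < j, i.e. the strict upper triangle
    r, c = np.triu_indices(a.shape[1], 1)
    cols = df.columns
    nm = [cols[i] + "-" + cols[j] for i, j in zip(r, c)]
    return pd.DataFrame(a[:, r] - a[:, c], columns=nm, index=df.index)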

which gives:

                            EOS-BTC   EOS-BNB   BTC-BNB
time                                                   
2018-11-30 00:00:00+00:00  0.017185  0.006259 -0.010926
2018-12-01 00:00:00+00:00 -0.017803 -0.013367  0.004436
2018-12-02 00:00:00+00:00 -0.021908 -0.028943 -0.007035
2018-12-03 00:00:00+00:00 -0.037733 -0.079928 -0.042195

...except that I don't want just the np.triu_indices pairs, but all 9 combinations, including EOS-EOS and so on (there must be a simple change that does this).
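
One possible version of that change, as a sketch: replace the upper-triangle index pairs with the full Cartesian product of column positions. It assumes string column labels, and the helper name `all_pairwise_differences` is made up for illustration. The one-row frame reuses the first row of the question's data.

```python
from itertools import product

import pandas as pd


def all_pairwise_differences(df):
    """Hypothetical helper: one column per ordered pair, including X-X."""
    cols = list(df.columns)
    # Every ordered pair of column positions (i, j), e.g. 9 pairs for 3 columns.
    pairs = list(product(range(len(cols)), repeat=2))
    r = [i for i, _ in pairs]
    c = [j for _, j in pairs]
    names = [f"{cols[i]}-{cols[j]}" for i, j in pairs]
    a = df.to_numpy()
    # Fancy indexing selects the left and right operand columns for each pair.
    return pd.DataFrame(a[:, r] - a[:, c], columns=names, index=df.index)


# First row of the question's data, used as a small check.
df = pd.DataFrame({"EOS": [-0.051903], "BTC": [-0.069088], "BNB": [-0.058162]})
res = all_pairwise_differences(df)
print(list(res.columns))
```

The column order follows `itertools.product`: EOS-EOS, EOS-BTC, EOS-BNB, BTC-EOS, and so on, with the X-X columns all zero.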