R sort() data.frame

Time: 2011-04-09 02:35:37

Tags: r sorting object dataframe

I have the following data frame:

head(stockdatareturnpercent)
                  SPY         DIA        IWM        SMH        OIH        
2001-04-02  8.1985485   7.8349806   7.935566  21.223832  13.975655  
2001-05-01 -0.5621328   1.7198760   2.141846 -10.904936  -4.565291  
2001-06-01 -2.6957979  -3.5838102   2.786250   4.671762 -23.241009 
2001-07-02 -1.0248091  -0.1997433  -5.725078  -3.354391  -9.161594  
2001-08-01 -6.1165559  -5.0276558  -2.461728  -6.218129 -13.956695  
2001-09-04 -8.8900629 -12.2663267 -15.760037 -39.321172 -16.902913 

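There are actually more stocks, but I had to cut it down for illustration. For each month I want to know the best (or worst) performers. I played with the sort() function, and this is what I came up with: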

N <- dim(stockdatareturnpercent)[1]  
for (i in 1:N) {  
    s <- sort(stockdatareturnpercent[i,])  
    print(s)  
}  

                 UPS     FDX      XLP      XLU      XLV     DIA      IWM      SPY      XLE      XLB      XLI      OIH      XLK      SMH     MSFT
2001-04-02 0.6481585 0.93135 1.923136 4.712996 7.122751 7.83498 7.935566 8.198549 9.826701 10.13465 10.82522 13.97566 14.98789 21.22383 21.41436
                 SMH       FDX       OIH       XLK        XLE        SPY       XLU      XLP      DIA     MSFT      IWM     UPS      XLV      XLB      XLI
2001-05-01 -10.90494 -5.045544 -4.565291 -4.182041 -0.9492803 -0.5621328 0.6987724 1.457579 1.719876 2.088734 2.141846 3.73587 3.748309 3.774033 4.099748
                 OIH       XLE       XLI     XLU     XLP       XLB      DIA       UPS       SPY       XLV       FDX      XLK     IWM      SMH     MSFT
2001-06-01 -23.24101 -10.02403 -6.594324 -5.8602 -5.0532 -3.955192 -3.58381 -2.814685 -2.695798 -1.177474 0.4987542 1.935544 2.78625 4.671762 5.374764
                MSFT       OIH      XLK       IWM       SMH       XLV       UPS       XLE       SPY        XLU        XLB        XLI        DIA      FDX
2001-07-02 -9.793005 -9.161594 -7.17351 -5.725078 -3.354391 -2.016818 -1.692442 -1.159914 -1.024809 -0.9029407 -0.2723560 -0.2078283 -0.1997433 2.868898
                XLP
2001-07-02 2.998604

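This is a very inefficient and crude way of viewing the results. It would be nice to create an object that stores this data. However, if I type 's' at the R prompt, I only get the values of the last row, because each iteration of the for loop replaces the previous data.

I would really appreciate some guidance. Thank you.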

2 answers:

Answer 0: (score: 2)

Use order(). sort() drops the names when used with *apply:

id <- t(apply(Data, 1, order))
lapply(seq_len(nrow(id)), function(i) Data[i, id[i, ]])

Keeping the result of order() in the id matrix also lets you do:

matrix(names(Data)[id],ncol=ncol(Data))
     [,1]  [,2]  [,3]  [,4]  [,5] 
[1,] "DIA" "IWM" "SPY" "OIH" "SMH"
[2,] "SMH" "OIH" "SPY" "DIA" "IWM"
[3,] "OIH" "DIA" "SPY" "IWM" "SMH"
[4,] "OIH" "IWM" "SMH" "SPY" "DIA"
[5,] "OIH" "SMH" "SPY" "DIA" "IWM"
[6,] "SMH" "OIH" "IWM" "DIA" "SPY"

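This tells you which ones did best at any given point in time (each row is ordered from the lowest to the highest return).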
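As a minimal sketch of how the id matrix answers the original question directly, the first and last columns of id can be mapped back to column names. The data frame below rebuilds the five tickers shown in the question; the name `extremes` is only illustrative and not part of the answer.

```r
# Same sample values as in the question (five tickers), dates as row names
Data <- data.frame(
  SPY = c(8.1985485, -0.5621328, -2.6957979, -1.0248091, -6.1165559, -8.8900629),
  DIA = c(7.8349806, 1.7198760, -3.5838102, -0.1997433, -5.0276558, -12.2663267),
  IWM = c(7.935566, 2.141846, 2.786250, -5.725078, -2.461728, -15.760037),
  SMH = c(21.223832, -10.904936, 4.671762, -3.354391, -6.218129, -39.321172),
  OIH = c(13.975655, -4.565291, -23.241009, -9.161594, -13.956695, -16.902913),
  row.names = c("2001-04-02", "2001-05-01", "2001-06-01",
                "2001-07-02", "2001-08-01", "2001-09-04")
)

# Each row of id holds the column indices ordered from lowest to highest return
id <- t(apply(Data, 1, order))

# First column of id = worst performer, last column = best performer
# (`extremes` is a hypothetical name used for illustration)
extremes <- data.frame(
  worst = names(Data)[id[, 1]],
  best  = names(Data)[id[, ncol(id)]],
  row.names = rownames(Data),
  stringsAsFactors = FALSE
)
print(extremes)
```

This gives one row per date, so the result can be stored and inspected later instead of being printed inside a loop.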

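If you want to use a loop, you can use a list. As Joshua said, you overwrite s on every iteration. Initialize a list first to store the results. This loop gives the same result as the lapply() code above, but without the id matrix. The apply version is not any faster, although using apply has other benefits: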

N <- nrow(Data)
s <- vector("list",N)
for (i in 1:N) {
    s[[i]] <- sort(Data[i,])
}

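I tested the code with the following sample data (in future, please provide sample data yourself, either in this form or with, for example, dput()):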

zz <- textConnection(" SPY         DIA        IWM        SMH        OIH
  8.1985485   7.8349806   7.935566  21.223832  13.975655
 -0.5621328   1.7198760   2.141846 -10.904936  -4.565291
 -2.6957979  -3.5838102   2.786250   4.671762 -23.241009
 -1.0248091  -0.1997433  -5.725078  -3.354391  -9.161594
 -6.1165559  -5.0276558  -2.461728  -6.218129 -13.956695
 -8.8900629 -12.2663267 -15.760037 -39.321172 -16.902913 ")

Data <- read.table(zz, header = TRUE)
close(zz)

Answer 1: (score: 0)

Using your original code, save each sorted row in a list:

stockdatareturnpercent <- read.table(textConnection("                  SPY         DIA        IWM        SMH        OIH        
2001-04-02  8.1985485   7.8349806   7.935566  21.223832  13.975655  
2001-05-01 -0.5621328   1.7198760   2.141846 -10.904936  -4.565291  
2001-06-01 -2.6957979  -3.5838102   2.786250   4.671762 -23.241009 
2001-07-02 -1.0248091  -0.1997433  -5.725078  -3.354391  -9.161594  
2001-08-01 -6.1165559  -5.0276558  -2.461728  -6.218129 -13.956695  
2001-09-04 -8.8900629 -12.2663267 -15.760037 -39.321172 -16.902913"))

x <- vector("list", nrow(stockdatareturnpercent))

## use unlist to drop the data.frame structure
for (i in 1:nrow(stockdatareturnpercent)) {  
    x[[i]] <- sort(unlist(stockdatareturnpercent[i,])  )
} 
## use the row names to name each list element
names(x) <- rownames(stockdatareturnpercent)

x
$`2001-04-02`
  DIA       IWM       SPY       OIH       SMH 
7.834981  7.935566  8.198548 13.975655 21.223832 

$`2001-05-01`
    SMH         OIH         SPY         DIA         IWM 
-10.9049360  -4.5652910  -0.5621328   1.7198760   2.1418460 

$`2001-06-01`
   OIH        DIA        SPY        IWM        SMH 
-23.241009  -3.583810  -2.695798   2.786250   4.671762 

$`2001-07-02`
   OIH        IWM        SMH        SPY        DIA 
-9.1615940 -5.7250780 -3.3543910 -1.0248091 -0.1997433 

$`2001-08-01`
   OIH        SMH        SPY        DIA        IWM 
-13.956695  -6.218129  -6.116556  -5.027656  -2.461728 

$`2001-09-04`
   SMH        OIH        IWM        DIA        SPY 
-39.321172 -16.902913 -15.760037 -12.266327  -8.890063 
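The same kind of named list can also be built without an explicit loop. The following is a sketch only, using the first three dates of the sample data and sorting in decreasing order so that the best performers come first; the names `df`, `sorted` and `top2` are illustrative assumptions, not part of the answer.

```r
# First three months of the sample data (hypothetical object name `df`)
df <- data.frame(
  SPY = c(8.1985485, -0.5621328, -2.6957979),
  DIA = c(7.8349806, 1.7198760, -3.5838102),
  IWM = c(7.935566, 2.141846, 2.786250),
  SMH = c(21.223832, -10.904936, 4.671762),
  OIH = c(13.975655, -4.565291, -23.241009),
  row.names = c("2001-04-02", "2001-05-01", "2001-06-01")
)

# unlist() turns each one-row data frame into a named numeric vector,
# so sort() keeps the ticker names attached to the values
sorted <- lapply(seq_len(nrow(df)), function(i) {
  sort(unlist(df[i, ]), decreasing = TRUE)
})
names(sorted) <- rownames(df)

# Keep only the two best performers for each date
top2 <- lapply(sorted, head, 2)
print(top2)
```

Replacing head with tail would give the two worst performers instead.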

Sort each row directly with apply; note that this does not keep the element names:

apply(stockdatareturnpercent, 1, sort)

This returns a matrix in which each column is a sorted row. Transpose it:

sortmat <- t(apply(stockdatareturnpercent, 1, sort))

If you need the result as a data.frame, convert it with as.data.frame():

sortdf <- as.data.frame(sortmat)

Finally, all of this in one line:

sortdf <- as.data.frame(t(apply(stockdatareturnpercent, 1, sort)))
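Since the sorted data frame no longer records which ticker each value belongs to, one possible alternative is a long-format table that keeps date, ticker, return and rank together. This is only a sketch under the assumption that a rank column is acceptable; it uses three tickers and three dates from the sample data, and the names `returns`, `long`, `ret` and `rank` are illustrative.

```r
# Three tickers and three dates from the sample data, as a numeric matrix
returns <- matrix(
  c(8.1985485, -0.5621328, -2.6957979,   # SPY
    7.8349806,  1.7198760, -3.5838102,   # DIA
    7.935566,   2.141846,   2.786250),   # IWM
  nrow = 3,
  dimnames = list(c("2001-04-02", "2001-05-01", "2001-06-01"),
                  c("SPY", "DIA", "IWM"))
)

# as.vector() reads the matrix column by column, so dates cycle within each ticker
long <- data.frame(
  date   = rep(rownames(returns), times = ncol(returns)),
  ticker = rep(colnames(returns), each = nrow(returns)),
  ret    = as.vector(returns),
  stringsAsFactors = FALSE
)

# Rank within each date; negating the return makes rank 1 the best performer
long$rank <- ave(-long$ret, long$date, FUN = rank)

# Order by date, then from best to worst
long <- long[order(long$date, long$rank), ]
print(long)
```

Filtering on `long$rank == 1` then yields the best performer for each date.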