Get the rows with the maximum date in R

Time: 2013-01-06 19:45:07

Tags: r

I have a data frame in R. Suppose it holds stock prices.

[1] "Date"      "Open"      "High"      "Low"       "Close"     "Volume"   
[7] "Adj.Close"
10   2012-12-20 54.53 54.61 53.70 54.21   4898900     54.21
9    2012-12-21 53.05 53.69 52.59 53.60  11076800     53.60
8    2012-12-24 53.37 54.00 53.33 53.69   1702900     53.69
7    2012-12-26 53.62 53.79 52.88 53.13   3047100     53.13
6    2012-12-27 53.09 53.64 52.71 53.24   4583600     53.24
5    2012-12-28 52.98 53.27 52.62 52.64   3395700     52.64
4    2012-12-31 52.41 53.67 52.39 53.63   4623500     53.63
3    2013-01-02 54.59 55.00 54.26 55.00   6633800     55.00
2    2013-01-03 55.07 55.61 55.00 55.37   7335200     55.37
1    2013-01-04 55.53 56.00 55.31 55.69   5455700     55.69

Something like the above. Now I need to find the row for the last available day of each year. How can I do that?

4 answers:

Answer 0: (score: 2)

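You can extract a "grouping variable" from the date (for example the year, or the year and month) and then apply an aggregation function over its distinct values. That would be doing it by hand.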
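As a minimal sketch of the by-hand route, assuming a data frame with a `Date` column of class Date: the `prices` object and its values below are a toy subset of the question's data, and using `tapply()` as the aggregation step is just one possible choice.

```r
# Toy data: a few rows from the question, two columns only
prices <- data.frame(
  Date  = as.Date(c("2012-12-28", "2012-12-31", "2013-01-03", "2013-01-04")),
  Close = c(52.64, 53.63, 55.37, 55.69)
)

# Grouping variable: the calendar year of each row
year <- format(prices$Date, "%Y")

# For each year, find the row index that holds the latest date
last_idx <- tapply(seq_len(nrow(prices)), year,
                   function(i) i[which.max(as.numeric(prices$Date[i]))])

# Keep only those rows
last_rows <- prices[as.integer(last_idx), ]
print(last_rows)
```

Working with row indices rather than aggregating each column separately keeps every row intact, so the other columns come along for free.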

Or you can use the xts package, which already has operators for this:

R> library(quantmod)                             ## for getSymbols()
R> SPY <- getSymbols("SPY", auto.assign=FALSE)   ## SPY is now of class xts

We can take a look at the data:

R> summary(SPY)
     Index               SPY.Open      SPY.High      SPY.Low     
 Min.   :2007-01-03   Min.   : 68   Min.   : 70   Min.   : 67.1  
 1st Qu.:2008-07-03   1st Qu.:111   1st Qu.:112   1st Qu.:110.0  
 Median :2010-01-04   Median :128   Median :129   Median :127.5  
 Mean   :2010-01-02   Mean   :124   Mean   :125   Mean   :123.0  
 3rd Qu.:2011-07-05   3rd Qu.:140   3rd Qu.:140   3rd Qu.:139.0  
 Max.   :2013-01-04   Max.   :157   Max.   :158   Max.   :155.4  
   SPY.Close       SPY.Volume        SPY.Adjusted  
 Min.   : 68.1   Min.   :3.87e+07   Min.   : 62.6  
 1st Qu.:110.8   1st Qu.:1.38e+08   1st Qu.:104.1  
 Median :128.4   Median :1.86e+08   Median :121.1  
 Mean   :124.0   Mean   :2.12e+08   Mean   :116.1  
 3rd Qu.:139.7   3rd Qu.:2.57e+08   3rd Qu.:130.0  
 Max.   :156.5   Max.   :8.71e+08   Max.   :146.4  

R> 

Then run the computation we want:

R> tail(SPY[ endpoints(SPY) ])
           SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume
2012-08-31   141.29   141.82  140.36    141.16  151970400
2012-09-28   144.09   144.56  143.46    143.97  150696100
2012-10-31   141.85   142.03  140.68    141.35  103438500
2012-11-30   142.14   142.42  141.66    142.15  136568300
2012-12-31   139.66   142.56  139.54    142.41  243935200
2013-01-04   145.97   146.61  145.67    146.37  116790800
           SPY.Adjusted
2012-08-31       139.42
2012-09-28       142.96
2012-10-31       140.35
2012-11-30       141.15
2012-12-31       142.41
2013-01-04       146.37

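Here endpoints() is the function you want; by default it picks month ends. It finds the row indices of the last observation in each period. So here it is for years: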

R> SPY[ endpoints(SPY, "years") ]
           SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume
2007-12-31   147.10   147.61  146.06    146.21  108126800
2008-12-31    89.08    90.97   88.87     90.24  193987200
2009-12-31   112.77   112.80  111.39    111.44   90637900
2010-12-31   125.53   125.87  125.33    125.75   91218900
2011-12-30   126.02   126.33  125.50    125.50   95599000
2012-12-31   139.66   142.56  139.54    142.41  243935200
2013-01-04   145.97   146.61  145.67    146.37  116790800
           SPY.Adjusted
2007-12-31       131.14
2008-12-31        82.88
2009-12-31       104.73
2010-12-31       120.49
2011-12-30       122.78
2012-12-31       142.41
2013-01-04       146.37
R> 
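For readers without network access to getSymbols(), the same idea can be tried on a small hand-built series. This is only a sketch: the `px` object and its values are a hypothetical toy series taken from the question's closing prices, and it assumes the xts package is installed.

```r
library(xts)  # also attaches zoo, which provides index()

# Hypothetical toy series: closing prices from the question
dates <- as.Date(c("2012-12-28", "2012-12-31", "2013-01-03", "2013-01-04"))
px <- xts(c(52.64, 53.63, 55.37, 55.69), order.by = dates)

# endpoints() returns row positions, starting with a leading 0
ep <- endpoints(px, on = "years")
print(ep)

# The 0 selects nothing when subsetting, leaving one row per year
last_rows <- px[ep]
print(last_rows)
```

Printing `ep` makes it visible that endpoints() hands back positions rather than dates, which is why it can be used directly as a row index.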

Answer 1: (score: 0)

You can also extract the information using only base R:

# Get the distinct years in the dataset (Date must be of class Date)
years <- unique(format(dataset$Date, "%Y"))
# Collect the row with the latest date in each year
values <- list()
for (y in seq_along(years)) {
    in_year <- format(dataset$Date, "%Y") == years[y]
    values[[y]] <- dataset[dataset$Date == max(dataset$Date[in_year]), ]
}
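The loop above leaves the result as a list of one-row data frames. A self-contained sketch of running it and stacking the pieces follows; the `dataset` values are a toy subset of the question's data, and combining with do.call(rbind, ...) is an assumption about the output shape you want.

```r
# Toy data: a few rows from the question; Date must be of class Date
dataset <- data.frame(
  Date  = as.Date(c("2012-12-28", "2012-12-31", "2013-01-03", "2013-01-04")),
  Close = c(52.64, 53.63, 55.37, 55.69)
)

# Same loop as in the answer: one entry per year
years <- unique(format(dataset$Date, "%Y"))
values <- list()
for (y in seq_along(years)) {
  in_year <- format(dataset$Date, "%Y") == years[y]
  values[[y]] <- dataset[dataset$Date == max(dataset$Date[in_year]), ]
}

# Stack the one-row pieces into a single data frame
result <- do.call(rbind, values)
print(result)
```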

Answer 2: (score: 0)

A base R solution:

Get some test data:

test <- read.table(textConnection("Date      Open      High      Low  Close Volume Adj.Close
2012-12-28 52.98 53.27 52.62 52.64   3395700     52.64
2012-12-31 52.41 53.67 52.39 53.63   4623500     53.63
2013-01-03 55.07 55.61 55.00 55.37   7335200     55.37
2013-01-04 55.53 56.00 55.31 55.69   5455700     55.69"),header=TRUE)

Convert the Date column to actual dates:

test$Date <- as.Date(test$Date)

Get the rows corresponding to the latest date within each year:

do.call(
         rbind,
         by(test,format(test$Date,"%Y"),function(x) x[x$Date == max(x$Date),])
       )

           Date  Open  High   Low Close  Volume Adj.Close
2012 2012-12-31 52.41 53.67 52.39 53.63 4623500     53.63
2013 2013-01-04 55.53 56.00 55.31 55.69 5455700     55.69

Answer 3: (score: 0)

Using the "test" dataset from @thelatemail, here is another base R approach; in fact, not one but two:

  1. ave() + cut.Date() + basic subsetting:

    test[test$Date == ave(test$Date, cut(test$Date, "1 year"), FUN = max), ]
    #         Date  Open  High   Low Close  Volume Adj.Close
    # 2 2012-12-31 52.41 53.67 52.39 53.63 4623500     53.63
    # 4 2013-01-04 55.53 56.00 55.31 55.69 5455700     55.69
    
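  2. sapply() + split() + cut.Date(). I don't much like that you have to transpose the output, and the Date column comes back as a plain number. I guess you could also use lapply() instead of sapply() and then do.call(rbind, ...) to get a data.frame.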

    t(sapply(split(test, cut(test$Date, "1 year")), 
             function(x) x[which.max(x[["Date"]]),]))
    #            Date  Open  High  Low   Close Volume  Adj.Close
    # 2012-01-01 15705 52.41 53.67 52.39 53.63 4623500 53.63    
    # 2013-01-01 15709 55.53 56    55.31 55.69 5455700 55.69
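The lapply() variant mentioned in the second approach might look like the sketch below. It rebuilds the same "test" data so it can run on its own; the names `pieces` and `last_rows` are illustrative only.

```r
# Rebuild the test data from the earlier answer
test <- read.table(text = "Date Open High Low Close Volume Adj.Close
2012-12-28 52.98 53.27 52.62 52.64 3395700 52.64
2012-12-31 52.41 53.67 52.39 53.63 4623500 53.63
2013-01-03 55.07 55.61 55.00 55.37 7335200 55.37
2013-01-04 55.53 56.00 55.31 55.69 5455700 55.69", header = TRUE)
test$Date <- as.Date(test$Date)

# Split into one data frame per calendar year
pieces <- split(test, cut(test$Date, "1 year"))

# Take the latest row of each piece, then stack the pieces
last_rows <- do.call(rbind, lapply(pieces, function(x) x[which.max(x$Date), ]))
print(last_rows)
```

Because each piece stays a data frame, the result needs no transposing and the Date column keeps its class.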