I have a data frame in R. Suppose it holds stock prices.
[1] "Date" "Open" "High" "Low" "Close" "Volume"
[7] "Adj.Close"
10 2012-12-20 54.53 54.61 53.70 54.21 4898900 54.21
9 2012-12-21 53.05 53.69 52.59 53.60 11076800 53.60
8 2012-12-24 53.37 54.00 53.33 53.69 1702900 53.69
7 2012-12-26 53.62 53.79 52.88 53.13 3047100 53.13
6 2012-12-27 53.09 53.64 52.71 53.24 4583600 53.24
5 2012-12-28 52.98 53.27 52.62 52.64 3395700 52.64
4 2012-12-31 52.41 53.67 52.39 53.63 4623500 53.63
3 2013-01-02 54.59 55.00 54.26 55.00 6633800 55.00
2 2013-01-03 55.07 55.61 55.00 55.37 7335200 55.37
1 2013-01-04 55.53 56.00 55.31 55.69 5455700 55.69
Something like the above. Now I need to find the row for the last day of each year. How can I do that?
Answer 0 (score: 2)
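You could extract a "grouping variable" from the date (for example the year and month) and then apply an aggregation function over its distinct values. That would be doing it by hand.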
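A minimal sketch of that manual route, assuming a data frame called `prices` (a hypothetical name; the rows are copied from the question and trimmed to two columns) whose `Date` column is already of class `Date`. The year serves as the grouping variable, and `tapply()` picks the row position of the latest date within each year.

```r
# Hypothetical sample data; values copied from the question's printout
prices <- data.frame(
  Date  = as.Date(c("2012-12-28", "2012-12-31", "2013-01-03", "2013-01-04")),
  Close = c(52.64, 53.63, 55.37, 55.69)
)

# Grouping variable: the calendar year of each row
yr <- format(prices$Date, "%Y")

# Aggregate: within each year, find the row position holding the latest date
last_idx <- tapply(seq_len(nrow(prices)), yr,
                   function(i) i[which.max(prices$Date[i])])

# Keep only those rows
last_rows <- prices[as.integer(last_idx), ]
print(last_rows)
```

Grouping by `format(prices$Date, "%Y-%m")` instead would give the last available row of each month.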
Or you can use the xts package, which already has an operator for this:
R> library(quantmod) ## for getSymbols()
R> SPY <- getSymbols("SPY", auto.assign=FALSE) ## SPY is now of class xts
We can take a look at the data:
R> summary(SPY)
Index SPY.Open SPY.High SPY.Low
Min. :2007-01-03 Min. : 68 Min. : 70 Min. : 67.1
1st Qu.:2008-07-03 1st Qu.:111 1st Qu.:112 1st Qu.:110.0
Median :2010-01-04 Median :128 Median :129 Median :127.5
Mean :2010-01-02 Mean :124 Mean :125 Mean :123.0
3rd Qu.:2011-07-05 3rd Qu.:140 3rd Qu.:140 3rd Qu.:139.0
Max. :2013-01-04 Max. :157 Max. :158 Max. :155.4
SPY.Close SPY.Volume SPY.Adjusted
Min. : 68.1 Min. :3.87e+07 Min. : 62.6
1st Qu.:110.8 1st Qu.:1.38e+08 1st Qu.:104.1
Median :128.4 Median :1.86e+08 Median :121.1
Mean :124.0 Mean :2.12e+08 Mean :116.1
3rd Qu.:139.7 3rd Qu.:2.57e+08 3rd Qu.:130.0
Max. :156.5 Max. :8.71e+08 Max. :146.4
R>
Then run the computation we want:
R> tail(SPY[ endpoints(SPY) ])
SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume
2012-08-31 141.29 141.82 140.36 141.16 151970400
2012-09-28 144.09 144.56 143.46 143.97 150696100
2012-10-31 141.85 142.03 140.68 141.35 103438500
2012-11-30 142.14 142.42 141.66 142.15 136568300
2012-12-31 139.66 142.56 139.54 142.41 243935200
2013-01-04 145.97 146.61 145.67 146.37 116790800
SPY.Adjusted
2012-08-31 139.42
2012-09-28 142.96
2012-10-31 140.35
2012-11-30 141.15
2012-12-31 142.41
2013-01-04 146.37
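Here endpoints() is the function you want; by default it works on months. It finds the row indices we are after. The same thing for years looks like this (note that the final row is simply the latest available observation of the still-incomplete year):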
R> SPY[ endpoints(SPY, "years") ]
SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume
2007-12-31 147.10 147.61 146.06 146.21 108126800
2008-12-31 89.08 90.97 88.87 90.24 193987200
2009-12-31 112.77 112.80 111.39 111.44 90637900
2010-12-31 125.53 125.87 125.33 125.75 91218900
2011-12-30 126.02 126.33 125.50 125.50 95599000
2012-12-31 139.66 142.56 139.54 142.41 243935200
2013-01-04 145.97 146.61 145.67 146.37 116790800
SPY.Adjusted
2007-12-31 131.14
2008-12-31 82.88
2009-12-31 104.73
2010-12-31 120.49
2011-12-30 122.78
2012-12-31 142.41
2013-01-04 146.37
R>
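To see what endpoints() hands back without downloading anything, here is a small sketch. It assumes the xts package is installed; the object `px` is a hypothetical stand-in built from four rows of the question's data, not the SPY series used above.

```r
library(xts)  # provides xts() and endpoints()

# Hypothetical four-row series; values copied from the question's printout
dates <- as.Date(c("2012-12-28", "2012-12-31", "2013-01-03", "2013-01-04"))
px <- xts(cbind(Close = c(52.64, 53.63, 55.37, 55.69)), order.by = dates)

# Row positions of the last observation in each year, preceded by a 0
ep <- endpoints(px, on = "years")
print(ep)

# Indexing with the 0 selects nothing, so only the period-end rows remain
year_end <- px[ep]
print(year_end)
```

Because endpoints() returns plain row positions, the same vector can be reused to subset any other object aligned with the series.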
Answer 1 (score: 0)
You can also extract the information using only base R:
# Assumes dataset$Date is of class Date (convert with as.Date() if needed)
# Get the years present in the dataset
years <- unique(format(dataset$Date, "%Y"))
# Get the last available row for each year
values <- list()
for (y in seq_along(years)) {
  in_year <- format(dataset$Date, "%Y") == years[y]
  values[[y]] <- dataset[dataset$Date == max(dataset$Date[in_year]), ]
}
Answer 2 (score: 0)
A base R solution.
Get some test data:
test <- read.table(textConnection("Date Open High Low Close Volume Adj.Close
2012-12-28 52.98 53.27 52.62 52.64 3395700 52.64
2012-12-31 52.41 53.67 52.39 53.63 4623500 53.63
2013-01-03 55.07 55.61 55.00 55.37 7335200 55.37
2013-01-04 55.53 56.00 55.31 55.69 5455700 55.69"),header=TRUE)
Convert the Date column to actual dates:
test$Date <- as.Date(test$Date)
Get the rows corresponding to the latest date within each year:
do.call(
rbind,
by(test,format(test$Date,"%Y"),function(x) x[x$Date == max(x$Date),])
)
Date Open High Low Close Volume Adj.Close
2012 2012-12-31 52.41 53.67 52.39 53.63 4623500 53.63
2013 2013-01-04 55.53 56.00 55.31 55.69 5455700 55.69
Answer 3 (score: 0)
Using the "test" dataset from @thelatemail, here are not one but two more base R approaches.
ave() + cut.Date() + basic subsetting:
test[test$Date == ave(test$Date, cut(test$Date, "1 year"), FUN = max), ]
# Date Open High Low Close Volume Adj.Close
# 2 2012-12-31 52.41 53.67 52.39 53.63 4623500 53.63
# 4 2013-01-04 55.53 56.00 55.31 55.69 5455700 55.69
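sapply() + split() + cut.Date(). I don't much like that you have to transpose the output, and the Date column comes back as a plain number. I guess you could also use lapply() instead of sapply() and then use do.call(rbind, ...) to get a data.frame.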
t(sapply(split(test, cut(test$Date, "1 year")),
function(x) x[which.max(x[["Date"]]),]))
# Date Open High Low Close Volume Adj.Close
# 2012-01-01 15705 52.41 53.67 52.39 53.63 4623500 53.63
# 2013-01-01 15709 55.53 56 55.31 55.69 5455700 55.69
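The lapply() idea mentioned above could look roughly like this. It is only a sketch: `test_small` is a hypothetical cut-down copy of the test data (two columns instead of seven) so that the block runs on its own.

```r
# Hypothetical cut-down copy of the test data, with Date already converted
test_small <- data.frame(
  Date  = as.Date(c("2012-12-28", "2012-12-31", "2013-01-03", "2013-01-04")),
  Close = c(52.64, 53.63, 55.37, 55.69)
)

# One data frame per calendar year
pieces <- split(test_small, cut(test_small$Date, "1 year"))

# Keep the row with the latest date in each piece, then stack the pieces
last_rows <- do.call(rbind, lapply(pieces, function(x) x[which.max(x$Date), ]))
print(last_rows)
```

Since each piece stays a data frame, no transpose is needed and the Date column keeps its Date class.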