How to subset a data frame into new data frames

Asked: 2014-09-30 15:13:00

Tags: r plyr

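I have a data frame with Date, Time and Price as column headers. I want to create a new data frame based on time, i.e. one column holding all prices at 00:00, another holding all prices at 01:00, the next holding the prices at 02:00, and so on.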

I used ddply to calculate the mean for each time:

ddply(df,.(Time),function(x) mean(x$No.Trade))

It works fine, but I would like a new data frame that lists the prices at each specific time, so that I can run further analysis on them.

2 answers:

Answer 0 (score: 0)

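You may want to create different data.frame objects based on the unique combinations of Date and Time. I suggest using split and then doing the rest of the calculations/analysis within the list using lapply, rather than creating separate data.frames in the global environment.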

lst <- split(df["Price"], list(df$Date, df$Time), drop=TRUE)

You can do most operations within lst. For example:

sapply(lst, function(x) mean(x$Price, na.rm=TRUE))
#02-Jan-96.03:20 02-Jan-96.03:25 02-Jan-96.03:45 02-Jan-96.04:20
#       366.1500        337.1500        346.4333        353.4833
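To illustrate what "doing the rest of the analysis within the list" could look like, here is a minimal sketch that computes several summaries per group with lapply and stacks them into one data.frame. The sample data is the same as in the Data section of this answer; the names summ and result, and the choice of summaries, are only for illustration.

```r
# Sample data, same values as in the "Data" section of this answer
df <- data.frame(
  Date  = rep("02-Jan-96", 9),
  Time  = c("03:20", "03:45", "04:20", "03:20", "03:45",
            "04:20", "03:25", "03:45", "04:20"),
  Price = c(387.15, 387.1, 387.15, 345.15, 325.1,
            335.15, 337.15, 327.1, 338.15),
  stringsAsFactors = FALSE
)

lst <- split(df["Price"], list(df$Date, df$Time), drop = TRUE)

# One summary row per Date/Time group, computed without leaving the list
summ <- lapply(lst, function(x) {
  data.frame(n    = nrow(x),
             mean = mean(x$Price, na.rm = TRUE),
             min  = min(x$Price, na.rm = TRUE),
             max  = max(x$Price, na.rm = TRUE))
})

# Stack the per-group rows into a single data.frame
result <- do.call(rbind, summ)
result
```

Keeping everything in one list means any further step can be applied to all groups with a single lapply or sapply call.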

However, if you need to create individual data.frames:

nm1 <- gsub("[[:punct:]]", "", paste("Var", names(lst),sep="_"))
nm1
#[1] "Var02Jan960320" "Var02Jan960325" "Var02Jan960345" "Var02Jan960420"

list2env(setNames(lst, nm1), envir=.GlobalEnv)

Var02Jan960320
#   Price
#1 387.15
#4 345.15
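If the goal is only to reach a single group by name, a possible alternative to filling the global environment is to index the list directly, or to pass a dedicated environment to list2env. This is a sketch rather than part of the original answer; the environment name e is an assumption.

```r
# Sample data, same values as in the "Data" section of this answer
df <- data.frame(
  Date  = rep("02-Jan-96", 9),
  Time  = c("03:20", "03:45", "04:20", "03:20", "03:45",
            "04:20", "03:25", "03:45", "04:20"),
  Price = c(387.15, 387.1, 387.15, 345.15, 325.1,
            335.15, 337.15, 327.1, 338.15),
  stringsAsFactors = FALSE
)

lst <- split(df["Price"], list(df$Date, df$Time), drop = TRUE)

# Option 1: pull one group straight out of the list by its name
lst[["02-Jan-96.03:20"]]

# Option 2: put the data.frames into a separate environment
# instead of .GlobalEnv ('e' is a hypothetical name)
nm1 <- gsub("[[:punct:]]", "", paste("Var", names(lst), sep = "_"))
e <- new.env()
list2env(setNames(lst, nm1), envir = e)

ls(e)                              # names of the objects created
get("Var02Jan960320", envir = e)   # same rows as Option 1
```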

Data

 df <- structure(list(Date = c("02-Jan-96", "02-Jan-96", "02-Jan-96", 
 "02-Jan-96", "02-Jan-96", "02-Jan-96", "02-Jan-96", "02-Jan-96", 
 "02-Jan-96"), Time = c("03:20", "03:45", "04:20", "03:20", "03:45", 
 "04:20", "03:25", "03:45", "04:20"), Price = c(387.15, 387.1, 
387.15, 345.15, 325.1, 335.15, 337.15, 327.1, 338.15)), .Names = c("Date", 
"Time", "Price"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9"))
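For the layout described in the question (one column per time of day, each holding all prices at that time), one possible base R sketch is to split the Price vector by Time alone and pad the shorter groups with NA so they fit into one data.frame. The names by_time, padded and wide are hypothetical, and padding with NA is an assumption about how unequal group sizes should be handled.

```r
# Sample data, same values as in the "Data" section above
df <- data.frame(
  Date  = rep("02-Jan-96", 9),
  Time  = c("03:20", "03:45", "04:20", "03:20", "03:45",
            "04:20", "03:25", "03:45", "04:20"),
  Price = c(387.15, 387.1, 387.15, 345.15, 325.1,
            335.15, 337.15, 327.1, 338.15),
  stringsAsFactors = FALSE
)

# List of price vectors, one element per distinct Time
by_time <- split(df$Price, df$Time)

# Pad every vector with NA up to the length of the longest group
n <- max(sapply(by_time, length))
padded <- lapply(by_time, function(x) c(x, rep(NA, n - length(x))))

# One column per Time; check.names = FALSE keeps names such as "03:20"
wide <- do.call(data.frame, c(padded, check.names = FALSE))
wide
```

Columns are then accessed with double brackets or backticks, for example wide[["03:20"]].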

Answer 1 (score: 0)

You can use the reshape2 package.

Data code taken from akrun:

your.data <- structure(list(Date = c("02-Jan-96", "02-Jan-96", "02-Jan-96", 
 "02-Jan-96", "02-Jan-96", "02-Jan-96", "02-Jan-96", "02-Jan-96", 
 "02-Jan-96"), Time = c("03:20", "03:45", "04:20", "03:20", "03:45", 
 "04:20", "03:25", "03:45", "04:20"), Price = c(387.15, 387.1, 
 387.15, 345.15, 325.1, 335.15, 337.15, 327.1, 338.15)), .Names = c("Date", 
 "Time", "Price"), class = "data.frame", row.names = c("1", "2", 
 "3", "4", "5", "6", "7", "8", "9"))

your.data
#       Date  Time  Price
#1 02-Jan-96 03:20 387.15
#2 02-Jan-96 03:45 387.10
#3 02-Jan-96 04:20 387.15
#4 02-Jan-96 03:20 345.15
#5 02-Jan-96 03:45 325.10
#6 02-Jan-96 04:20 335.15
#7 02-Jan-96 03:25 337.15
#8 02-Jan-96 03:45 327.10
#9 02-Jan-96 04:20 338.15

Using dcast():

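library(reshape2)
# value.var is named explicitly so dcast does not have to guess the value column
dcast(your.data, Date ~ Time, fun.aggregate = mean, value.var = "Price")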
#       Date  03:20  03:25    03:45    04:20
#1 02-Jan-96 366.15 337.15 346.4333 353.4833
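If installing reshape2 is not an option, roughly the same wide table of means can be sketched in base R with tapply. This is offered as an alternative under that assumption, not as what the answer used; the names wide and wide_df are hypothetical.

```r
# Same sample data as above
your.data <- data.frame(
  Date  = rep("02-Jan-96", 9),
  Time  = c("03:20", "03:45", "04:20", "03:20", "03:45",
            "04:20", "03:25", "03:45", "04:20"),
  Price = c(387.15, 387.1, 387.15, 345.15, 325.1,
            335.15, 337.15, 327.1, 338.15),
  stringsAsFactors = FALSE
)

# Matrix of mean prices: one row per Date, one column per Time
wide <- tapply(your.data$Price,
               list(your.data$Date, your.data$Time),
               mean)

# Turn the matrix into a data.frame with Date as a regular column;
# check.names = FALSE keeps column names such as "03:20" unchanged
wide_df <- data.frame(Date = rownames(wide), wide,
                      check.names = FALSE, row.names = NULL)
wide_df
```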