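I have a huge df with measurements over more than 10 days. Now I need to get the variance and repeatability for the whole dataset, for single days, and for several days. For the whole dataset this is easy. For the single days I created the following loop (which works, by the way):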
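All_D <- unique(lam$Start_date)                # every distinct measurement day
for (d in seq_along(All_D)) {                  # one iteration per day
  jaj.d <- All_D[d]
  Days.d <- subset(lam, Start_date == jaj.d)   # all rows measured on that day
  # within-animal variance of CH4 per ID, as a two-column data frame
  jaa <- as.data.frame(as.table(with(Days.d, tapply(CH4, ID, FUN = var))))
  names(jaa) <- c("ID", "within_ani")
  write.csv(jaa, paste("Day_", jaj.d, ".csv", sep = ""), row.names = FALSE)  # one file per day
}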
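For comparison, the same per-day, per-ID variances can be computed in one grouped call instead of a loop. A minimal base-R sketch, assuming a small `lam` with only the columns ID, CH4 and Start_date (the values below are made up for illustration):

```r
# Made-up toy data in the same shape as `lam` (ID, CH4, Start_date)
lam <- data.frame(
  ID = rep(c(12338L, 12339L), times = 6),
  CH4 = c(66, 70, 77, 40, 62, 55, 58, 61, 34, 45, 51, 49),
  Start_date = rep(c("2013-09-01", "2013-09-02"), each = 6)
)

# One row per (day, ID): within-animal variance of CH4 on that day
per_day <- aggregate(CH4 ~ Start_date + ID, data = lam, FUN = var)
names(per_day)[names(per_day) == "CH4"] <- "within_ani"
per_day
```

Each day's rows could then be written out with `split(per_day, per_day$Start_date)` and `write.csv()`, one file per day as in the loop.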
Now I want to form groups of two days and slide them along the 10 days, but the days in each group must be consecutive, like so:
2013-09-01 & 2013-09-02, 2013-09-02 & 2013-09-03, 2013-09-03 & 2013-09-04, ...,
2013-09-09 & 2013-09-10
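Listing the groups themselves is the easy part. A minimal sketch, assuming the windows should run over the distinct dates actually present in the data (a missing calendar day would simply be skipped); `consecutive_windows` is a hypothetical helper name:

```r
# Hypothetical helper: every run of n consecutive dates, in date order
consecutive_windows <- function(dates, n) {
  days <- sort(unique(as.Date(dates)))       # distinct dates present in the data
  n_windows <- max(0, length(days) - n + 1)  # e.g. 10 days, n = 2 -> 9 windows
  lapply(seq_len(n_windows), function(s) days[s:(s + n - 1)])
}

days <- seq(as.Date("2013-09-01"), as.Date("2013-09-10"), by = "day")
pairs <- consecutive_windows(days, 2)
length(pairs)  # 9
pairs[[1]]     # "2013-09-01" "2013-09-02"
pairs[[9]]     # "2013-09-09" "2013-09-10"
```

Calling it with n = 3, ..., 9 gives the longer groups, so the group size becomes a parameter rather than another hand-written loop.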
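I think I need another loop, but (apart from the code above) I don't know where to start. I also need groups of 3 up to 9 days, so I'd rather not do this by hand! My df looks like this: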
'data.frame': 1420847 obs. of 22 variables:
$ ID : int 12338 12338 12338 12338 12338 12338 12338 12338 12338 12338 ...
$ CO2 : int 1510 1950 1190 1170 780 870 730 740 680 700 ...
$ CH4 : int 66 77 62 58 34 51 36 43 32 40 ...
$ Start_date: chr "2013-09-01" "2013-09-01" "2013-09-01" "2013-09-01" ...
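I'm a newbie with R and I hope someone can give me a push in the right direction. I've been struggling with this for hours and I can't seem to find a solution on this site or anywhere else on the web. English is not my native language and I find it hard to come up with the right search terms, so it's not for lack of trying.
If my question is still unclear, please let me know and I'll try to adjust it.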
EDIT
Soooooo, with your help I came up with this loop:
lam <- df
lam$Start_date <- as.Date(lam$Start_date)
library(data.table)
lam <- as.data.table(lam)               # transform df to dt
setkey(lam, Start_date)                 # sort by date so that diff() below compares neighbouring days
lam[, date1 := c(1, diff(Start_date))]  # 1 for the first row, the gap in days for every later row
lam[, date1 := cumsum(date1)]           # running sum: each date gets its own number
lap <- split(lam, lam$date1)            # split once: a list with one data.table per day
for (i in 1:(length(lap) - 3)) {        # every start day that still has 3 days after it
  j <- i + 1                            # the day after i
  k <- i + 2                            # the day after j
  l <- i + 3                            # the day after k (each extra day needs another index)
  lap.i.j.k.l <- rbind(lap[[i]], lap[[j]], lap[[k]], lap[[l]])  # binding the 4 days together
  var.i.j.k.l <- var(lap.i.j.k.l$CH4)   # variance of CH4 across all rows (all animals) in these 4 days
  # per-ID variances for CH4, with the overall variance appended as the last row
  kill <- as.data.frame(c(with(lap.i.j.k.l, tapply(CH4, ID, FUN = var)), var.i.j.k.l))
  names(kill) <- c("variance")          # name column in df
  # write to a .csv file in the working directory
  write.csv(kill, paste("consecutive days_", i, "_", j, "_", k, "_", l, ".csv", sep = ""))
}
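This does exactly what I want, but R is not a big fan of loops within loops within loops, and so on. The loop above builds the tables for 4 consecutive days, and I need up to 9 consecutive days. Since the loop above already asks a lot of this computer, I'm wondering what a shorter, easier and more efficient way to do this would be. Not "if", because I know it exists: Codoremifa already showed me, it's just that his code doesn't do exactly what I want and I can't figure out how it works.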
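One way to avoid a separate index per extra day is to loop only over the start of each window and take a whole slice of days at once. A base-R sketch, assuming a data frame with the columns ID, CH4 and Start_date as in the str() output above; `window_variances` is a hypothetical name and the toy data are made up:

```r
# Hypothetical helper: per-ID variance of CH4 for every window of n consecutive days
window_variances <- function(df, n) {
  d <- as.Date(df$Start_date)                   # convert once, not per window
  days <- sort(unique(d))
  starts <- seq_len(max(0, length(days) - n + 1))
  out <- lapply(starts, function(s) {
    win <- df[d %in% days[s:(s + n - 1)], ]     # rows from n consecutive days
    v <- tapply(win$CH4, win$ID, var)           # within-animal variance per ID
    data.frame(ID = names(v), variance = as.vector(v))
  })
  # name each element by its first and last day, e.g. "2013-09-01_2013-09-02"
  names(out) <- vapply(starts, function(s)
    paste(format(days[s]), format(days[s + n - 1]), sep = "_"), character(1))
  out
}

# Made-up toy data: two animals, three days
toy <- data.frame(
  ID = rep(c(1L, 2L), times = 6),
  CH4 = c(10, 20, 12, 22, 14, 30, 16, 26, 18, 28, 20, 40),
  Start_date = rep(c("2013-09-01", "2013-09-02", "2013-09-03"), each = 4)
)
res <- window_variances(toy, 2)
names(res)  # "2013-09-01_2013-09-02" "2013-09-02_2013-09-03"
```

For groups of 2 to 9 days, the function would be called once per n (for example inside `for (n in 2:9)`), and each list element written with `write.csv()`; the window length is just an argument, so no extra nesting is needed.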
EDIT 2
What I want to achieve:
ID CO2 CH4 dates date1
1 12338 1510 66 2013-09-01 1
2 12338 1950 77 2013-09-01 1
3 12338 1190 62 2013-09-01 1
4 12338 1170 58 2013-09-02 1
5 12338 780 34 2013-09-02 1
6 12338 870 51 2013-09-03 2
7 12338 1670 66 2013-09-03 2
8 12338 1980 77 2013-09-03 2
9 12338 1330 62 2013-09-04 2
10 12338 1850 58 2013-09-04 2
11 12338 1640 34 2013-09-05 3
12 12338 590 51 2013-09-05 3
And then a list like this:
> [1]
ID var
12338 164077.4
12339 78420.31
12352 91472.76
> [2]
ID var
12338 33543.16
12339 184467.1
12352 202267.3
which I then want to write to a .csv file.
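The example table numbers the days in non-overlapping blocks (date1 = 1 for the first two dates, 2 for the next two, and so on). Under that reading, a base-R sketch of the numbering and the resulting list, using the rows of the example table and a hypothetical helper `block_number`:

```r
# Hypothetical helper: number the distinct days in blocks of n (non-overlapping)
block_number <- function(dates, n) {
  d <- as.Date(dates)
  day_index <- match(d, sort(unique(d)))  # 1 for the first date, 2 for the next, ...
  (day_index - 1) %/% n + 1
}

# Rows from the example table (ID, CH4, dates)
ex <- data.frame(
  ID = 12338L,
  CH4 = c(66, 77, 62, 58, 34, 51, 66, 77, 62, 58, 34, 51),
  dates = rep(c("2013-09-01", "2013-09-02", "2013-09-03", "2013-09-04", "2013-09-05"),
              times = c(3, 2, 3, 2, 2))
)
ex$date1 <- block_number(ex$dates, 2)     # 1 1 1 1 1 2 2 2 2 2 3 3

# A list with one data frame of per-ID variances per block
per_block <- lapply(split(ex, ex$date1), function(b) {
  v <- tapply(b$CH4, b$ID, var)
  data.frame(ID = names(v), var = as.vector(v))
})
```

Each element could then be written with `write.csv(per_block[[k]], paste0("block_", k, ".csv"), row.names = FALSE)` for every `k` in `names(per_block)`.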
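Answer 0 (score: 1)
I'm not sure what your output needs to look like. This should give you an idea of what to try. If you can post sample data, I can edit my answer accordingly.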
library(data.table)
# sample data: three consecutive dates, each repeated three times
dt <- data.table(
  dates = rep(seq.Date(
    as.Date('01-01-2013','%d-%m-%Y'),
    as.Date('03-01-2013','%d-%m-%Y'),
    by = 'days'
  ),3),
  values = rnorm(9)   # one random value per row
)
# ordering dataset by dates
setkeyv(dt,'dates')
# assigning each date a unique number: 0, 1, 2, ...
dt[,flag := c(0,diff(dates))]
dt[,flag := cumsum(flag)]
noofdates <- max(dt[,flag])+1
# i is the counter for how many dates need to be clubbed
for ( i in 1:3 )
{
  # creating list to store intermediate data, one element per group
  grouplist <- vector(mode = "list", length = noofdates-i+1)
  # j is the counter for each group of i dates
  for ( j in 1:(noofdates-i+1) )
  {
    # getting the subset for each group: the i dates flagged j-1, ..., j+i-2
    dttemp <- dt[flag %in% (j-1):(j+i-2)]
    # storing the variance in a list
    grouplist[[j]] <- dttemp[, list(varvalues = var(values))]
  }
  # combining the list into one data.table
  groupdt <- rbindlist(grouplist)
  #write out
  write.csv(groupdt,paste0('name_',i,"_",j,'.csv'))
}
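Answer 1 (score: 0)
I'm not quite sure what you mean by "grouping days" (what do you want to do?).
If you just want to access every 2nd, 3rd, etc. day, that's easy to do: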
# create some data
dates <- as.Date(paste0("1990-11-", 1:10))
df <- data.frame(a = sample(10), b = sample(10), date = dates)
# you could, of course, also specify begin and end manually
days.ordered <- sort(df$date)
begin <- days.ordered[1]
end <- tail(days.ordered, n = 1)
seq(begin, end, by='2 days') # or 3 days, or 4 days
But it seems you are looking for a way to split the whole data frame into groups defined by day intervals.
# create some data
dates <- as.Date(paste0("1990-11-", rep(1:10, each=3)))
df <- data.frame(id = rep(1:10, each=3), CH4 = 1:30, CO2 = 1:30, date = dates)
# you could, of course, also specify begin and end manually
days.ordered <- sort(df$date)
begin <- days.ordered[1]
end <- tail(days.ordered, n = 1)
by.n <- 2 # adjust
groups <- seq(begin, end + by.n, by=paste(by.n, "days"))
library(plyr)
ddply(df, .(id, cut(date, breaks = groups)), summarize,
      varCH4 = var(CH4),
      varCO2 = var(CO2))
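To see which dates cut() places in each group, the same 1990-11 dates with `by.n = 2` can be checked directly in base R. With Date breaks, cut() uses left-closed intervals by default (`right = FALSE`), so each group holds two consecutive days and the groups do not overlap:

```r
dates <- as.Date(paste0("1990-11-", 1:10))
by.n <- 2
groups <- seq(min(dates), max(dates) + by.n, by = paste(by.n, "days"))
g <- cut(dates, breaks = groups)  # intervals [11-01, 11-03), [11-03, 11-05), ...
table(g)                          # two dates in each of the five groups
```

This fixed blocking differs from the sliding pairs in the question (2013-09-01 & 2013-09-02, 2013-09-02 & 2013-09-03, ...), where each day belongs to two groups.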