I really hate to ask two questions in a row, but this is one I can't figure out. Suppose I have a data frame like this:
df
Row#  User  Morning  Evening  Measure_Date
1     1     NA       NA       2/18/11
2     1     50       115      2/19/11
3     1     85       128      2/20/11
4     1     62       NA       2/25/11
5     1     48       100.8    3/8/11
6     1     19       71       3/9/11
7     1     25       98       3/10/11
8     1     NA       105      3/11/11
9     2     48       105      2/18/11
10    2     28       203      2/19/11
11    2     35       80.99    2/21/11
12    2     91       78.25    2/22/11
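Is it possible in R, for each User group, to take the difference between the Evening value in the row for the previous consecutive day (only the previous calendar day, not simply the previous result) and the Morning value in the current row? The result I want looks like this: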
df
Row#  User  Morning  Evening  Date     Difference
1     1     NA       NA       2/18/11  NA
2     1     50       115      2/19/11  NA
3     1     85       129      2/20/11  30
4     1     62       NA       2/25/11  NA
5     1     48       100.8    3/8/11   NA
6     1     19       71       3/9/11   81.8
7     1     25       98       3/10/11  46
8     1     10       105      3/11/11  88
9     2     48       105      2/18/11  NA
10    2     28       203      2/19/11  77
11    2     35       80.99    2/21/11  NA
12    2     91       78.25    2/22/11  -10.01
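What I want to do is take the Morning value and subtract it from the Evening value of the previous consecutive day, within each User group. As you can see, parts of my data frame contain NA values in the Morning and Evening columns, and the dates are not consecutive for every user, so NA should naturally be assigned in those cases.
I tried searching Google, but there isn't much information on applying a function across different rows of different columns for each group (if that makes sense).
My attempts have included many variations of this: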
df$Difference<-ave((df$Morning,df$Evening),
df$User,
FUN=function(x){
c('NA',diff(df$Evening-df$Morning)),na.rm=T
})
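Once again, any help would be greatly appreciated. Thanks.
Answer 0: (score: 4)
A blind first shot (untested). It relies on the data frame already being sorted by user and date.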
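# If necessary, convert the dates from factor to Date
df$Measure_Date <- as.Date(levels(df$Measure_Date)[df$Measure_Date], format="%m/%d/%y")
df <- within(df,
  # Keep the difference only when the previous row belongs to the same user
  # and is dated exactly one day earlier; both tests are padded with NA so
  # that they have one entry per row
  Difference <- ifelse(c(NA,diff(Measure_Date)) == 1 & c(NA,diff(User)) == 0,
                       c(NA,head(Evening,-1)) - Morning, NA
  )
)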
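Because this approach depends on row order, here is a minimal sketch of the sorting step followed by the same lagged comparison. The `toy` data frame is made up for illustration and is deliberately out of order. The user test is written as `c(NA, diff(User))` so that both conditions have one entry per row.

```r
# Hypothetical toy data, deliberately not sorted by user and date
toy <- data.frame(
  User = c(2L, 1L, 1L, 2L),
  Morning = c(28, 85, 50, 48),
  Evening = c(203, 128, 115, 105),
  Measure_Date = as.Date(c("2011-02-19", "2011-02-20",
                           "2011-02-19", "2011-02-18"))
)

# Sort by user, then by date within each user
toy <- toy[order(toy$User, toy$Measure_Date), ]

# Previous row's Evening minus this row's Morning, kept only when the
# previous row is the same user and exactly one day earlier
toy <- within(toy,
  Difference <- ifelse(c(NA, diff(Measure_Date)) == 1 & c(NA, diff(User)) == 0,
                       c(NA, head(Evening, -1)) - Morning, NA))

print(toy)
```

After sorting, only the second row of each user has a same-user row from the day before, so only those two rows receive a value.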
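Answer 1: (score: 4)
Note: the input data you show differs from the output data. An NA in the input is replaced by 10 in the output, and the last date is 2/14/11 in the input but 2/22/11 in the output.
I assume the output is the original data, in order to create an answer that matches your result.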
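# Previous row's Evening minus the current row's Morning
df$Diff <- c(NA, head(df$Evening, -1) - tail(df$Morning, -1))
# Blank out rows that are not exactly one day after the previous row
# (%y, not %Y, because the years have two digits)
df$Diff[which(c(0, diff(as.Date(as.character(df$Measure_Date),
                                format="%m/%d/%y"))) != 1)] <- NA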
> df
# Row User Morning Evening Measure_Date Diff
# 1 1 1 NA NA 2/18/11 NA
# 2 2 1 50 115.00 2/19/11 NA
# 3 3 1 85 128.00 2/20/11 30.00
# 4 4 1 62 NA 2/25/11 NA
# 5 5 1 48 100.80 3/8/11 NA
# 6 6 1 19 71.00 3/9/11 81.80
# 7 7 1 25 98.00 3/10/11 46.00
# 8 8 1 10 105.00 3/11/11 88.00
# 9 9 2 48 105.00 2/18/11 NA
# 10 10 2 28 203.00 2/19/11 77.00
# 11 11 2 35 80.99 2/21/11 NA
# 12 12 2 91 78.25 2/22/11 -10.01
Edit by @user1342086 (it was rejected, but it is indeed right):
# c(0, ...) aligns each change of user with the first row of the new user
df$Diff[which(c(0, diff(df$User)) != 0)] <- NA
This seems to take care of the grouping by "User".
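The consecutive-day test used in this answer can be looked at in isolation. This is a minimal sketch on three made-up dates; the names `dates`, `gap` and `consecutive` are illustrative only.

```r
# Two-digit years are parsed with %y; with %Y, "11" would typically be
# read as the year 11 rather than 2011
dates <- as.Date(c("2/19/11", "2/20/11", "2/25/11"), format = "%m/%d/%y")

# Days elapsed since the previous row; the first row has no predecessor
gap <- c(NA, as.numeric(diff(dates)))

# TRUE only where the previous row is exactly one day earlier
consecutive <- !is.na(gap) & gap == 1

print(data.frame(dates, gap, consecutive))
```

Rows where `consecutive` is FALSE are the ones whose difference should be set to NA.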
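Answer 2: (score: 2)
I used plyr, so make sure you have it installed. This solution should work even if the users' data are interleaved (i.e. not in consecutive rows) and the dates are not in chronological order.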
# Your example data, as you should post it for us to use
df <-
structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L), Morning = c(NA, 50L, 85L, 62L, 48L, 19L, 25L, NA, 48L,
28L, 35L, 91L), Evening = c(NA, 115, 128, NA, 100.8, 71, 98,
105, 105, 203, 80.99, 78.25), Measure_Date = structure(c(1L,
2L, 3L, 5L, 9L, 10L, 6L, 7L, 1L, 2L, 4L, 8L), .Label = c("2/18/11",
"2/19/11", "2/20/11", "2/21/11", "2/25/11", "3/10/11", "3/11/11",
"3/14/11", "3/8/11", "3/9/11"), class = "factor")), .Names = c("User",
"Morning", "Evening", "Measure_Date"), class = "data.frame", row.names = c(NA,
-12L))
# As already stated by Arun, you need the date as class Date
df$Measure_Date <- as.Date(df$Measure_Date, format='%m/%d/%y')
# Use plyr to process the data frame by user
library(package=plyr)
ddply(.data=df, .variables='User',
      .fun=function(x){
        # Complete sequence of dates for each user
        tdf <- data.frame(Measure_Date=seq(from=min(x$Measure_Date),
                                           to=max(x$Measure_Date),
                                           by='1 day'))
        # Merge to fill in NAs for unused dates
        tdf <- merge(tdf, x, all=TRUE)
        # Put desired values side by side (shift Evening down by one day)
        tdf$Evening <- c(NA, tdf$Evening[-length(tdf$Evening)])
        # Difference
        tdf$Difference <- tdf$Evening - tdf$Morning
        # Return desired value to original data
        tdf <- tdf[,c('Measure_Date', 'Difference')]
        x <- merge(x, tdf)
        x
      })
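For comparison, the same "look up the previous calendar day for the same user" idea can be sketched in base R with `match()`. This is an alternative technique, not the method used in the answer above. The `toy` data and the names `key`, `prev_key` and `prev_idx` are illustrative, and the sketch assumes at most one row per user and date.

```r
# Hypothetical toy data: users interleaved, dates not in order
toy <- data.frame(
  User = c(1L, 2L, 1L, 1L, 2L),
  Morning = c(85, 28, 50, 62, 48),
  Evening = c(128, 203, 115, NA, 105),
  Measure_Date = as.Date(c("2011-02-20", "2011-02-19", "2011-02-19",
                           "2011-02-25", "2011-02-18"))
)

# One key per row, and the key of the same user's previous calendar day
key      <- paste(toy$User, toy$Measure_Date)
prev_key <- paste(toy$User, toy$Measure_Date - 1)

# Row index of the previous day for each row; NA when no such row exists
prev_idx <- match(prev_key, key)

# Previous day's Evening minus this row's Morning
toy$Difference <- toy$Evening[prev_idx] - toy$Morning

print(toy)
```

Rows without a same-user measurement on the day before get NA automatically, because indexing with NA returns NA.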