My data look like this:
df1 <- data.frame(Date = as.Date(c('8/27/2001','8/27/2001','8/27/2001','11/13/2001','11/13/2001','11/13/2001','8/3/2012','8/3/2012'), format = "%m/%d/%Y"),
                  Name = c('Joe', 'Joe', 'Joe', 'Billy', 'Billy', 'Billy', 'Emma', 'Emma'),
                  Sample = c('Pre', 'Post', 'Discard', 'Pre', 'Post', 'Discard', 'Bone', 'Pre'),
                  Cells = c(15, 7, 3, 12, 5, 2, 14, NA))
Date Name Sample Cells
1 2001-08-27 Joe Pre 15
2 2001-08-27 Joe Post 7
3 2001-08-27 Joe Discard 3
4 2001-11-13 Billy Pre 12
5 2001-11-13 Billy Post 5
6 2001-11-13 Billy Discard 2
7 2012-08-03 Emma Bone 14
8 2012-08-03 Emma Pre NA
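I want to add a calculated column called "Yield" based on each unique combination of Date and Name (for example, rows 1-3, 4-6 and 7-8 each form a separate group). The real data may be incomplete (see rows 7-8).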
The "Yield" column should be:
Cells where Sample="Post" divided by Cells where Sample="Pre"
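To make the rule concrete, here is a small base R sketch that applies it to each Date/Name group of the sample data. The helper `yield_for_group` is hypothetical (not from the question), and it assumes that a group only gets a Yield when it has exactly one "Post" row and one "Pre" row. With the sample data this gives 7/15 (about 0.467) for Joe, 5/12 (about 0.417) for Billy and NA for Emma, whose group has no "Post" row.

```r
# Sample data from the question
df1 <- data.frame(
  Date = as.Date(c("8/27/2001", "8/27/2001", "8/27/2001", "11/13/2001",
                   "11/13/2001", "11/13/2001", "8/3/2012", "8/3/2012"),
                 format = "%m/%d/%Y"),
  Name = c("Joe", "Joe", "Joe", "Billy", "Billy", "Billy", "Emma", "Emma"),
  Sample = c("Pre", "Post", "Discard", "Pre", "Post", "Discard", "Bone", "Pre"),
  Cells = c(15, 7, 3, 12, 5, 2, 14, NA)
)

# Hypothetical helper: Post / Pre for one group, or NA when the group
# does not contain exactly one "Post" row and exactly one "Pre" row
yield_for_group <- function(g) {
  post <- g$Cells[g$Sample == "Post"]
  pre  <- g$Cells[g$Sample == "Pre"]
  if (length(post) == 1 && length(pre) == 1) post / pre else NA_real_
}

# One data frame per Date/Name combination (unused combinations dropped)
groups <- split(df1, interaction(df1$Date, df1$Name, drop = TRUE))
yields <- sapply(groups, yield_for_group)
yields
```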
Desired output:
Date Name Sample Cells Yield
1 2001-08-27 Joe Pre 15 NA
2 2001-08-27 Joe Post 7 0.46
3 2001-08-27 Joe Discard 3 NA
4 2001-11-13 Billy Pre 12 NA
5 2001-11-13 Billy Post 5 0.41
6 2001-11-13 Billy Discard 2 NA
7 2012-08-03 Emma Bone 14 NA
8 2012-08-03 Emma Pre NA NA
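I'm new to R and would like to use it efficiently (e.g., with dplyr). The above could be done with a loop, but I'm looking for a more elegant solution. I have gone through the following threads for guidance but have not found a solution so far: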
Assign value to group based on condition in column
R create column from another column, depending on row
Conditional calculation in R based on Row values and categories
Answer 0 (score: 1)
You can do it like this:
library(dplyr)
df1 %>%
  group_by(Date, Name) %>%  # one group per Date/Name combination
  mutate(Yield = ifelse(Sample == "Post",
                        Cells[Sample == "Post"] / Cells[Sample == "Pre"],  # Post / Pre within the group
                        NA))  # every non-Post row gets NA
# A tibble: 8 x 5
# Groups: Date, Name [3]
Date Name Sample Cells Yield
<date> <fct> <fct> <dbl> <dbl>
1 2001-08-27 Joe Pre 15 NA
2 2001-08-27 Joe Post 7 0.467
3 2001-08-27 Joe Discard 3 NA
4 2001-11-13 Billy Pre 12 NA
5 2001-11-13 Billy Post 5 0.417
6 2001-11-13 Billy Discard 2 NA
7 2012-08-03 Emma Bone 14 NA
8 2012-08-03 Emma Pre NA NA
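A slightly more defensive variant of the same idea, offered as a sketch rather than a drop-in replacement: `match()` returns the position of the first matching row or NA, so a group with no "Post" or no "Pre" row produces a single NA instead of a zero-length vector. It assumes that if a group had several "Pre" (or "Post") rows, using the first one is acceptable. `if_else()` with `NA_real_` keeps the Yield column numeric, and `ungroup()` drops the grouping afterwards.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(
  Date = as.Date(c("8/27/2001", "8/27/2001", "8/27/2001", "11/13/2001",
                   "11/13/2001", "11/13/2001", "8/3/2012", "8/3/2012"),
                 format = "%m/%d/%Y"),
  Name = c("Joe", "Joe", "Joe", "Billy", "Billy", "Billy", "Emma", "Emma"),
  Sample = c("Pre", "Post", "Discard", "Pre", "Post", "Discard", "Bone", "Pre"),
  Cells = c(15, 7, 3, 12, 5, 2, 14, NA)
)

result <- df1 %>%
  group_by(Date, Name) %>%
  # match() picks the first Post/Pre row of the group, or NA if there is none
  mutate(Yield = if_else(Sample == "Post",
                         Cells[match("Post", Sample)] / Cells[match("Pre", Sample)],
                         NA_real_)) %>%
  ungroup()
result
```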
Answer 1 (score: 1)
If you are not attached to that particular table layout, you can do the following:
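library(dplyr)
library(tidyr)
df1 %>%
  spread(Sample, Cells) %>%            # one column per Sample type (Bone, Discard, Post, Pre)
  mutate(Pre_Post_Yield = Post / Pre)  # yield computed row by row in the wide table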
This returns a table that is easier to read:
Date Name Bone Discard Post Pre Pre_Post_Yield
1 2001-08-27 Joe NA 3 7 15 0.4666667
2 2001-11-13 Billy NA 2 5 12 0.4166667
3 2012-08-03 Emma 14 NA NA NA NA
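In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. A sketch of the equivalent call, assuming a recent tidyr, could look like this; the column order of the Sample columns may differ from the `spread()` output, but the yields are the same.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df1 <- data.frame(
  Date = as.Date(c("8/27/2001", "8/27/2001", "8/27/2001", "11/13/2001",
                   "11/13/2001", "11/13/2001", "8/3/2012", "8/3/2012"),
                 format = "%m/%d/%Y"),
  Name = c("Joe", "Joe", "Joe", "Billy", "Billy", "Billy", "Emma", "Emma"),
  Sample = c("Pre", "Post", "Discard", "Pre", "Post", "Discard", "Bone", "Pre"),
  Cells = c(15, 7, 3, 12, 5, 2, 14, NA)
)

wide <- df1 %>%
  # Date and Name identify the rows; each Sample value becomes a column
  pivot_wider(names_from = Sample, values_from = Cells) %>%
  mutate(Pre_Post_Yield = Post / Pre)
wide
```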
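To get back to the long format, you can add gather(Sample, Cells, Bone:Pre). Note that the result will not look exactly like your example output, because combinations of Date/Name and Sample that did not exist in the original data are filled in (with NA in Cells). This may look a little odd at first, but it is actually quite useful, for example because it makes your missing data explicit: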
Date Name Pre_Post_Yield Sample Cells
1 2001-08-27 Joe 0.4666667 Bone NA
2 2001-11-13 Billy 0.4166667 Bone NA
3 2012-08-03 Emma NA Bone 14
4 2001-08-27 Joe 0.4666667 Discard 3
5 2001-11-13 Billy 0.4166667 Discard 2
6 2012-08-03 Emma NA Discard NA
7 2001-08-27 Joe 0.4666667 Post 7
8 2001-11-13 Billy 0.4166667 Post 5
9 2012-08-03 Emma NA Post NA
10 2001-08-27 Joe 0.4666667 Pre 15
11 2001-11-13 Billy 0.4166667 Pre 12
12 2012-08-03 Emma NA Pre NA
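Likewise, `gather()` is superseded by `pivot_longer()` in tidyr 1.0.0 and later. A sketch of the round trip, assuming a recent tidyr, is below; it should also give 12 rows with the missing combinations made explicit as NA, although the row order may differ from the `gather()` output shown above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df1 <- data.frame(
  Date = as.Date(c("8/27/2001", "8/27/2001", "8/27/2001", "11/13/2001",
                   "11/13/2001", "11/13/2001", "8/3/2012", "8/3/2012"),
                 format = "%m/%d/%Y"),
  Name = c("Joe", "Joe", "Joe", "Billy", "Billy", "Billy", "Emma", "Emma"),
  Sample = c("Pre", "Post", "Discard", "Pre", "Post", "Discard", "Bone", "Pre"),
  Cells = c(15, 7, 3, 12, 5, 2, 14, NA)
)

# Wide table with the yield, as in the answer above
wide <- df1 %>%
  pivot_wider(names_from = Sample, values_from = Cells) %>%
  mutate(Pre_Post_Yield = Post / Pre)

# Back to long format: one row per Date/Name/Sample combination
long <- wide %>%
  pivot_longer(cols = c(Bone, Discard, Post, Pre),
               names_to = "Sample", values_to = "Cells")
long
```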