R - expand.grid within the levels of one variable only

Asked: 2017-01-11 18:19:06

Tags: r

My dataset looks like this:

Site  Sample  Date
A     A1      2016-09-01
A     A1      2016-09-21
A     A2      2016-09-15
A     A2      2016-09-21
B     B1      2016-09-03
B     B2      2016-09-12

What I would like to do is an expand.grid, but only within each level of df$Site:

Site  Sample  Date
A     A1      2016-09-01
A     A1      2016-09-15
A     A1      2016-09-21
A     A2      2016-09-01
A     A2      2016-09-15
A     A2      2016-09-21
B     B1      2016-09-03
B     B1      2016-09-12
B     B2      2016-09-03
B     B2      2016-09-12

But I can't work out how to specify this with expand.grid, so that I don't end up with every Sample crossed with every Date from both sites:

Site  Sample  Date
A     A1      2016-09-01
A     A1      2016-09-03
A     A1      2016-09-12
A     A1      2016-09-15
A     A1      2016-09-21
A     A2      2016-09-01
A     A2      2016-09-03
A     A2      2016-09-12
A     A2      2016-09-15
A     A2      2016-09-21
B     B1      2016-09-01
B     B1      2016-09-03
B     B1      2016-09-12
B     B1      2016-09-15
B     B1      2016-09-21
B     B2      2016-09-01
B     B2      2016-09-03
B     B2      2016-09-12
B     B2      2016-09-15
B     B2      2016-09-21

I hope this is clear; I couldn't work out how to format these tables nicely!
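To make the difference between the two tables concrete, here is a minimal base R sketch that counts rows both ways. It assumes the data frame is called df (as in df$Site above) and rebuilds it from the six sample rows with dates stored as character strings; the names global_grid, n_global and n_per_site are only illustrative.

```r
# Sample data rebuilt from the question (dates kept as character strings).
df <- data.frame(
  Site   = c("A", "A", "A", "A", "B", "B"),
  Sample = c("A1", "A1", "A2", "A2", "B1", "B2"),
  Date   = c("2016-09-01", "2016-09-21", "2016-09-15",
             "2016-09-21", "2016-09-03", "2016-09-12"),
  stringsAsFactors = FALSE
)

# Crossing over the whole data frame pairs every Sample with every Date,
# regardless of Site: 4 samples x 5 dates.
global_grid <- expand.grid(Sample = unique(df$Sample),
                           Date   = unique(df$Date),
                           stringsAsFactors = FALSE)
n_global <- nrow(global_grid)

# Rows wanted per site: unique samples x unique dates within that site only.
n_per_site <- sapply(split(df, df$Site), function(x) {
  length(unique(x$Sample)) * length(unique(x$Date))
})

n_global     # 20 rows, the unwanted result
n_per_site   # A = 6, B = 4, i.e. 10 rows in total
```

Any per-site solution should therefore return 10 rows for this data rather than 20.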

2 Answers:

Answer 0: (score: 1)

We can do this with 'dplyr'/'tidyr' after grouping by 'Site':

library(dplyr)
library(tidyr)
df1 %>%
   group_by(Site) %>%
   expand(Sample, Date)
#    Site Sample       Date
#   <chr>  <chr>      <chr>
#1      A     A1 2016-09-01
#2      A     A1 2016-09-15
#3      A     A1 2016-09-21
#4      A     A2 2016-09-01
#5      A     A2 2016-09-15
#6      A     A2 2016-09-21
#7      B     B1 2016-09-03
#8      B     B1 2016-09-12
#9      B     B2 2016-09-03
#10     B     B2 2016-09-12

Or using data.table:

library(data.table)
setDT(df1)[, do.call(CJ, lapply(.SD, unique)) , by = Site]
#    Site Sample       Date
# 1:    A     A1 2016-09-01
# 2:    A     A1 2016-09-15
# 3:    A     A1 2016-09-21
# 4:    A     A2 2016-09-01
# 5:    A     A2 2016-09-15
# 6:    A     A2 2016-09-21
# 7:    B     B1 2016-09-03
# 8:    B     B1 2016-09-12
# 9:    B     B2 2016-09-03
#10:    B     B2 2016-09-12

Or we can use a base R solution (here the Site value ends up as the row-name prefix rather than as a column):

do.call(rbind, lapply(split(df1[-1], df1$Site), 
         function(x) expand.grid(lapply(x, unique))))
#   Sample       Date
#A.1     A1 2016-09-01
#A.2     A2 2016-09-01
#A.3     A1 2016-09-21
#A.4     A2 2016-09-21
#A.5     A1 2016-09-15
#A.6     A2 2016-09-15
#B.1     B1 2016-09-03
#B.2     B2 2016-09-03
#B.3     B1 2016-09-12
#B.4     B2 2016-09-12
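If Site is needed as a regular column and the rows should be sorted as in the question, the base R approach can be extended along these lines. This is only a sketch under the assumption that the data is the df1 defined in the Data section; the names per_site, grid and res are made up for illustration.

```r
# Same sample data as df1 in the Data section.
df1 <- data.frame(
  Site   = c("A", "A", "A", "A", "B", "B"),
  Sample = c("A1", "A1", "A2", "A2", "B1", "B2"),
  Date   = c("2016-09-01", "2016-09-21", "2016-09-15",
             "2016-09-21", "2016-09-03", "2016-09-12"),
  stringsAsFactors = FALSE
)

per_site <- lapply(split(df1, df1$Site), function(x) {
  # Cross the unique Sample and Date values of this site only.
  grid <- expand.grid(lapply(x[-1], unique), stringsAsFactors = FALSE)
  # Put the site label back as the first column (recycled to all rows).
  data.frame(Site = x$Site[1], grid, stringsAsFactors = FALSE)
})

res <- do.call(rbind, per_site)
res <- res[order(res$Site, res$Sample, res$Date), ]  # sort as in the question
rownames(res) <- NULL                                # drop the "A.1"-style names
res
```

Passing stringsAsFactors = FALSE keeps Sample and Date as character vectors; by default expand.grid converts character inputs to factors.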

Data

df1 <- structure(list(Site = c("A", "A", "A", "A", "B", "B"), Sample = c("A1", 
"A1", "A2", "A2", "B1", "B2"), Date = c("2016-09-01", "2016-09-21", 
"2016-09-15", "2016-09-21", "2016-09-03", "2016-09-12")), .Names = c("Site", 
"Sample", "Date"), class = "data.frame", row.names = c(NA, -6L))

Answer 1: (score: 0)

Here is a base R solution. You can supply expand.grid with the unique vectors, like this:

do.call(rbind, lapply(split(df, df$Site),
               function(i) with(i, expand.grid(unique(Site), unique(Sample), unique(Date)))))

    Var1 Var2       Var3
A.1    A   A1 2016-09-01
A.2    A   A2 2016-09-01
A.3    A   A1 2016-09-21
A.4    A   A2 2016-09-21
A.5    A   A1 2016-09-15
A.6    A   A2 2016-09-15
B.1    B   B1 2016-09-03
B.2    B   B2 2016-09-03
B.3    B   B1 2016-09-12
B.4    B   B2 2016-09-12
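The Var1, Var2 and Var3 column names appear because the vectors are passed to expand.grid without names. A small variation, sketched here on the assumption that df holds the sample data from the question, is to name the arguments so the original column names are kept:

```r
# Sample data from the question, called df as in this answer.
df <- data.frame(
  Site   = c("A", "A", "A", "A", "B", "B"),
  Sample = c("A1", "A1", "A2", "A2", "B1", "B2"),
  Date   = c("2016-09-01", "2016-09-21", "2016-09-15",
             "2016-09-21", "2016-09-03", "2016-09-12"),
  stringsAsFactors = FALSE
)

res <- do.call(rbind, lapply(split(df, df$Site), function(i) {
  # Named arguments become the column names of the expanded grid.
  with(i, expand.grid(Site   = unique(Site),
                      Sample = unique(Sample),
                      Date   = unique(Date),
                      stringsAsFactors = FALSE))
}))

names(res)   # "Site" "Sample" "Date"
nrow(res)    # 10
```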

Or use unique on each expanded data.frame:

do.call(rbind, lapply(split(df, df$Site),
                     function(i) with(i, unique(expand.grid(Site, Sample, Date)))))
     Var1 Var2       Var3
A.1     A   A1 2016-09-01
A.9     A   A2 2016-09-01
A.17    A   A1 2016-09-21
A.25    A   A2 2016-09-21
A.33    A   A1 2016-09-15
A.41    A   A2 2016-09-15
B.1     B   B1 2016-09-03
B.3     B   B2 2016-09-03
B.5     B   B1 2016-09-12
B.7     B   B2 2016-09-12
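The gaps in the row names above (A.1, A.9, A.17, ...) hint at what this second variant does: it first crosses the raw columns, duplicates included, and only then drops the repeated rows. The following sketch, again assuming df is the sample data from the question, compares the intermediate and final row counts per site; the matrix name sizes is illustrative.

```r
# Sample data from the question, called df as in this answer.
df <- data.frame(
  Site   = c("A", "A", "A", "A", "B", "B"),
  Sample = c("A1", "A1", "A2", "A2", "B1", "B2"),
  Date   = c("2016-09-01", "2016-09-21", "2016-09-15",
             "2016-09-21", "2016-09-03", "2016-09-12"),
  stringsAsFactors = FALSE
)

sizes <- t(sapply(split(df, df$Site), function(i) {
  full <- with(i, expand.grid(Site, Sample, Date))  # duplicates still included
  c(expanded = nrow(full), unique = nrow(unique(full)))
}))
sizes
#   expanded unique
# A       64      6
# B        8      4
```

For site A, four rows in each of three columns give 4 x 4 x 4 = 64 intermediate rows for only 6 distinct combinations, so taking unique on each vector before calling expand.grid is likely the cheaper choice on larger data.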