Suppose I have:
df <- data.frame(
CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers"
,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","PelletCoffeeCo","PelletCoffeeCo"),
Email= c("john@coffee.com", "john@coffee.com","john@coffee.com","john@coffee.com", "john@coffee.com",
"john@coffee.com", "john@coffee.com", "john@coffee.com", "john@coffee.com", "john@coffee.com",
"george@liquid.com","george@liquid.com","george@liquid.com","stacy@pelletcoffee.com",
"stacy@pelletcoffee.com"),
Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","1","2"),
var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6,2,3))
I need to figure out how to get to:
df2 <- data.frame(CompanyID=c("Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers","Drinkers"
,"Drinkers","Drinkers", "Liquders","Liquders","Liquders","Liquders","Liquders","Liquders",
"Liquders","Liquders","Liquders","Liquders", "PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo",
"PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo","PelletCoffeeCo",
"PelletCoffeeCo","PelletCoffeeCo"),
Email= c("john@coffee.com", "john@coffee.com","john@coffee.com","john@coffee.com", "john@coffee.com",
"john@coffee.com", "john@coffee.com", "john@coffee.com", "john@coffee.com", "john@coffee.com",
"george@liquid.com","george@liquid.com","george@liquid.com","george@liquid.com","george@liquid.com",
"george@liquid.com","george@liquid.com","george@liquid.com","george@liquid.com","george@liquid.com","stacy@pelletcoffee.com",
"stacy@pelletcoffee.com","stacy@pelletcoffee.com","stacy@pelletcoffee.com","stacy@pelletcoffee.com",
"stacy@pelletcoffee.com","stacy@pelletcoffee.com","stacy@pelletcoffee.com","stacy@pelletcoffee.com",
"stacy@pelletcoffee.com"),
Day= c("1","2","3","4","5","6","7","8","9","10","1","2","3","4","5","6","7","8","9","10",
"1","2","3","4","5","6","7","8","9","10"),
var1= c(4,5,5,5,2,3,2,7,6,5,7,6,6, NA,NA,NA,NA,NA,NA,NA, 2,3,NA,NA,NA,NA,NA,NA,NA,NA))
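Explanation: I have data from a survey that was administered once a day over a 10-day course. In a perfect world I would get 10 responses from each participant, one for each of Day 1 through Day 10. However, because of non-response, some participants gave 3 responses, others 6, others 10, and so on. I am setting up the data to run a growth model, so I need the "Day" column to always run from Day 1 to Day 10, whether or not there is response data for those days. I tried to show this above by adding NA to var1 for the days on which a participant did not respond.
How can I do this?
Thanks in advance!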
Answer 0: (score: 2)
Try this:
library(tidyr)

df %>%
  # Day is stored as text, so convert it to integer before building the sequence
  transform(Day = as.integer(as.character(Day))) %>%
  complete(nesting(CompanyID, Email), Day = seq(min(Day), max(Day), 1L)) %>%
  data.frame()
Output:
        CompanyID                  Email Day var1
1        Drinkers        john@coffee.com   1    4
2        Drinkers        john@coffee.com   2    5
3        Drinkers        john@coffee.com   3    5
4        Drinkers        john@coffee.com   4    5
5        Drinkers        john@coffee.com   5    2
6        Drinkers        john@coffee.com   6    3
7        Drinkers        john@coffee.com   7    2
8        Drinkers        john@coffee.com   8    7
9        Drinkers        john@coffee.com   9    6
10       Drinkers        john@coffee.com  10    5
11       Liquders      george@liquid.com   1    7
12       Liquders      george@liquid.com   2    6
13       Liquders      george@liquid.com   3    6
14       Liquders      george@liquid.com   4   NA
15       Liquders      george@liquid.com   5   NA
16       Liquders      george@liquid.com   6   NA
17       Liquders      george@liquid.com   7   NA
18       Liquders      george@liquid.com   8   NA
19       Liquders      george@liquid.com   9   NA
20       Liquders      george@liquid.com  10   NA
21 PelletCoffeeCo stacy@pelletcoffee.com   1    2
22 PelletCoffeeCo stacy@pelletcoffee.com   2    3
23 PelletCoffeeCo stacy@pelletcoffee.com   3   NA
24 PelletCoffeeCo stacy@pelletcoffee.com   4   NA
25 PelletCoffeeCo stacy@pelletcoffee.com   5   NA
26 PelletCoffeeCo stacy@pelletcoffee.com   6   NA
27 PelletCoffeeCo stacy@pelletcoffee.com   7   NA
28 PelletCoffeeCo stacy@pelletcoffee.com   8   NA
29 PelletCoffeeCo stacy@pelletcoffee.com   9   NA
30 PelletCoffeeCo stacy@pelletcoffee.com  10   NA
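Edit:
The code above fills in the Day column for each group with a complete set of Day values, defined by the minimum and maximum of the existing values in that column (1 and 10, respectively). Note that seq() needs numeric input, so a Day column stored as text or as a factor has to be converted to integer first. The groups whose Day values get filled in can be redefined as needed; here I chose to define them as company + email, using "nesting(CompanyID, Email)". The data.frame() line just converts the output to a data.frame instead of a tibble. If you don't need data.frame output, feel free to replace or remove that line.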
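As a small variation on the approach described above, the sketch below passes the day range explicitly (Day = 1:10) instead of deriving it from min(Day) and max(Day). This assumes the study design is known to have exactly 10 survey days; it guards against the case where nobody responded on the last days, in which max(Day) would come up short. The toy data frame `responses` is made up for illustration.

```r
library(tidyr)

# Hypothetical toy data: two participants, nobody answered after day 3
responses <- data.frame(
  CompanyID = c("A", "A", "A", "B"),
  Email     = c("a@x.com", "a@x.com", "a@x.com", "b@y.com"),
  Day       = c(1L, 2L, 3L, 1L),
  var1      = c(4, 5, 5, 7)
)

# nesting() keeps only the CompanyID/Email pairs that occur in the data;
# Day = 1:10 forces all ten days even though max(Day) is only 3 here
filled <- complete(responses, nesting(CompanyID, Email), Day = 1:10)
filled <- as.data.frame(filled)

nrow(filled)             # 20 rows: 2 participants x 10 days
sum(is.na(filled$var1))  # number of padded (non-response) rows
```

With seq(min(Day), max(Day), 1L) this toy data would only be padded out to day 3, which is why spelling out the range can be the safer choice when the number of survey days is fixed in advance.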
Answer 1: (score: 0)
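First, create a data frame of the unique participants (CompanyID plus Email, so that Email is also filled in on the added rows). Next, create a data frame of the desired days.
Cross join these.
Then join the result to the original data set to fill in the table; days with no response get NA in var1.
comp <- unique(df[, c("CompanyID", "Email")])
Day <- data.frame(Day = as.character(1:10))
# No shared columns, so by = NULL makes merge() return the cross join
compDay <- merge(comp, Day, by = NULL)
# Day is text here, so merge() sorts it as "1", "10", "2", ...
# Full outer join: rows of compDay with no match in df get NA in var1
dfday <- merge(df, compDay, by = c("CompanyID", "Email", "Day"), all = TRUE)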
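The same cross-join-then-merge idea can be wrapped in a small helper. The sketch below is one possible base R version and is not part of the original answer: the function name `fill_days` and the toy data are made up, and it assumes Day is stored as an integer so that the final sort is numeric and not alphabetical.

```r
# Hypothetical helper: pad every participant out to the full set of days
fill_days <- function(data, days = 1:10) {
  # One row per participant (company + email)
  participants <- unique(data[, c("CompanyID", "Email")])
  # by = NULL gives the Cartesian product: every participant x every day
  grid <- merge(participants, data.frame(Day = days), by = NULL)
  # all.x = TRUE keeps every grid row; unmatched days get NA in var1
  out <- merge(grid, data, by = c("CompanyID", "Email", "Day"), all.x = TRUE)
  # Sort by participant, then by day
  out <- out[order(out$CompanyID, out$Email, out$Day), ]
  rownames(out) <- NULL
  out
}

# Made-up toy data: participant A answered on days 1 and 2, B only on day 1
toy <- data.frame(
  CompanyID = c("A", "A", "B"),
  Email     = c("a@x.com", "a@x.com", "b@y.com"),
  Day       = c(1L, 2L, 1L),
  var1      = c(4, 5, 7)
)

padded <- fill_days(toy, days = 1:4)
padded
```

To use it on the data from the question, Day would need to be converted first, for example with df$Day <- as.integer(as.character(df$Day)).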