I'm not sure how to figure this out.
Here is an example data set:
# Six rows per person; rep() states this directly instead of sampling from a single value
Name <- rep(c("Bob", "Jeff", "Carl"), each = 6)
Week <- c("Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Week 6",
"Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Week 6",
"Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Week 6")
variable.1 <- c("No", "No", "No", "Yes", "No", "No",
"Yes", "No", "No", "No", "Yes", "No",
"No", "Yes", "No", "No", "No", "Yes")
df <- data.frame(Name, Week, variable.1)
df
Name Week variable.1
1 Bob Week 1 No
2 Bob Week 2 No
3 Bob Week 3 No
4 Bob Week 4 Yes
5 Bob Week 5 No
6 Bob Week 6 No
7 Jeff Week 1 Yes
8 Jeff Week 2 No
9 Jeff Week 3 No
10 Jeff Week 4 No
11 Jeff Week 5 Yes
12 Jeff Week 6 No
13 Carl Week 1 No
14 Carl Week 2 Yes
15 Carl Week 3 No
16 Carl Week 4 No
17 Carl Week 5 No
18 Carl Week 6 Yes
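What I would like to do is shift any "Yes" in the variable.1 column up one row, so that it is reflected as a factor variable carrying the previous week's information. I am trying to do this per individual (not across the entire data set). I can't figure out the best way to go about this when both variables are factors. Ideally, I would like an NA to appear. I don't want everything simply shifted up. I only want the NA to appear where the "Yes" was, and the "Yes" to overwrite the "No" above it.
So, ideally, the finished product would look like "New.Col" below: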
Name Week variable.1 New.Col
1 Bob Week 1 No No
2 Bob Week 2 No No
3 Bob Week 3 No Yes
4 Bob Week 4 Yes NA
5 Bob Week 5 No No
6 Bob Week 6 No No
7 Jeff Week 1 Yes NA
8 Jeff Week 2 No No
9 Jeff Week 3 No No
10 Jeff Week 4 No Yes
11 Jeff Week 5 Yes NA
12 Jeff Week 6 No No
13 Carl Week 1 No Yes
14 Carl Week 2 Yes NA
15 Carl Week 3 No No
16 Carl Week 4 No No
17 Carl Week 5 No Yes
18 Carl Week 6 Yes NA
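Answer 0 (score: 1)
Give this a try.
I'll go ahead and sort df by Name and Week, in case some of the data is out of order. (This does not account for any missing weeks!) I'll also copy variable.1 into newcol as a character vector to work with.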
df <- df[order(df$Name, df$Week),]
df$newcol <- as.character(df$variable.1)
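One caveat with the sort above: Week is ordered as text, which happens to work for "Week 1" through "Week 6" but would place "Week 10" before "Week 2". A minimal sketch of sorting on the numeric part instead, assuming every label has the form "Week <number>"; the names demo, week_num and demo_sorted are hypothetical and not part of the original answer:

```r
# Hypothetical toy data with a two-digit week, deliberately out of order
demo <- data.frame(
  Name = rep(c("Jeff", "Bob"), each = 3),
  Week = rep(c("Week 10", "Week 2", "Week 1"), times = 2),
  stringsAsFactors = FALSE
)

# Strip the "Week " prefix and convert the remainder to an integer
week_num <- as.integer(sub("^Week\\s+", "", as.character(demo$Week)))

# Order by person, then by the numeric week rather than the text label
demo_sorted <- demo[order(demo$Name, week_num), ]
demo_sorted$Week
# "Week 1" "Week 2" "Week 10" "Week 1" "Week 2" "Week 10"
```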
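To make this easier to follow I'll write a loop, although computationally there are better ways to do this. The loop visits each unique person in df$Name:
for (person in unique(df$Name)) {
  # the steps described below go here
}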
Inside the loop, I want to select all of the newcol entries for the current person.
oldvalues <- df[df$Name == person, ]$newcol
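Then I shift every value up by one entry and make the last entry NA, since there is no following week to pull from. Dropping the first element with [-1] also behaves correctly when a person has only one row, where 2:length(oldvalues) would count backwards.
newvalues <- c(oldvalues[-1], NA)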
I also want to account for every week where the old value was "Yes" by setting that week to NA.
newvalues[oldvalues == "Yes"] <- NA
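To see what these two steps do, here they are applied to Bob's six weeks on their own. This is only an illustration outside the loop; oldvalues is typed in by hand from the example data, and oldvalues[-1] is used to drop the first entry.

```r
# Bob's variable.1 values for Week 1 to Week 6, copied from the example data
oldvalues <- c("No", "No", "No", "Yes", "No", "No")

# Step 1: shift everything up one week; the final week has nothing to pull from
newvalues <- c(oldvalues[-1], NA)

# Step 2: the week that originally held "Yes" becomes NA
newvalues[oldvalues == "Yes"] <- NA

newvalues
# "No" "No" "Yes" NA "No" NA
```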
Then I can put the result back into df.
df[df$Name == person,]$newcol <- newvalues
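For reference, the pieces above can be assembled into a single script along these lines. This is a sketch rather than the answerer's exact code: the data frame is rebuilt with rep() and paste() so the block runs on its own, the logical index rows is a name introduced here, and oldvalues[-1] is used to drop the first entry.

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(
  Name = rep(c("Bob", "Jeff", "Carl"), each = 6),
  Week = rep(paste("Week", 1:6), times = 3),
  variable.1 = c("No", "No", "No", "Yes", "No", "No",
                 "Yes", "No", "No", "No", "Yes", "No",
                 "No", "Yes", "No", "No", "No", "Yes")
)

# Sort by person and week, then work on a character copy of variable.1
df <- df[order(df$Name, df$Week), ]
df$newcol <- as.character(df$variable.1)

for (person in unique(df$Name)) {
  rows <- df$Name == person              # rows belonging to this person
  oldvalues <- df$newcol[rows]
  newvalues <- c(oldvalues[-1], NA)      # shift up one week
  newvalues[oldvalues == "Yes"] <- NA    # blank the week that held "Yes"
  df$newcol[rows] <- newvalues
}

df$newcol[df$Name == "Bob"]
# "No" "No" "Yes" NA "No" NA
```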
Now that the loop is finished, you can turn df$newcol back into a factor, which by default excludes NA from the levels
df$newcol <- factor(df$newcol)
or keep NA as a third factor level
df$newcol <- factor(df$newcol, exclude = NULL)
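The answer notes that there are faster approaches than a loop. One possibility in base R is ave(), which applies a function within each group. The sketch below is an alternative, not the answerer's method; shift_up and its keep_last argument are hypothetical names introduced here. With keep_last = FALSE it reproduces the loop's behaviour (the final week of each person becomes NA); with keep_last = TRUE the final week keeps its own value unless it was "Yes", which matches the New.Col shown in the question. It assumes the rows are already in week order within each person.

```r
# Hypothetical helper: shift one person's values up by a week
shift_up <- function(x, keep_last = FALSE) {
  x <- as.character(x)
  last <- if (keep_last) x[length(x)] else NA_character_
  out <- c(x[-1], last)        # move every value up one position
  out[x %in% "Yes"] <- NA      # the week that held "Yes" becomes NA
  out
}

df <- data.frame(
  Name = rep(c("Bob", "Jeff", "Carl"), each = 6),
  Week = rep(paste("Week", 1:6), times = 3),
  variable.1 = c("No", "No", "No", "Yes", "No", "No",
                 "Yes", "No", "No", "No", "Yes", "No",
                 "No", "Yes", "No", "No", "No", "Yes")
)

# Same result as the loop: last week of each person is NA
df$newcol <- ave(as.character(df$variable.1), df$Name, FUN = shift_up)

# Variant matching the question's New.Col: last week is kept unless it was "Yes"
df$New.Col <- ave(as.character(df$variable.1), df$Name,
                  FUN = function(v) shift_up(v, keep_last = TRUE))

# Optional: convert to a factor with NA kept as an explicit level
df$New.Col <- factor(df$New.Col, exclude = NULL)
```

Note that FUN must be passed by name, because ave() treats any unnamed extra arguments as grouping variables.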