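I have a data frame in which the field x contains group names (labelled with letters in the example below) and group members (listed beneath their group name and labelled with numbers). I want to create a field that shows, for each member, the name of its group. In the data frame below, the desired output is shown in the "outcome" column.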
library(dplyr)
df <- data.frame("x"=c("A","1","2","B","C","1","2","C","D","1"),
"outcome"=c("A","A","A","B","C","C","C","C","D","D")
) %>%
mutate(
Letter = ifelse(grepl("[A-Za-z]", x), "Letter",
"No Letter")
)
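My idea was to do this with a for loop. If x is a letter, the loop should return that letter; otherwise it should return the result of the previous iteration (i.e. the previous letter found in x). The for loop below does not give the correct output: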
df$outcome_calc[1] <- "A"
for (i in 2:10) {
df$outcome_calc[i] <- ifelse(df$Letter[i] == "No Letter",df$outcome_calc[i-1],df$x[i])
}
Any ideas on how to get the correct output?
Answer 0 (score: 2)
Here are two tidyverse approaches. They are very similar, and both rely on the convenience function zoo::na.locf.
First:
library(tidyverse)
df %>%
mutate(na = is.na(as.numeric(as.character(x))),
outcome2 = ifelse(na, as.character(x), NA_character_),
outcome2 = zoo::na.locf(outcome2)) %>%
select(-na)
The other:
df %>%
mutate(chr = !grepl("[[:digit:]]", x),
outcome2 = ifelse(chr, as.character(x), NA_character_),
outcome2 = zoo::na.locf(outcome2)) %>%
select(-chr)
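For anyone who would rather not depend on zoo, the carry-forward step can be sketched in base R. This is only an illustration of the idea, not what zoo::na.locf does internally. The names is_letter, group_id and outcome2 are my own, and the sketch assumes the first element of x is a group name (otherwise group_id would start at 0 and the indexing would drop rows).

```r
# Example data from the question, as a plain character vector
x <- c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")

# TRUE wherever x holds a group name (a letter)
is_letter <- grepl("[A-Za-z]", x)

# Running count of group names seen so far: 1 1 1 2 3 3 3 4 5 5
group_id <- cumsum(is_letter)

# Look up the group name that belongs to each running count
outcome2 <- x[is_letter][group_id]
print(outcome2)
```

Because cumsum() increases only at a letter, every member row inherits the index of the closest letter above it.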
Answer 1 (score: 1)
Here is one way to do this with a for loop:
# keeps track of previous letter
prev = ''
# output
op = c()
for (i in df$x){
# check the pattern
check = grepl(pattern = '[a-zA-Z]', x = i, ignore.case = T)
if(isTRUE(check)){
op = c(op, i)
prev = i
} else {
op = c(op, prev)
}
}
print(op)
[1] "A" "A" "A" "B" "C" "C" "C" "C" "D" "D"
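A likely reason the loop in the question misbehaves, assuming x was stored as a factor (the default for data.frame() before R 4.0.0): ifelse() drops the factor attributes, so it hands back the integer level code instead of the letter. The sketch below shows that effect and a variant of the loop that converts to character first and preallocates the result. The names x_chr, out and code are hypothetical.

```r
# Assumption: x is a factor, as it would be under stringsAsFactors = TRUE
x <- factor(c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1"))

# ifelse() strips the factor class, so this yields the level code of "B"
code <- ifelse(TRUE, x[4], NA)
print(code)

# Work on a character copy and fill a preallocated result vector
x_chr <- as.character(x)
out <- character(length(x_chr))
prev <- NA_character_
for (i in seq_along(x_chr)) {
  # Remember the most recent group name
  if (grepl("[A-Za-z]", x_chr[i])) prev <- x_chr[i]
  out[i] <- prev
}
print(out)
```

Preallocating out avoids growing the vector with c() on every iteration, which matters only for long inputs.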
Answer 2 (score: 1)
Alternatively, you can use the sapply function to avoid a for loop.
First, find the positions of the letters:
pos_letter <- grep("[A-Za-z]", df$x)
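Then use sapply to 1) find, for each row, the position of the closest letter at or above it, and finally replace each value with the corresponding letter: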
df$out <- sapply(1:nrow(df),function(x) max(pos_letter[pos_letter <= x]))
df$out2 <- sapply(df$out, function(x) x = as.character(df[x,"x"]))
x outcome out out2
1 A A 1 A
2 1 A 1 A
3 2 A 1 A
4 B B 4 B
5 C C 5 C
6 1 C 5 C
7 2 C 5 C
8 C C 8 C
9 D D 9 D
10 1 D 9 D
You can combine these two sapply calls into a single line by writing:
sapply(1:nrow(df), function(n) as.character(df[max(pos_letter[pos_letter <= n]),"x"]))
[1] "A" "A" "A" "B" "C" "C" "C" "C" "D" "D"
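The same lookup can probably be done without sapply at all, using base R's findInterval(), which returns for each row the index of the last letter position that is less than or equal to it. This is a sketch rather than part of the answer above; idx and out3 are names I made up, and it assumes the first row holds a letter.

```r
# Example data from the question, as a plain character vector
x <- c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")

# Row numbers that contain a group name
pos_letter <- grep("[A-Za-z]", x)

# For each row, index into pos_letter of the closest letter at or above it
idx <- findInterval(seq_along(x), pos_letter)

# Translate those indices back into the letters themselves
out3 <- x[pos_letter[idx]]
print(out3)
```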
Answer 3 (score: 1)
Use tidyr::fill, which needs NA wherever your numbers are:
df = data.frame(x = c("A","1","2","B","C","1","2","C","D","1"),
stringsAsFactors = FALSE)
df$x[grepl("[0-9]+", df$x)] = NA
tidyr::fill(df, x)
x
1 A
2 A
3 A
4 B
5 C
6 C
7 C
8 C
9 D
10 D
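Note that the code above overwrites x. If the original values need to be kept, one option is to write the letters into a separate column and fill that instead. This is a sketch under that assumption; the column name group and the object res are hypothetical.

```r
library(dplyr)
library(tidyr)

df <- data.frame(x = c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1"),
                 stringsAsFactors = FALSE)

res <- df %>%
  # Copy the letters into a new column, leaving NA on the member rows
  mutate(group = ifelse(grepl("[0-9]+", x), NA_character_, x)) %>%
  # Carry the last non-NA value downwards (fill()'s default direction)
  fill(group)

print(res)
```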
Answer 4 (score: 0)
dplyr
Here is a simplified version that does not require creating a temporary helper column. It uses stringr::str_detect(), if_else() and zoo::na.locf().
library(dplyr)
df %>%
mutate(outcome2 = if_else(stringr::str_detect(x, "\\D"), x, factor(NA)) %>% zoo::na.locf())
    x outcome    Letter outcome2
1   A       A    Letter        A
2   1       A No Letter        A
3   2       A No Letter        A
4   B       B    Letter        B
5   C       C    Letter        C
6   1       C No Letter        C
7   2       C No Letter        C
8   C       C    Letter        C
9   D       D    Letter        D
10  1       D No Letter        D
data.table
For the sake of completeness, here is also a data.table approach, which I use often. It updates df by reference.
library(data.table)
setDT(df)[x %like% "\\D", outcome2 := x][, outcome2 := zoo::na.locf(outcome2)][]
    x outcome    Letter outcome2
 1: A       A    Letter        A
 2: 1       A No Letter        A
 3: 2       A No Letter        A
 4: B       B    Letter        B
 5: C       C    Letter        C
 6: 1       C No Letter        C
 7: 2       C No Letter        C
 8: C       C    Letter        C
 9: D       D    Letter        D
10: 1       D No Letter        D