id first middle last Age
1 Carol Jenny Smith 15
2 Sarah Carol Roberts 20
3 Josh David Richardson 22
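I am trying to find a specific name in any of the name columns (first, middle, last). For example, if a person has the name Carol anywhere (it doesn't matter whether it is the first, middle, or last name), I want to mutate a column "Carol" and give it 1. So what I want is the following: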
id first middle last Age Carol
1 Carol Jenny Smith 15 1
2 Sarah Carol Roberts 20 1
3 Josh David Richardson 22 0
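I have been trying ifelse(c(first, middle, last) == "Carol", 1, 0) or "Carol" %in% first ... etc., but for some reason I can only make it work for one column, not several. Can anyone help me? Thank you in advance!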
Answer 0 (score: 6)
We can use rowSums
df$Carol <- as.integer(rowSums(df[2:4] == "Carol") > 0)
df
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
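For context, here is a minimal sketch of why the c(first, middle, last) attempt from the question does not line up with the rows, while comparing the data frame columns directly does. The data frame is rebuilt from the question's table; the names stacked, hits, and row_counts are only illustrative.

```r
# Same example data as in the question
df <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c("Jenny", "Carol", "David"),
  last = c("Smith", "Roberts", "Richardson"),
  Age = c(15, 20, 22),
  stringsAsFactors = FALSE
)

# c() stacks the three columns end to end: 9 values, not 3 rows,
# so the comparison result no longer matches the number of rows
stacked <- c(df$first, df$middle, df$last)
length(stacked)

# Comparing the data frame columns directly keeps the row structure:
# a 3 x 3 logical matrix, one row per person
hits <- df[c("first", "middle", "last")] == "Carol"
dim(hits)

# rowSums counts the matches in each row
row_counts <- rowSums(hits)
row_counts
```

Testing row_counts > 0 then gives one TRUE/FALSE per person, which is what the rowSums solution relies on.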
If we need it as a function
fun <- function(df, value) {
as.integer(rowSums(df[2:4] == value) > 0)
}
fun(df, "Carol")
#[1] 1 1 0
fun(df, "Sarah")
#[1] 0 1 0
However, this assumes that the columns you want to search are at positions 2:4.
To allow more flexibility in the column positions:
fun <- function(df, cols, value) {
as.integer(rowSums(df[cols] == value) > 0)
}
fun(df, c("first", "last","middle"), "Carol")
#[1] 1 1 0
fun(df, c("first", "last","middle"), "Sarah")
#[1] 0 1 0
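One case the example data does not cover is missing names. A comparison with NA yields NA, so rowSums would return NA for that row unless na.rm = TRUE is set. The sketch below assumes that a missing name should simply count as "no match"; df_na and fun_na are hypothetical names used only for illustration.

```r
# Hypothetical variant of the example data with some missing names
df_na <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c(NA, "Carol", "David"),
  last = c("Smith", "Roberts", NA),
  stringsAsFactors = FALSE
)

# Same idea as fun() above, but NA comparisons are dropped before summing
fun_na <- function(df, cols, value) {
  as.integer(rowSums(df[cols] == value, na.rm = TRUE) > 0)
}

res_na <- fun_na(df_na, c("first", "middle", "last"), "Carol")
res_na

# Without na.rm = TRUE, the rows containing NA give NA instead of 0/1
res_plain <- rowSums(df_na[c("first", "middle", "last")] == "Carol")
res_plain
```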
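Answer 1 (score: 3)
Here is a tidyverse option. We first reshape the data to long format, group by id, and then find the id levels that have the desired name in at least one row. Then we reshape the data back to wide format.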
library(tidyverse)
df %>%
gather(key, value, first:last) %>%
group_by(id) %>%
mutate(Carol = as.numeric(any(value=="Carol"))) %>%
spread(key, value)
#   id Age Carol first last       middle
# 1  1  15     1 Carol Smith      Jenny
# 2  2  20     1 Sarah Roberts    Carol
# 3  3  22     0 Josh  Richardson David
Or, as a function:
find.target = function(data, target) {
  data %>%
    gather(key, value, first:last) %>%
    group_by(id) %>%
    mutate(!!target := as.numeric(any(value == target))) %>%
    spread(key, value) %>%
    # Move new target column to end; all_of() makes it explicit that
    # target is a character vector of column names
    select(-all_of(target), all_of(target))
}
find.target(df, "Carol")
find.target(df, "Sarah")
You can also search for several names at once. For example:
map(c("Sarah", "Carol", "David"), ~ find.target(df, .x)) %>%
reduce(left_join)
#   id Age first last       middle Sarah Carol David
# 1  1  15 Carol Smith      Jenny      0     1     0
# 2  2  20 Sarah Roberts    Carol      1     1     0
# 3  3  22 Josh  Richardson David      0     0     1
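In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a sketch of the same long-then-wide idea with the newer functions, assuming a reasonably recent tidyr and dplyr; the data frame is rebuilt from the question, and the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c("Jenny", "Carol", "David"),
  last = c("Smith", "Roberts", "Richardson"),
  Age = c(15, 20, 22),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Long format: one row per (id, name column) pair
  pivot_longer(first:last, names_to = "key", values_to = "value") %>%
  group_by(id) %>%
  # Flag every id that has the name in at least one of its rows
  mutate(Carol = as.numeric(any(value == "Carol"))) %>%
  ungroup() %>%
  # Back to wide format: one row per id again
  pivot_wider(names_from = key, values_from = value)

wide
```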
Answer 2 (score: 2)
Using tidyverse
library(tidyverse)
f1 <- function(data, wordToCompare, colsToCompare) {
wordToCompare <- enquo(wordToCompare)
data %>%
select(colsToCompare) %>%
mutate(!! wordToCompare := map(., ~
.x == as_label(wordToCompare)) %>%
reduce(`|`) %>%
as.integer)
}
f1(df1, Carol, c("first", 'middle', 'last'))
# first middle last Carol
#1 Carol Jenny Smith 1
#2 Sarah Carol Roberts 1
#3 Josh David Richardson 0
f1(df1, Sarah, c("first", 'middle', 'last'))
# first middle last Sarah
#1 Carol Jenny Smith 0
#2 Sarah Carol Roberts 1
#3 Josh David Richardson 0
Or we can use pmap
df1 %>%
mutate(Carol = pmap_int(.[c('first', 'middle', 'last')],
~ +('Carol' %in% c(...))))
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
This can be wrapped into a function
f2 <- function(data, wordToCompare, colsToCompare) {
wordToCompare <- enquo(wordToCompare)
data %>%
mutate(!! wordToCompare := pmap_int(.[colsToCompare],
~ +(as_label(wordToCompare) %in% c(...))))
}
f2(df1, Carol, c("first", 'middle', 'last'))
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
Note: neither of the tidyverse approaches above requires any reshaping.
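Using base R, we can loop through the columns 'first', 'middle', 'last' and compare each one with == to get a list of logical vectors, which we Reduce to a single logical vector with |, and then coerce to binary with +.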
Note: there are duplicates of this post, for example here.
df1$Carol <- +(Reduce(`|`, lapply(df1[2:4], `==`, 'Carol')))
df1
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
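To make the base R one-liner easier to follow, here is the same computation split into its three steps. This is only an illustrative breakdown; the intermediate names cmp, any_match, and flag do not appear in the answer.

```r
# Same example data as in the question
df1 <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c("Jenny", "Carol", "David"),
  last = c("Smith", "Roberts", "Richardson"),
  Age = c(15, 20, 22),
  stringsAsFactors = FALSE
)

# Step 1: compare each name column with "Carol";
# the result is a list of three logical vectors, one per column
cmp <- lapply(df1[2:4], `==`, "Carol")

# Step 2: combine the list elementwise with OR into one logical vector
any_match <- Reduce(`|`, cmp)

# Step 3: unary plus coerces TRUE/FALSE to 1/0
flag <- +any_match
flag
```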
Answer 3 (score: 1)
A solution using the apply family
# sapply returns a logical vector (lapply would create a list column)
df$Carol = sapply(1:nrow(df), function(x) any(df[x,] == "Carol"))
Answer 4 (score: 1)
Another option using mutate and if_else(), as you suggested:
library(tidyverse)
data = read_table(" id first middle last Age
1 Carol Jenny Smith 15
2 Sarah Carol Roberts 20
3 Josh David Richardson 22")
data %>%
mutate(carol = if_else(first == "Carol" | middle == "Carol" | last == "Carol",
"yes",
"no"))
Result:
# A tibble: 3 x 6
id first middle last Age carol
<dbl> <chr> <chr> <chr> <dbl> <chr>
1 1 Carol Jenny Smith 15 yes
2 2 Sarah Carol Roberts 20 yes
3 3 Josh David Richardson 22 no
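If the 1/0 coding from the question is preferred and the list of columns may grow, the chained | comparisons can be written more compactly with if_any(). This is a sketch assuming dplyr 1.0.4 or later, where if_any() is available; the data frame is rebuilt from the question, and the name result is only illustrative.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c("Jenny", "Carol", "David"),
  last = c("Smith", "Roberts", "Richardson"),
  Age = c(15, 20, 22),
  stringsAsFactors = FALSE
)

result <- df %>%
  # if_any() is TRUE for a row when the condition holds in at least
  # one of the selected columns; as.integer() turns that into 1/0
  mutate(Carol = as.integer(if_any(c(first, middle, last), ~ .x == "Carol")))

result
```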