I have some data:
testData <- tibble(fname = c("Alice", "Bob", "Charlie", "Dan", "Eric"),
lname = c("Smith", "West", "CharlieBlack", "DanMcDowell", "Bush"))
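Some of the last names have the person's first name concatenated in front of them.
What is an efficient way to detect and fix the lname column?
I would like it to look like this: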
lname = c("Smith", "West", "Black", "McDowell", "Bush")
I could use a for loop, but I have 500,000 rows of data, so I would like to find a more efficient approach.
Answer 0: (score: 2)
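We can use str_remove from stringr, which is vectorised over both the string and the pattern, so each lname is matched against the fname in the same row: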
library(tidyverse)
testData %>%
mutate(lname = str_remove(lname, fname))
# A tibble: 5 x 2
# fname lname
# <chr> <chr>
#1 Alice Smith
#2 Bob West
#3 Charlie Black
#4 Dan McDowell
#5 Eric Bush
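One caveat worth keeping in mind: str_remove treats fname as a regular expression and removes its first match anywhere in lname, not only at the start. The following is a minimal sketch of an anchored variant, assuming the first name should only be stripped when it is a prefix. The demo tibble and the "Al" / "McAlister" row are hypothetical and not part of the original data, and the sketch still assumes the first names contain no regex metacharacters.

```r
library(dplyr)
library(stringr)

# Hypothetical rows: "Al" occurs inside "McAlister" but is not a prefix of it.
demo <- tibble(fname = c("Charlie", "Al"),
               lname = c("CharlieBlack", "McAlister"))

# Prepending "^" anchors each row's pattern to the start of the string,
# so only a leading first name is removed.
fixed_demo <- demo %>%
  mutate(lname = str_remove(lname, paste0("^", fname)))

fixed_demo$lname
# [1] "Black"     "McAlister"
```

Without the anchor, the second row would come back as "Mcister", because the unanchored pattern matches in the middle of the name.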
Answer 1: (score: 0)
We can use gsub within apply:
apply(testData,1,function(x) gsub(x['fname'],"",x['lname']))
Output:
[1] "Smith" "West" "Black" "McDowell" "Bush"
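Since apply calls gsub once per row, it is still effectively a loop, which may matter at 500,000 rows. A possible fully vectorised base R alternative is sketched below. The helper name strip_fname_prefix is made up for illustration; it assumes the first name should be removed only when it is a literal prefix of the last name.

```r
# Hypothetical helper: drop fname from the front of lname when it is a prefix.
strip_fname_prefix <- function(fname, lname) {
  # startsWith() compares literally (no regex) and is vectorised over both arguments
  has_prefix <- startsWith(lname, fname)
  # keep everything after the first nchar(fname) characters where the prefix matched
  ifelse(has_prefix, substring(lname, nchar(fname) + 1L), lname)
}

fname <- c("Alice", "Bob", "Charlie", "Dan", "Eric")
lname <- c("Smith", "West", "CharlieBlack", "DanMcDowell", "Bush")

result <- strip_fname_prefix(fname, lname)
result
# [1] "Smith"    "West"     "Black"    "McDowell" "Bush"
```

Because the comparison is literal, this version is also unaffected by regex metacharacters in the names.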
Answer 2: (score: 0)
Try mutate with an ifelse clause to catch the lname entries that are concatenated, e.g.:
library(dplyr)
testData <- testData %>%
  mutate(lname = ifelse(grepl('[[:upper:]][[:lower:]]+[[:upper:]]', lname),
                        gsub('^[[:upper:]][[:lower:]]+', "", lname),
                        lname))
In this example, you are saying: "mutate lname if the string contains an uppercase letter, followed by at least one lowercase letter, followed by another uppercase letter. If that condition is met, replace the leading uppercase letter and the lowercase letters that follow it with nothing. If that condition is not met, just keep the original lname text."
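Note that this rule looks only at the capitalisation pattern of lname and never consults fname. The sketch below applies the same two regular expressions to a small hypothetical vector (not from the original data) to show what that implies for a last name that legitimately contains an inner capital letter.

```r
# Hypothetical inputs: one concatenated name, one genuine "Mc" surname, one plain surname.
lname <- c("DanMcDowell", "McDowell", "Bush")

# Same condition as in the answer: upper, one or more lower, then upper again
is_camel <- grepl("[[:upper:]][[:lower:]]+[[:upper:]]", lname)

# Same replacement as in the answer: drop the leading capitalised chunk
stripped <- ifelse(is_camel, gsub("^[[:upper:]][[:lower:]]+", "", lname), lname)

stripped
# [1] "McDowell" "Dowell"   "Bush"
```

The standalone "McDowell" loses its "Mc", so this approach is probably safest when such surnames never appear without a first-name prefix.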