I am using a regular expression to extract patterns and dplyr to build a data.frame:
library(dplyr)
library(stringr)
Target <- c("@user1 lorem ipsum @user2", "@user3 lorem ipsum @user4")
Source <- c(" lorem ipsum", "dolores")
dataset <- data.frame(Source, Target)
# Extract every @mention from the Target column
dataset2 <- dataset %>%
  mutate(Target = str_extract_all(Target, "@\\w+"))
My result (data.frame):
lorem ipsum c("@user1", "@user2")
dolores c("@user3", "@user4")
The data.frame I want, with one row per extracted user:
lorem ipsum "@user1"
lorem ipsum "@user2"
dolores "@user3"
dolores "@user4"
Answer 0 (score: 1)
We can try
stack(setNames(str_extract_all(dataset$Target, "@\\w+"), dataset$Source))[2:1]
# ind values
#1 lorem ipsum @user1
#2 lorem ipsum @user2
#3 dolores @user3
#4 dolores @user4
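The one-liner above packs several steps together. As a rough sketch, it can be unrolled like this; the intermediate names `matches`, `named`, `long` and `result` are only illustrative and do not appear in the original code, and `stringsAsFactors = FALSE` is added so the columns are plain character vectors on older R versions.

```r
library(stringr)

# Same example data as in the question
dataset <- data.frame(
  Source = c(" lorem ipsum", "dolores"),
  Target = c("@user1 lorem ipsum @user2", "@user3 lorem ipsum @user4"),
  stringsAsFactors = FALSE
)

# Step 1: one character vector of matches per row (a list with two elements)
matches <- str_extract_all(dataset$Target, "@\\w+")

# Step 2: name each list element after the Source value of its row
named <- setNames(matches, dataset$Source)

# Step 3: stack() flattens the named list into two columns,
# `values` (the matches) and `ind` (the list names, as a factor)
long <- stack(named)

# Step 4: swap the column order so the source label comes first
result <- long[2:1]
```

Note that `ind` comes back as a factor; wrap it in `as.character()` if a plain character column is needed.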
Or we can use unnest from tidyr
library(dplyr)
library(stringr)
library(tidyr)
dataset %>%
  mutate(Target = str_extract_all(Target, "@\\w+")) %>%
  # Expand the list column into one row per extracted match
  unnest(cols = Target)
# Source Target
#1 lorem ipsum @user1
#2 lorem ipsum @user2
#3 dolores @user3
#4 dolores @user4
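One thing to watch with the unnest approach: a row whose Target has no match at all yields an empty vector, and by default unnest drops that row. A small sketch of the difference, assuming tidyr 1.0 or later (where `keep_empty` is available) and using made-up data in which the second row has no mention:

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical data: the second row contains no "@" mention
example <- data.frame(
  Source = c("lorem ipsum", "dolores"),
  Target = c("@user1 lorem ipsum @user2", "no mentions here"),
  stringsAsFactors = FALSE
)

extracted <- example %>%
  mutate(Target = str_extract_all(Target, "@\\w+"))

# Default behaviour: rows whose list element is empty are dropped
dropped <- extracted %>% unnest(cols = Target)

# keep_empty = TRUE keeps such rows and fills Target with NA
kept <- extracted %>% unnest(cols = Target, keep_empty = TRUE)
```

Here `dropped` should contain only the two "lorem ipsum" rows, while `kept` also has a "dolores" row with a missing Target.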