Consider the following data.frame:
df <- data.frame(ID = 1:2, Location = c("Love, Love, Singapore, Love, Europe, United States, Japan, Amazon, Seattle, Orchard Road, Love",
"Singapore, Singapore, Singapore") , stringsAsFactors = FALSE)
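I want to extract the unique values from the df$Location column above, i.e. I want a new column that contains only the unique location names, exactly as in the data frame below: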
df <- data.frame(ID = 1:2, Location = c("Love, Love, Singapore, Love, Europe, United States, Japan, Amazon, Seattle, Orchard Road, Love",
"Singapore, Singapore, Singapore") ,
Unique.Location = c("Love, Singapore, Europe, United States, Japan, Amazon, Seattle, Orchard Road",
"Singapore"), stringsAsFactors = FALSE)
Any input is much appreciated.
Answer 0 (score: 3)
In base R, we can split each Location string on commas and paste back together only the unique values:
# Split on commas, trim whitespace, keep unique values, re-join with ", "
df$Unique.Location <- sapply(strsplit(df$Location, ","), function(x)
  toString(unique(trimws(x))))
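To make the one-liner above easier to follow, here is a rough step-by-step sketch of the same idea. The vector `loc` is a shortened version of the question's data, and the intermediate names (`parts`, `uniq`, `result`) are just illustrative, not part of the original answer.

```r
# Shortened example input based on the question's Location column
loc <- c("Love, Love, Singapore, Love, Europe",
         "Singapore, Singapore, Singapore")

# Step 1: split each string on commas; this gives a list of character vectors
parts <- strsplit(loc, ",")

# Step 2: strip the leading space that the split leaves on each piece
parts <- lapply(parts, trimws)

# Step 3: drop duplicates, keeping the order of first occurrence
uniq <- lapply(parts, unique)

# Step 4: collapse each vector back into one comma-separated string
result <- vapply(uniq, toString, character(1))
result
```

Using vapply() instead of sapply() in the last step is a small design choice: it guarantees a character vector with one element per row, which is what a data frame column needs.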
Or using tidyr::separate_rows:
library(dplyr)
df %>%
  tidyr::separate_rows(Location, sep = ", ") %>%
  group_by(ID) %>%
  summarise(Unique.Location = toString(unique(Location)),
            Location = toString(Location))
Answer 1 (score: 2)
You can use a combination of strsplit, sapply, and unique:
df$Unique.Location <- sapply(strsplit(df$Location, split = ", "), function(x) paste0(unique(x), collapse = ", "))
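One caveat worth sketching: splitting on the literal ", " assumes every comma is followed by exactly one space. The example below uses a made-up string with irregular spacing, and `unique_items` is a hypothetical helper name, to show how a whitespace-tolerant split might behave differently. Treat it as an illustration under those assumptions rather than a required change.

```r
# Hypothetical helper: split on a comma with optional surrounding whitespace
unique_items <- function(x) {
  parts <- strsplit(x, "\\s*,\\s*")
  vapply(parts,
         function(p) paste(unique(trimws(p)), collapse = ", "),
         character(1))
}

# Made-up input with inconsistent spacing around the commas
messy <- "Love,Love, Singapore ,Love"

# Fixed-delimiter split: only one ", " is found, so duplicates survive
fixed <- sapply(strsplit(messy, split = ", "),
                function(x) paste0(unique(x), collapse = ", "))

# Whitespace-tolerant split: duplicates are detected as expected
tolerant <- unique_items(messy)

fixed
tolerant
```

For data formatted as consistently as the question's, both versions should give the same result.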
Answer 2 (score: 0)
Using tidyverse:
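library(dplyr)
library(purrr)
library(stringr)  # provides str_extract_all() and str_trim()
# "[^,]+" captures everything between commas, so multi-word names such as
# "United States" stay intact (a "\\w+" pattern would split them into words)
df %>%
  mutate(Unique.Location = str_extract_all(Location, "[^,]+") %>%
           map_chr(~ toString(unique(str_trim(.x)))))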
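When extracting items with a regular expression, the choice of pattern matters for multi-word names such as "United States" or "Orchard Road". The sketch below compares a word-level pattern with a between-commas pattern on a shortened example string; `loc`, `by_word`, and `by_item` are illustrative names only.

```r
library(stringr)

# Shortened example containing multi-word location names
loc <- "Love, United States, Orchard Road, Love"

# "\\w+" matches runs of word characters, so a space ends each match
by_word <- unique(str_extract_all(loc, "\\w+")[[1]])

# "[^,]+" matches everything between commas; str_trim() removes the padding
by_item <- unique(str_trim(str_extract_all(loc, "[^,]+")[[1]]))

by_word
by_item
```

Only the second variant keeps "United States" and "Orchard Road" as single entries, which is what the expected output in the question requires.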