Question

Consider the following data.frame:

df <- data.frame(ID = 1:2, Location = c("Love, Love, Singapore, Love, Europe, United States, Japan, Amazon, Seattle, Orchard Road, Love", 
                                        "Singapore, Singapore, Singapore") , stringsAsFactors = FALSE)

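I want to extract the unique values from the df$Location column above. That is, I want a new column that contains only the unique location names in each row, exactly as in the data frame below: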

df <- data.frame(ID = 1:2, Location = c("Love, Love, Singapore, Love, Europe, United States, Japan, Amazon, Seattle, Orchard Road, Love", 
                                        "Singapore, Singapore, Singapore") , 
                 Unique.Location = c("Love, Singapore, Europe, United States, Japan, Amazon, Seattle, Orchard Road",
                                     "Singapore"), stringsAsFactors = FALSE)

Any input is much appreciated.

Answer 1

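In base R, we can split each string on commas, trim the surrounding whitespace, and paste only the unique values back together for each Location: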

# Split on commas, trim whitespace, keep unique values, rejoin with ", "
df$Unique.Location <- sapply(strsplit(df$Location, ","), function(x) 
                       toString(unique(trimws(x))))
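The one-liner above can be unpacked into its individual steps. The following is a minimal sketch on a shortened version of the question's data; the intermediate names `parts` and `uniq` are introduced here for illustration only.

```r
# Shortened sample data (assumption: same shape as the question's df)
df <- data.frame(ID = 1:2,
                 Location = c("Love, Love, Singapore, Love, Europe",
                              "Singapore, Singapore, Singapore"),
                 stringsAsFactors = FALSE)

# Step 1: split each row on commas; the pieces keep their leading spaces
parts <- strsplit(df$Location, ",")

# Step 2: strip the surrounding whitespace from every piece
parts <- lapply(parts, trimws)

# Step 3: drop duplicates within each row, keeping first-seen order
uniq <- lapply(parts, unique)

# Step 4: collapse each row back into one ", "-separated string
df$Unique.Location <- vapply(uniq, toString, character(1))

print(df$Unique.Location)
```

Using `vapply` with `character(1)` instead of `sapply` is a design choice in this sketch: it guarantees a character vector with one element per row.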

Another approach uses tidyr::separate_rows:

library(dplyr)

df %>% 
  tidyr::separate_rows(Location, sep = ", ") %>%
  group_by(ID) %>%
  summarise(Unique.Location = toString(unique(Location)), 
            Location = toString(Location))

Answer 2

You can combine strsplit, sapply, and unique:

df$Unique.Location <- sapply(strsplit(df$Location, split = ", "), function(x) paste0(unique(x), collapse = ", "))
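Splitting on the literal ", " assumes every comma is followed by exactly one space. If the spacing in the data might be irregular, a more tolerant variant could look like the sketch below. The helper name `unique_items` and the sample string are hypothetical and not part of the original answer.

```r
# Hypothetical helper: deduplicate comma-separated items in each string,
# tolerating any amount of whitespace around the commas.
unique_items <- function(x) {
  # Split on a comma with optional surrounding whitespace
  parts <- strsplit(x, "[[:space:]]*,[[:space:]]*")
  vapply(parts, function(p) {
    p <- trimws(p)                       # trim the ends of the whole string
    paste(unique(p), collapse = ", ")    # keep first occurrences, rejoin
  }, character(1))
}

messy <- c("a,b ,  a, c", "Singapore,Singapore , Singapore")
result <- unique_items(messy)
print(result)
```

With regularly formatted input such as the question's df, this should give the same result as the version that splits on ", ".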

Answer 3

An option using the tidyverse:

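library(dplyr)
library(purrr)
library(stringr)  # provides str_extract_all() and str_trim()

# "[^,]+" extracts whole comma-separated items, so multi-word names
# such as "United States" stay intact; str_trim() removes the leading spaces
df %>% 
     mutate(Unique.Location = str_extract_all(Location, "[^,]+") %>%
          map_chr(~ toString(unique(str_trim(.x)))))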
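When extracting items with a regular expression, the choice of pattern matters for multi-word names. A word pattern such as "\\w+" matches runs of word characters and therefore splits "United States" in two, whereas "[^,]+" matches everything between commas. The base R sketch below illustrates the difference with `regmatches` and `gregexpr`; the sample string is an assumption chosen for illustration.

```r
x <- "United States, Japan"

# Word-based extraction: breaks multi-word names apart
by_word <- regmatches(x, gregexpr("\\w+", x))[[1]]

# Item-based extraction: everything between commas, then trimmed
by_item <- trimws(regmatches(x, gregexpr("[^,]+", x))[[1]])

print(by_word)
print(by_item)
```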
