Question

I am trying to add genres to my genres set. However, my genres set ends up as NULL.

Function:

install.packages("sets"); library(sets)
genres = set()
find_all_genres = function(genres_string) {
  if (genres_string == "N/A") {
    return(NA)
  }
  genres_list = strsplit(genres_string, ",\\s+")[[1]]
  for (genre in genres_list) {
    genres = genres | set(genre)
  }
}

sapply(df2$Genre, FUN = find_all_genres)

Sample:

> head(df2$Genre)
[1] "Documentary, Biography, Romance" "Short, Thriller"                 "Documentary"                     "Drama, Romance"                  "War, Short"                     
[6] "Documentary, Biography"

The expected output would be the individual genres:

genres = {"Action", "Drama", "Comedy"}

With many more genres, of course.

Also, how can I speed up my function? I am new to R.

Answer 1

Use scan to read the strings in and unique to remove duplicates. The input g is given in the Note at the end. No packages are used.

unique(scan(text = g, what = "", sep = ",", na.strings = "N/A", 
  strip.white = TRUE, quiet = TRUE))

giving:

[1] "Documentary" "Biography"   "Romance"     "Short"       "Thriller"   
[6] "Drama"       "War"

If you want the result sorted, use sort.

Function

If you want to add to some previous values, write the whole thing as a function:
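A minimal sketch of the sorted variant, with the input g from the Note repeated so that the snippet runs on its own. The name genres_sorted is only illustrative.

```r
# Input repeated from the Note so this runs standalone
g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
  "Drama, Romance", "War, Short", "Documentary, Biography")

# Split on commas, strip surrounding whitespace, drop duplicates,
# then sort the remaining genres alphabetically
genres_sorted <- sort(unique(scan(text = g, what = "", sep = ",",
  na.strings = "N/A", strip.white = TRUE, quiet = TRUE)))
genres_sorted
```

By default sort also drops any NA values, so entries that were read in as missing should not appear in the sorted result.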

add <- function(...) {
    unique(scan(text = c(...), what = "", sep = ",", na.strings = "N/A", 
      strip.white = TRUE, quiet = TRUE))
}

# examples
g_split <- add(g)

G <- c("Drama", "Comedy")
G <- add(G, g)
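As for why the function in the question never changes genres: an assignment inside a function creates a local variable, and a for loop evaluates to NULL, so NULL is what the function returns. The sketch below illustrates this with a plain character vector and union instead of the sets package (an assumption made to keep it dependency-free). The names add_local and add_returning are hypothetical.

```r
genres <- character(0)

# Mirrors the question: `genres` inside the function is a local variable
add_local <- function(genres_string) {
  for (genre in strsplit(genres_string, ",\\s+")[[1]]) {
    genres <- union(genres, genre)  # updates the local copy only
  }
}
result <- add_local("Drama, Romance")
is.null(result)   # the for loop is the last expression, and it evaluates to NULL
length(genres)    # the global vector is untouched

# Alternative: return the updated value and reassign it at the call site
add_returning <- function(current, genres_string) {
  union(current, strsplit(genres_string, ",\\s+")[[1]])
}
genres <- add_returning(genres, "Drama, Romance")
```

Returning the new value, as add above also does, avoids relying on global state. It also lets the whole vector be processed in one call rather than one element at a time, which is usually faster in R.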

Note

The input in reproducible form is:

g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary", 
  "Drama, Romance", "War, Short", "Documentary, Biography")
