Question

I'm analyzing data about movies and want to parse and count the genres.

My data looks like this:

data1, data2, comedy|action|adventure, data4, data5
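Because all of a movie's genres sit in a single field separated by `|`, they can be split and counted straight from the raw data. This is a minimal sketch, assuming the file is plain comma-separated with the genres in the third column as in the sample line. The column names `c1`, `c2`, `c4`, `c5` are hypothetical, and `trimws()` is there only in case the `|` separators have spaces around them:

```r
# One sample row in the format shown above (hypothetical stand-in for the real file)
raw <- "data1, data2, comedy|action|adventure, data4, data5"

# Read it as CSV; strip.white drops the blanks after each comma
movies <- read.csv(text = raw, header = FALSE, strip.white = TRUE,
                   col.names = c("c1", "c2", "genres", "c4", "c5"),
                   stringsAsFactors = FALSE)

# strsplit() is vectorized: one call splits every row's genre string
genre_list <- strsplit(movies$genres, split = "|", fixed = TRUE)

# Flatten to one vector, trim stray spaces, and count each genre
genre_counts <- table(trimws(unlist(genre_list)))
print(genre_counts)
```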

I want to count the genres. I've managed to parse the items with this code:

genres <- as.data.frame(table(c(movies["genres"]))) 
# genres look like:
#                                                               Var1 Freq
# 1                                                             Action   11
# 2                                                   Action|Adventure   11
# 3             Action|Adventure|Animation|Comedy|Crime|Family|Fantasy    1
# ...

# as you can see there are items which I need to parse
# for debugging purposes, I managed to get
strsplit(toString(genres$Var1[3]), split = "|", fixed = TRUE)
# which results in below output:
# [[1]]
# [1] "Action"    "Adventure" "Animation" "Comedy"    "Crime"     "Family" "Fantasy" 

# My idea is to gather every parsed item into one object, then treat
# that object as a data.frame so I can use its 'Freq' column.
# Please take a look at the code below:
genres <- as.data.frame(table(c(movies["genres"])))
list <- c()
i = 1
while(i <= length(genres$Var1)){
    parse <- strsplit(toString(genres$Var1[i]), split = "|", fixed = TRUE)
    merge(list, parse)  # note: the result of merge() is never assigned, so `list` stays empty
    i = i + 1
}
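The collect-then-count idea described in the comments above works once each iteration's result is actually stored. `merge()` joins data frames rather than appending to a vector, and its return value is discarded in the loop. This is a minimal sketch, assuming a hypothetical character vector `movie_genres` that stands in for the raw `movies$genres` column. Splitting the raw column, rather than the `table()` output, means each movie is counted once:

```r
# Hypothetical stand-in for movies$genres: one pipe-separated string per movie
movie_genres <- c("Action", "Action|Adventure", "Comedy|Action|Adventure", "Comedy")

# Collect every parsed genre into one character vector
all_genres <- character(0)
for (g in movie_genres) {
  parts <- strsplit(g, split = "|", fixed = TRUE)[[1]]
  all_genres <- c(all_genres, parts)  # reassign so the result is kept
}

# Count occurrences; as.data.frame() yields a 'genre' column and a 'Freq' column
genre_counts <- as.data.frame(table(genre = all_genres))
print(genre_counts)
```

Growing a vector with `c()` inside a loop is fine at this scale. For a large column, `table(unlist(strsplit(as.character(movies$genres), "|", fixed = TRUE)))` does the same work in one vectorized call.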

Does anyone have a better idea of how to finish this, or a simpler way to count the genres? Thanks in advance.

Answer 1

Is this what you're looking for?

# split each row on "|"
xx = strsplit(as.character(df$Var1), "|", fixed = TRUE)

# repeat each row's 'Freq' once for every genre in that row
yy = lapply(1:length(xx), function(x) rep(df$Freq[x], length(xx[[x]])))

df1 = data.frame(genre = unlist(xx), freq = unlist(yy))

library(dplyr)
df1 %>% group_by(genre) %>% summarise(total_freq = sum(freq))
#      genre total_freq
#1    Action         23
#2 Adventure         12
#3 Animation          1
#4    Comedy          1
#5     Crime          1
#6    Family          1
#7   Fantasy          1

# where df is:
#                                                     Var1 Freq
#1:                                                 Action   11
#2:                                       Action|Adventure   11
#3: Action|Adventure|Animation|Comedy|Crime|Family|Fantasy    1
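The same weighted count can also be done without dplyr. This base-R sketch rebuilds the `df` shown above. `lengths()` gives the number of genres in each row, `rep()` repeats each `Freq` that many times, and `tapply()` sums the repeated frequencies per genre:

```r
# The aggregated table from the question: genre combinations and their counts
df <- data.frame(
  Var1 = c("Action", "Action|Adventure",
           "Action|Adventure|Animation|Comedy|Crime|Family|Fantasy"),
  Freq = c(11L, 11L, 1L)
)

# Split each combination into its individual genres
xx <- strsplit(as.character(df$Var1), "|", fixed = TRUE)

# Repeat each row's Freq once per genre in that row, then sum by genre
genre <- unlist(xx)
freq <- rep(df$Freq, lengths(xx))
total_freq <- tapply(freq, genre, sum)
print(total_freq)
```

The totals match the dplyr output above: Action 23, Adventure 12, and 1 for each of the remaining genres.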

R: parse, add to vector and count items
