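I am new to R programming and, unfortunately, I have to work with the MovieLens-1M data. I would like to ask how to split the columns in movies.dat at the delimiter "::". I tried this code: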
> moviesDF<-read.delim("movies.dat", sep="|", header=F, stringsAsFactors=FALSE)
> str(moviesDF)
'data.frame': 3998 obs. of 3 variables:
$ V1: chr "1::Toy Story (1995)::Animation" "2::Jumanji (1995)::Adventure" "3::Grumpier Old Men (1995)::Comedy" "4::Waiting to Exhale (1995)::Comedy" ...
$ V2: chr "Children's" "Children's" "Romance" "Drama" ...
$ V3: chr "Comedy" "Fantasy" "" "" ...
The desired output is:
V1: Movie ID
V2: Title
V3: Genre
My end goal is to build a recommender system.
Answer 0: (score: 1)
You can try cSplit from my "splitstackshape" package. Usage:
library(splitstackshape)
cSplit(moviesDF, "V1", "::")
# V2 V3 V1_1 V1_2 V1_3
# 1: Children's Comedy 1 Toy Story (1995) Animation
# 2: Children's Fantasy 2 Jumanji (1995) Adventure
# 3: Romance 3 Grumpier Old Men (1995) Comedy
# 4: Drama 4 Waiting to Exhale (1995) Comedy
Answer 1: (score: 1)
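The problem is the import function. read.delim(sep="|") does not read the dataset correctly, because "|" only separates the individual genre values that belong in V3, while the three fields themselves are separated by "::". You should import the dataset with readLines and split each line yourself: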
# Read the raw lines, then split each one on the literal "::" delimiter
moviesDF <- readLines("movies.dat")
moviesDF <- as.data.frame(do.call("rbind", strsplit(moviesDF, "::", fixed = TRUE)), stringsAsFactors = FALSE)
names(moviesDF) <- c("V1", "V2", "V3")
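To try the same idea without the file, here is a minimal sketch on a small in-memory sample. The vector sample_lines and the column names MovieID, Title, Genres and GenreList are assumptions made for illustration. With the real data you would replace sample_lines with readLines("movies.dat"). The last step, splitting the genres on "|", is an optional extra that may be useful for a recommender system.

```r
# Hypothetical three-line sample in the movies.dat layout (ID::Title::Genres).
sample_lines <- c(
  "1::Toy Story (1995)::Animation|Children's|Comedy",
  "2::Jumanji (1995)::Adventure|Children's|Fantasy",
  "3::Grumpier Old Men (1995)::Comedy|Romance"
)

# Split each line on the literal "::" (fixed = TRUE turns off regex matching).
parts <- strsplit(sample_lines, "::", fixed = TRUE)

# Bind the pieces row-wise so each field becomes one column.
movies <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(movies) <- c("MovieID", "Title", "Genres")
movies$MovieID <- as.integer(movies$MovieID)

# Optional: keep the genres as a list column, one character vector per movie.
movies$GenreList <- strsplit(movies$Genres, "|", fixed = TRUE)

str(movies)
```

This sketch assumes every line has exactly three "::"-separated fields. If a line has more or fewer, the rbind step will recycle values or warn, so it is worth checking lengths(parts) first.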