Filtering records from a table in R

Time: 2016-04-19 07:25:25

Tags: r jupyter-irkernel

I have a dataset called Movies.

head(Movies)

output of head(Movies)

How do I get the row where MovieID is "0000008"? I tried:

t1 = subset(Movies, "MovieID" == "0000008")
t2 <- Movies[ which(Movies["MovieID"]=="0000008"), ]
head(t1)
head(t2)

Both return an empty dataset, which is wrong, because I can see the row with ID "0000008".

Edit: I tried removing the quotes around MovieID, but that raises an error:

Error in subset.matrix(Movies, MovieID == "0000008") : object 'MovieID' not found

Edit: The Movies data is obtained as follows:

URL = "https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/movies.dat"
MovieText = readLines( remote.file(URL) ) # HACK!!!
Movies = matrix( sapply( MovieText,
            function(x) unlist(strsplit(sub(" [(]([0-9]+)[)]", "::\\1",x),"::"))[1:4] ),
            nrow=length(MovieText), ncol=4, byrow=TRUE )
colnames(Movies) = c("MovieID", "MovieTitle", "Year", "Genres")
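As a minimal sketch of what the `sub`/`strsplit` step does to a single line: the sample line below is illustrative and only assumed to follow the `ID::Title (Year)::Genres` layout of movies.dat, and the names `line`, `rewritten` and `fields` are made up for this example.

```r
# Illustrative input line (assumed format: ID::Title (Year)::Genres)
line <- "0000008::Edison Kinetoscopic Record of a Sneeze (1894)::Documentary|Short"

# Turn " (1894)" into "::1894" so the year becomes its own field
rewritten <- sub(" [(]([0-9]+)[)]", "::\\1", line)

# Split on "::" and keep the first four fields
fields <- unlist(strsplit(rewritten, "::"))[1:4]
print(fields)
```

Each input line therefore yields four strings: ID, title, year and genres.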

1 Answer:

Answer 0: (score: 2)

Your nrow should be length(MovieText)/4

URL = "https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/movies.dat"
MovieText = readLines( URL ) # HACK!!!
Movies = matrix( sapply( MovieText,
    function(x) unlist(strsplit(sub(" [(]([0-9]+)[)]", "::\\1",x),"::"))[1:4] ),
    nrow=length(MovieText)/4, ncol=4, byrow=TRUE )
colnames(Movies) = c("MovieID", "MovieTitle", "Year", "Genres")

# If you want to keep working with a matrix, index the column by name:
subset(Movies, Movies[,"MovieID"]=="0000008")
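To illustrate why the two attempts in the question come back empty while the explicit column index works, here is a small sketch on a toy matrix. The two rows are made-up stand-ins for the real data, and `keep` and `hit` are names introduced only for this example.

```r
# Toy matrix with the same column names as Movies (rows are illustrative)
Movies <- matrix(
  c("0000008", "Edison Kinetoscopic Record of a Sneeze", "1894", "Documentary|Short",
    "0000010", "Example Movie", "1895", "Documentary|Short"),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("MovieID", "MovieTitle", "Year", "Genres"))
)

# Comparing two string literals never looks at the data: this is a single FALSE
literal_cmp <- "MovieID" == "0000008"

# Single-bracket indexing by name on a matrix looks for an element name,
# not a column name, so it yields NA
by_name <- Movies["MovieID"]

# Index the column explicitly, then keep the matching rows as a matrix
keep <- Movies[, "MovieID"] == "0000008"
hit <- Movies[keep, , drop = FALSE]
print(hit)
```

The `drop = FALSE` keeps the result a one-row matrix instead of collapsing it to a plain character vector.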

Edit: subsetting with data.frame and data.table

library(data.table)

MoviesDF <- data.frame(Movies)
MoviesDT <- data.table(Movies)

MoviesDF[MoviesDF["MovieID"] == "0000008", ]
MoviesDT[MovieID == "0000008", ]
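Once the data is a data.frame, the unquoted column name that failed on the matrix should work in `subset()`. A minimal sketch with a toy data.frame (the rows and the names `by_subset` and `by_index` are illustrative, not taken from the real file):

```r
# Toy data.frame standing in for MoviesDF (rows are illustrative)
MoviesDF <- data.frame(
  MovieID = c("0000008", "0000010"),
  Year = c("1894", "1895"),
  stringsAsFactors = FALSE
)

# subset() evaluates MovieID inside the data.frame, so no quotes are needed
by_subset <- subset(MoviesDF, MovieID == "0000008")

# Equivalent base indexing with $ to pull out the column
by_index <- MoviesDF[MoviesDF$MovieID == "0000008", ]

print(by_subset)
```

Both forms select the same single row here; `subset()` is convenient interactively, while explicit indexing tends to be more predictable inside functions.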

BTW: I like the HACK!!! comment.