How to resolve 'if (is.character(txt) && length(txt)..... when using purrr::map()

Posted: 2019-04-16 03:49:55

Tags: r json dataframe jsonlite

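I have been using jsonlite to unpack some nested JSON from an imported CSV file. Using purrr::map() to create a list of data frames and then processing them worked very well. After a while of having to delete rows by hand (when they contained no JSON content in certain columns), I ended up using apply() to go through the main table and remove every row containing "[]" (the equivalent of no entry in my file). Since doing this, I can no longer use the purrr::map() function without getting an error: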

Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") <  : 
  missing value where TRUE/FALSE needed
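The condition quoted in this error can evaluate to NA (rather than TRUE or FALSE) when txt is a missing string. The base-R sketch below checks that in isolation. It assumes, without the post confirming it, that some value passed to fromJSON is NA, and it uses a placeholder for the comparison threshold because the error message cuts the real value off.

```r
# Hypothetical input: a single missing string
txt <- NA_character_
# Placeholder threshold; the real value is truncated in the error message
limit <- 1000

print(is.character(txt))           # TRUE: a missing string is still a character vector
print(length(txt) == 1)            # TRUE
print(nchar(txt, type = "bytes"))  # NA: keepNA defaults to TRUE for type = "bytes"

# TRUE && TRUE && NA evaluates to NA
cond <- is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < limit
print(cond)

# if() cannot branch on NA, so this raises the same kind of error as in the post
res <- tryCatch(if (cond) "ok" else "not ok", error = function(e) e)
print(conditionMessage(res))
```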

purrr::map() only works when I delete the rows manually. When I try to unpack the JSON column after removing those rows with apply(), I get this error.
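An apply() filter of this kind can itself introduce NA values. If a cell in the tested column is NA instead of a JSON string, `z == "[]"` is NA, `!any(NA)` is NA, and indexing a data frame with an NA logical keeps an extra row in which every column is NA (the same holds for the `!any(z < 30)` filter when a numeric cell is NA). The sketch below shows this on a made-up three-row base data.frame (hypothetical data, not the Kaggle file; a tibble from read_csv() is assumed to behave similarly), followed by one possible filter that treats missing cells as rows to drop.

```r
# Hypothetical data: one JSON cell, one empty "[]" cell, one missing cell
df <- data.frame(
  title = c("A", "B", "C"),
  production_companies = c('[{"name": "X", "id": 1}]', "[]", NA),
  stringsAsFactors = FALSE
)

# Same pattern as the script: keep rows where no cell equals "[]"
keep <- apply(df["production_companies"], 1, function(z) !any(z == "[]"))
print(keep)      # TRUE, FALSE, NA

# The NA index yields a second row where every column (including title) is NA
filtered <- df[keep, ]
print(filtered)

# One alternative: drop missing cells explicitly so the index is never NA
keep_safe <- !is.na(df$production_companies) & df$production_companies != "[]"
print(df[keep_safe, ])
```

If this is what is happening, `sum(is.na(movies$production_companies))` run just before the map() call should return a value greater than zero.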

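Strangely, it did work for me once or twice, but when I cleared my workspace and ran the script from scratch it stopped working again. I'm not sure how I got it to work the first time.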

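I have tried many things, including searching online for the error message, reading the documentation, and experimenting with the code to debug it, but so far I haven't had any luck. I'm just getting started with jsonlite, so I may be missing some basic knowledge, but I haven't been able to work out what is going on.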

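Below I've included the whole R script up to the point of the error; the problematic part is at the bottom of the code. The dataset I'm using is from Kaggle and can be found here, if that helps.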

library(ridge)
library(glmnet)
library(ggplot2)
library(jtools)
library(readr)
options(scipen = 999)

#import data
movies <- read_csv("tmdb-5000-movie-dataset/tmdb_5000_movies.csv")
credits <- read_csv("tmdb-5000-movie-dataset/tmdb_5000_credits.csv")

#clean data, drop rows with empty "cast" or "crew"
drop.rows <- c(2602, 3662, 3671, 3972, 3978, 3993, 4010, 4069, 4106, 4119,
               4124, 4248, 4294, 4306, 4315, 4323, 4386, 4401, 4402, 4406,
               4414, 4432, 4459, 4492, 4505, 4509, 4518, 4551, 4554, 4563,
               4565, 4567, 4570, 4572, 4582, 4582, 4584, 4590, 4612, 4617,
               4618, 4623, 4634, 4639, 4639, 4645, 4658, 4663, 4675, 4680,
               4682, 4686, 4690, 4699, 4711, 4713, 4715, 4717, 4738, 4758,
               4756, 4798, 4802)
movies <- movies[-c(drop.rows), ]
credits <- credits[-c(drop.rows), ]



#JSON processing
library(jsonlite)

#cast from JSON (credits$cast)
cast <- purrr::map(credits$cast, jsonlite::fromJSON)

#crew from JSON (credits$crew)
crew <- purrr::map(credits$crew, jsonlite::fromJSON)

#create list of "stars"
starring <- vector("character", length(cast))
index <- 1

#create list of "name" of first actor in each movie
for (i in cast) {
  #print(i[1,6])
  starring[[index]] <- i[1,6]
  index <- index + 1
}

#add "starring" column to movies dataframe
movies$starring <- starring





#create list of "directors"
director <- vector("character", length(crew))
index <- 1

#create list of "names" that correspond with rows that contain the "job" Director
for (i in crew) {
  director[[index]] <-i[,6][which( i["job"] == "Director" )][1]
  index <- index + 1
}

#add director column to movies dataframe
movies$director <- director




#genre from JSON 
genres <- purrr::map(movies$genres, jsonlite::fromJSON)

genre <- vector("character", length(genres))
index <- 1

for (i in genres) {
  #print(i[1,6])
  genre[[index]] <- i[1,2]
  index <- index + 1
}

movies$genre <- genre




#drop unnecessary columns to make things easier
drops <- c("genres","homepage", "id", "keywords", "overview", "status", 
"tagline", "spoken_languages", "original_title")
movies <- movies[ , !(names(movies) %in% drops)]

#further cleaning, drop rows with empty JSON and rows with budget and revenue < 30 (clear errors)
movies <- movies[apply(movies[c(1, 8)],1,function(z) !any(z<30)),]



#####################################################################
#THIS IS WHERE I ATTEMPT TO DROP THE ROWS

movies <- movies[apply(movies[c(4)],1,function(z) !any(z=="[]")),]




#####################################################################


#####################################################################
#I THEN GET THE ERROR MESSAGE HERE

#create production company column from JSON
productionco <- purrr::map(movies$production_companies, jsonlite::fromJSON)

#####################################################################
