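I have been using jsonlite to break apart some nested JSON in an imported csv file. Using purrr::map() to create a list of data frames and then working with them has gone well. After a while of manually dropping rows that contained no JSON content in certain columns, I switched to using apply() to go through the main table and drop everything containing "[]" (the equivalent of no input in my file). Since making that change, I cannot call purrr::map() without getting this error: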
Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < :
  missing value where TRUE/FALSE needed
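purrr::map() only works when I drop the rows manually. When I try to break apart the JSON column after dropping those rows with apply(), I get this error.
Strangely, it did work for me once or twice, but when I cleared my workspace and ran the script from the top, it stopped working again. I am not sure how I got it to work the first time.
I have tried a lot of things, including searching for the error message online, reading the documentation, and debugging by experimenting with the code, but so far I have had no luck. I am just starting out with jsonlite, so I may simply be missing some piece of knowledge, but I have not been able to figure out what is going on.
Below I am including my entire R script up to the point of the error; the problematic part is at the bottom of the code. The dataset I am using is from kaggle and can be found here, if that helps.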
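For reference, this is the message base R raises whenever an `if` condition evaluates to NA, which suggests (though it does not prove) that a missing value is reaching the check quoted in the error. A minimal illustration in base R only, independent of jsonlite:

```r
# An `if` condition that evaluates to NA raises this error in base R.
msg <- tryCatch(
  {
    if (NA) "unreachable"
  },
  error = function(e) conditionMessage(e)
)

# Show the captured error message
print(msg)
```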
library(ridge)
library(glmnet)
library(ggplot2)
library(jtools)
library(readr)
options(scipen = 999)
#import data
movies <- read_csv("tmdb-5000-movie-dataset/tmdb_5000_movies.csv")
credits <- read_csv("tmdb-5000-movie-dataset/tmdb_5000_credits.csv")
#clean data, drop rows with empty "cast" or "crew"
drop.rows <- c(2602, 3662, 3671, 3972, 3978, 3993, 4010, 4069, 4106, 4119,
4124, 4248, 4294, 4306,
4315, 4323, 4386, 4401, 4402, 4406, 4414, 4432, 4459, 4492, 4505, 4509, 4518, 4551,
4554, 4563, 4565, 4567, 4570, 4572, 4582, 4582, 4584, 4590, 4612, 4617, 4618, 4623,
4634, 4639, 4639, 4645, 4658, 4663, 4675, 4680, 4682, 4686, 4690, 4699, 4711, 4713,
4715, 4717, 4738, 4758, 4756, 4798, 4802)
movies <- movies[-c(drop.rows), ]
credits <- credits[-c(drop.rows), ]
#JSON processing
library(jsonlite)
#cast from JSON (credits$cast)
cast <- purrr::map(credits$cast, jsonlite::fromJSON)
#crew from JSON (credits$crew)
crew <- purrr::map(credits$crew, jsonlite::fromJSON)
#create list of "stars"
starring <- vector("character", length(cast))
index <- 1
#create list of "name" of first actor in each movie
for (i in cast) {
#print(i[1,6])
starring[[index]] <- i[1,6]
index <- index + 1
}
#add "starring" column to movies dataframe
movies$starring <- starring
#create list of "directors"
director <- vector("character", length(crew))
index <- 1
#create list of "names" that correspond with rows that contain the "job" Director
for (i in crew) {
director[[index]] <- i[, 6][which(i["job"] == "Director")][1]
index <- index + 1
}
#add director column to movies dataframe
movies$director <- director
#genre from JSON
genres <- purrr::map(movies$genres, jsonlite::fromJSON)
genre <- vector("character", length(genres))
index <- 1
for (i in genres) {
#print(i[1,6])
genre[[index]] <- i[1,2]
index <- index + 1
}
movies$genre <- genre
#drop unnecessary columns to make things easier
drops <- c("genres","homepage", "id", "keywords", "overview", "status",
"tagline", "spoken_languages", "original_title")
movies <- movies[ , !(names(movies) %in% drops)]
#further cleaning, drop rows with empty JSON and rows with budget and revenue < 30 (clear errors)
movies <- movies[apply(movies[c(1, 8)],1,function(z) !any(z<30)),]
#####################################################################
#THIS IS WHERE I ATTEMPT TO DROP THE ROWS
movies <- movies[apply(movies[c(4)],1,function(z) !any(z=="[]")),]
#####################################################################
#####################################################################
#I THEN GET THE ERROR MESSAGE HERE
#create production company column from JSON
productionco <- purrr::map(movies$production_companies, jsonlite::fromJSON)
#####################################################################
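One thing that may be worth checking, offered as a guess rather than a diagnosis: a logical row filter that contains NA makes R return rows filled entirely with NA, and such a row in production_companies would then be handed to jsonlite::fromJSON. The sketch below uses a made-up `toy` data frame (its column names and values are assumptions, not the real dataset) and reuses the same filter shape as the script. Note also that the positional indices c(1, 8) and c(4) are evaluated after columns have been dropped, so it may be worth confirming they still point at the intended columns; selecting by name avoids that question.

```r
# Hypothetical toy data standing in for the movies table (not the real dataset)
toy <- data.frame(
  budget = c(100, 50, 10, 200),
  runtime = c(90, NA, 120, 95),
  production_companies = c('[{"name": "A", "id": 1}]',
                           '[{"name": "B", "id": 2}]',
                           "[]",
                           "[]"),
  stringsAsFactors = FALSE
)

# Same filter shape as in the script: the result is NA when a compared value
# is NA and no other value in the row is below 30
keep <- apply(toy[c("budget", "runtime")], 1, function(z) !any(z < 30))

# Indexing with an NA in the logical vector yields a row made entirely of NA
broken <- toy[keep, ]
print(broken)

# which() keeps only the positions that are TRUE, so NA entries are dropped
fixed <- toy[which(keep), ]

# Drop empty JSON arrays by column name, treating NA as "do not keep"
has_json <- !is.na(fixed$production_companies) &
  fixed$production_companies != "[]"
fixed <- fixed[has_json, ]
print(fixed)
```

If this is what is happening in the real data, checking `anyNA(movies$production_companies)` immediately before the failing purrr::map() call should confirm it.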