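I am reading a pandas DataFrame and trying to write it to a file with json.dump. This raises the error `TypeError(repr(o) + " is not JSON serializable")` TypeError: 5 is not JSON serializable
(link to a screenshot of the error: http://postimg.org/image/lyr539w5p/)
The block of code where the error occurs is shown below. The error is raised by the `json.dump(user_array, fh)` line.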
def write_df_to_json_file(user_array, output_filename):
    try:
        fh = open(output_filename, "w")
        json.dump(user_array, fh)
    except IOError:
        util.logit(3, "Error: can't find file or read data")
    else:
        util.logit(2, str(output_filename) + " file written successfully.")
        fh.close()
Printing `user_array`, I get the following data:
user_array = {'344': {'216': 4, '215': 3, '213': 4, '297': 4, '684': 4}, '346': {'216': 3, '215': 3, '213': 3, '669': 1, '211': 4, '218': 3, '219': 2, '133': 5, '132': 4, '496': 5, '693': 4, '210': 4, '22': 5, '29': 4, '161': 3, '358': 4}, '347': {'216': 3, '378': 5, '417': 5, '435': 4}}
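The code runs fine on OS X with Anaconda, but on Windows it throws the error. Three other members of my team tried the code on their Windows PCs, and they all got the same error. They also have Anaconda and Python 2.7 (same as me).
To troubleshoot on a Windows PC, we copied the data (user_array) into a new file (hard-coded in the main file instead of read from the source file) and tried json.dump. The code ran without any errors! The file was created successfully with json.dump.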
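The hard-coded test can be reproduced in a few lines. When the dictionary is typed as a literal in the source, every rating is a built-in Python int and every key is a str, so json.dumps has nothing it cannot encode. The values below are the ones printed above.

```python
import json

# user_array as printed above, hard-coded as a literal: str keys, built-in int values
user_array = {'344': {'216': 4, '215': 3, '213': 4, '297': 4, '684': 4},
              '346': {'216': 3, '215': 3, '213': 3, '669': 1, '211': 4, '218': 3,
                      '219': 2, '133': 5, '132': 4, '496': 5, '693': 4, '210': 4,
                      '22': 5, '29': 4, '161': 3, '358': 4},
              '347': {'216': 3, '378': 5, '417': 5, '435': 4}}

# Encodes without error: only built-in types are involved.
text = json.dumps(user_array, sort_keys=True)
print(text)
```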
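I have searched online (on Stack Overflow and other sites). People do run into this error, but usually because their dictionary contains an object that cannot be JSON-dumped. In my case, the same dictionary object works on OS X, so I assume there is nothing wrong with the object itself.
Is there a specific difference between OS X and Windows that causes this error? How can I fix it on Windows? If my teammates can't run the code on Windows, they can't help with development.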
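One way to narrow this down on a Windows machine is to list which values json actually rejects and what their types are. A minimal sketch follows; the helper name `find_unserializable` is made up for illustration, and it only descends into dicts, which matches the shape of `user_array`. If the ratings turn out to be NumPy scalars, the reported type name would be a NumPy one (such as `int64`) rather than `int`.

```python
import json


def find_unserializable(obj, path=()):
    """Yield (key path, type name) for every leaf of a nested dict that json rejects.

    Hypothetical helper: each non-dict leaf is test-encoded with json.dumps,
    so the check matches whatever the running json module accepts.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            for hit in find_unserializable(value, path + (key,)):
                yield hit
    else:
        try:
            json.dumps(obj)
        except TypeError:
            yield path, type(obj).__name__


# Usage, right before the failing json.dump:
# for key_path, type_name in find_unserializable(user_array):
#     print("%s -> %s" % (key_path, type_name))
```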
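--------------- Additional information ---------------
Here is the code that eventually calls this function. read_csv_to_json is the first function called; it then calls the remaining functions in turn (shown below in order), ending with write_df_to_json_file, where I get the error. One more thing: early in the project, we did a simple json.dump of the input data right after reading it from the CSV, and that worked fine on both Mac and Windows. A week later, once all these parsing functions were in place, this error appeared. Not sure whether that is useful information.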
Reading the CSV file:

import pandas as pd
from sklearn.model_selection import train_test_split  # older scikit-learn versions: sklearn.cross_validation

def read_csv_to_json(filename, separator, colnames):
    df_ratings = read_csv(filename, separator, colnames)
    holdout_split(df_ratings)

def read_csv(filename, separator, colnames):
    # read the full u data set containing 100000 ratings by 943 users on 1682 items
    data_ratings = pd.read_csv(filename, sep=separator, header=None, names=colnames)
    return data_ratings

def holdout_split(df_ratings):
    train, test = train_test_split(df_ratings, test_size=0.2)
    parse_df_to_usercf_json(train, file_holdout_utrain)
    parse_df_to_usercf_json(test, file_holdout_utest)
    parse_df_to_itemcf_json(train, file_holdout_itrain)
    parse_df_to_itemcf_json(test, file_holdout_itest)

def parse_df_to_usercf_json(data_ratings, output_filename):
    data_ratings_sorted = data_ratings.sort_values(user_sort_colname)  # sort data_ratings by the user_sort_colname column
    # get the distinct user ids, sorted
    distinct_userId = sorted(pd.unique(data_ratings_sorted.user_id.ravel()))
    # set up counters to slice the DataFrame into subgroups with the same user id
    user_marker = 0
    rowCounter = 0
    rowcount = len(data_ratings_sorted.index)
    user_array = {}
    # slice the DataFrame into runs of rows with the same user id and store them as dict objects
    for user in distinct_userId:  # for each distinct user
        movie_details = {}
        for index_j, row_j in data_ratings_sorted[user_marker: rowcount].iterrows():
            # the DataFrame is sliced from user_marker, which points at the first row of the current distinct user
            rowCounter += 1  # point rowCounter to the next row
            if user == row_j['user_id']:  # while the user in the current row matches the current distinct user of the outer loop
                movie_details[str(row_j['movie_id'])] = row_j['rating']  # create a key-value pair {movie_id: rating}
            else:  # the user id in the current row doesn't match the distinct user of the outer loop
                rowCounter = rowCounter - 1  # move rowCounter one step back
                user_marker = rowCounter  # set user_marker to the first row of the next distinct user id
                user_array[str(user)] = movie_details  # store as {user_id1: {movie_id: movie_rating}}
                break  # skip to the next distinct user (back to the outer loop)
    # write the final dictionary object/array to a JSON file
    write_df_to_json_file(user_array, output_filename)
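Another option is to coerce the values while the dictionary is being built, so it only ever holds built-in types. The sketch below is a simplified, hypothetical stand-in for the row-counter loop: `build_user_array` takes plain `(user_id, movie_id, rating)` tuples, keeps the `str()` keys of the original, and passes each rating through `int()`, which turns a NumPy integer into a built-in int. It assumes ratings are whole numbers, as in the data printed above. With pandas, the tuples could come from `data_ratings[['user_id', 'movie_id', 'rating']].itertuples(index=False)`, using the column names from the code above.

```python
import json


def build_user_array(rows):
    """Group (user_id, movie_id, rating) rows into {user_id: {movie_id: rating}}.

    Hypothetical helper: keys are str() as in the original code; int() makes
    sure every rating is a built-in int (assumes whole-number ratings).
    """
    user_array = {}
    for user_id, movie_id, rating in rows:
        user_array.setdefault(str(user_id), {})[str(movie_id)] = int(rating)
    return user_array


if __name__ == "__main__":
    sample = [(344, 216, 4), (344, 215, 3), (347, 378, 5)]
    print(json.dumps(build_user_array(sample), sort_keys=True))
    # prints {"344": {"215": 3, "216": 4}, "347": {"378": 5}}
```

Grouping with `setdefault` avoids tracking row positions by hand, so the input does not need to be pre-sorted by user.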