Question

I am reading a pandas dataframe and trying to write it to a file using json.dump. This raises an error: TypeError(repr(o) + " is not JSON serializable") TypeError: 5 is not JSON serializable

(Link to a screenshot of the error: http://postimg.org/image/lyr539w5p/)

The code block where the error occurs is shown below. Line 3 of the block raises the error.

def write_df_to_json_file(user_array, output_filename):
    try:
        fh = open(output_filename, "w")
        json.dump(user_array, fh)
    except IOError:
        util.logit(3, "Error: can't find file or read data")
    else:
        util.logit(2, str(output_filename) + " file written successful.")
        fh.close()

Printing "user_array" gives the following data.

user_array = {'344': {'216': 4, '215': 3, '213': 4, '297': 4, '684': 4}, '346': {'216': 3, '215': 3, '213': 3, '669': 1, '211': 4, '218': 3, '219': 2, '133': 5, '132': 4, '496': 5, '693': 4, '210': 4, '22': 5, '29': 4, '161': 3, '358': 4}, '347': {'216': 3, '378': 5, '417': 5, '435': 4}}

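The code runs fine on OS-X with Anaconda, but on Windows it throws the error. Three other members of my team tried the code on their Windows PCs and all of them got the same error. They also have Anaconda and Python 2.7 (same as me).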

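To troubleshoot on a Windows PC, we copied the data (user_array) into a new file (hardcoded in the main file rather than read from the source file) and tried json.dump. The code ran without any errors! The file was created successfully with json.dump.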
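Since the hardcoded copy serializes but the dict built from the dataframe does not, the two objects may differ in their value types even though they print identically. Below is a minimal sketch of one way to check this. It assumes (this is not confirmed anywhere in the question) that some value in the built dict is not a built-in int. `find_unserializable` and `FakeRating` are hypothetical names introduced only for illustration; `FakeRating` merely stands in for whatever non-builtin type the dataframe values might have.

```python
import json


def find_unserializable(obj, path="root"):
    """Return (path, type name, repr) for each leaf value json cannot encode.

    Only values are checked; dict keys are not inspected.
    """
    problems = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            problems.extend(
                find_unserializable(value, "{0}[{1!r}]".format(path, key)))
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            problems.extend(
                find_unserializable(value, "{0}[{1}]".format(path, i)))
    else:
        try:
            json.dumps(obj)
        except TypeError:
            problems.append((path, type(obj).__name__, repr(obj)))
    return problems


class FakeRating(object):
    """Hypothetical stand-in: a non-builtin number that prints like an int."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return repr(self.value)


# Same printed content, different value types.
hardcoded = {'344': {'216': 4, '215': 3}}
built = {'344': {'216': FakeRating(4), '215': 3}}

print(find_unserializable(hardcoded))
print(find_unserializable(built))
```

Running the same helper on the real user_array on Windows should report the type name of any offending value, which the printed dict alone does not reveal.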

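I have searched online (on stackoverflow as well as other sites). Other people have run into this error, but they usually had a dictionary object that could not be dumped to JSON. In my case, even though this is a dictionary object, it works on OS-X, so I assume there is nothing wrong with the object on OS-X.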

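Is there a specific difference between OS-X and Windows that causes this error? How can I fix it on Windows? If my team cannot run the code on Windows, they cannot help with development.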

---------------Additional information------------

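Adding the blocks that eventually call this function. read_csv_to_json is the first function called. The rest are then called in turn (in the order shown), ending with write_df_to_json_file, where I get the error. One note: early on, we did a simple json.dump on the input file right after reading it from the csv, and it worked fine on both Mac and Windows. A week later, once we had all these parsing functions, the error appeared. Not sure whether this is useful information.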

Reading the csv file

def read_csv_to_json(filename, separator, colnames):
  df_ratings = read_csv(filename, separator, colnames)

  holdout_split(df_ratings)


def read_csv(filename, separator, colnames):
  # read the full u data set containing 100000 ratings by 943 users on 1682 items
  data_ratings = pd.read_csv(filename, sep =separator, header = None, names = colnames)
  return data_ratings

def holdout_split(df_ratings):
  train, test = train_test_split(df_ratings, test_size = 0.2)

  parse_df_to_usercf_json(train, file_holdout_utrain)
  parse_df_to_usercf_json(test, file_holdout_utest)

  parse_df_to_itemcf_json(train, file_holdout_itrain)
  parse_df_to_itemcf_json(test, file_holdout_itest)

def parse_df_to_usercf_json(data_ratings, output_filename):
  data_ratings_sorted = data_ratings.sort_values(user_sort_colname)   # sort the data_ratings using the sort_colname
  # get distinct_userId sorted by User_id
  distinct_userId = sorted(pd.unique(data_ratings_sorted.user_id.ravel()))
  # Setting up counters to slice the dataframe for subgroups based on same userid
  user_marker = 0
  rowCounter = 0
  rowcount = len(data_ratings_sorted.index)
  user_array = {}
  # Slicing the dataframe based on rows with the same userid and storing as dict objects
  for user in distinct_userId:  # for each distinct user
    movie_details = {}
    for index_j, row_j in data_ratings_sorted[user_marker: rowcount].iterrows():
      # dataframe is sliced using a user_marker that is set to beginning of the current distinct user
      rowCounter += 1  # point rowcounter to next row
      if user == row_j['user_id']:  # till the userj in current row matches the current distinct user in the top loop
        movie_details[str(row_j['movie_id'])] = row_j['rating']  # create a key value pair {movie id : rating}
      else:  # if userid in current row doesnt match the distinct user in the top loop
        rowCounter = rowCounter - 1  # retreat the rowcounter one step back
        user_marker = rowCounter  # set user_marker to the next distinct user id
        user_array[str(user)] = movie_details  # store the userid and movie details as {user_id1: {movie_id,movie_rating}}
        break  # skip to next distinct user (back to top loop)
  # write the final dictionary object/array to json file
  write_df_to_json_file(user_array, output_filename)
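One possible workaround can be sketched as follows, under the assumption (not confirmed in this question) that the ratings taken from the dataframe rows are NumPy scalars rather than built-in ints. json.dump accepts a `default` callable that is invoked only for objects it cannot encode, and NumPy scalars expose an `item()` method that returns the equivalent built-in value. `to_builtin` and `FakeInt64` are hypothetical names; `FakeInt64` is a stand-in so the sketch runs without NumPy installed.

```python
import json


def to_builtin(o):
    """Fallback for json.dump: called only for objects json cannot encode."""
    # NumPy scalars expose .item(), which returns the equivalent built-in value.
    if hasattr(o, "item"):
        return o.item()
    raise TypeError(repr(o) + " is not JSON serializable")


class FakeInt64(object):
    """Hypothetical stand-in for a NumPy integer scalar (not the real type)."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


user_array = {'347': {'216': FakeInt64(3), '378': 5}}

# sort_keys only makes the printed text deterministic for this demo.
text = json.dumps(user_array, default=to_builtin, sort_keys=True)
print(text)
```

If the assumption holds, the call in write_df_to_json_file could become json.dump(user_array, fh, default=to_builtin). Another option would be to wrap the value as int(row_j['rating']) when filling movie_details, so the dict only ever holds built-in ints.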

json.dump(dict, f) fails - TypeError(repr(o) + " is not JSON serializable")

0 answers: