Editing string and dictionary output in a JSON file

Posted: 2018-02-02 02:52:52

Tags: python json dictionary file-io formatting

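I have a program that takes a JSON file, reads it line by line, aggregates the check-in times into four bins based on the time of day, and then writes the result to a file. However, because I am concatenating a dictionary with a string, my output file contains extra characters.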

For example, this is the output for one line:

dwQEZBFen2GdihLLfWeexA<bound method DataFrame.to_dict of            Friday  Monday  Saturday  Sunday  Thursday  Tuesday  Wednesday
Category                                                                 
Afternoon       0       0         3       2         2        0          1
Evening        20       4        16      11         4        3          5
Night          16       1        19       5         2        5          3>

A memory address is also being concatenated into the output file.

Here is the code used to create this particular file:

import json
import ast
import pandas as pd
from datetime import datetime

def cleanStr4SQL(s):
    return s.replace("'","`").replace("\n"," ")

def parseCheckinData():
    #write code to parse yelp_checkin.JSON
    # Add a new column "Time" to the DataFrame and set the values after left padding the values in the index

    with open('yelp_checkin.JSON') as f:
        outfile = open('checkin.txt', 'w')
        line = f.readline()
#        print(line)
        count_line = 0
        while line:
            data = json.loads(line)
#            print(data)
#            jsontxt = cleanStr4SQL(str(data['time']))
            # Parse the json and convert to a dictionary object

            jsondict = ast.literal_eval(str(data))
            outfile.write(cleanStr4SQL(str(data['business_id'])))

            # Convert the "time" element in the dictionary to a pandas DataFrame
            df = pd.DataFrame(jsondict['time'])

            # Add a new column "Time" to the DataFrame and set the values after left padding the values in the index
            df['Time'] = df.index.str.rjust(5, '0')

            # Add a new column "Category" and set the values based on the time slot
            df['Category'] = df['Time'].apply(cat)

            # Create a pivot table based on the "Category" column
            pt = df.pivot_table(index='Category', aggfunc=sum, fill_value=0)

            # Convert the pivot table to a dictionary to get the json output you want
            jsonoutput = pt.to_dict
#            print(jsonoutput)
            outfile.write(str(jsonoutput))

            line = f.readline()
            count_line+=1
    print(count_line)
    outfile.close()
    f.close()

# Define a function to convert the time slots to the categories
def cat(time_slot):
    if '06:00' <= time_slot < '12:00':
        return 'Morning'
    elif '12:00' <= time_slot < '17:00':
        return 'Afternoon'
    elif '17:00' <= time_slot < '23:00':
        return 'Evening'
    else:
        return 'Night'
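The left-padding step in the code above matters because cat() compares strings character by character, not as times. A small check, reusing the question's cat() unchanged (the sample time strings are illustrative):

```python
def cat(time_slot):
    # Same binning rules as the question's cat(); these are string comparisons.
    if '06:00' <= time_slot < '12:00':
        return 'Morning'
    elif '12:00' <= time_slot < '17:00':
        return 'Afternoon'
    elif '17:00' <= time_slot < '23:00':
        return 'Evening'
    else:
        return 'Night'

# '9' > '1' as characters, so '9:00' is NOT < '12:00' and falls through to 'Night'.
unpadded = cat('9:00')
# Padding to '09:00' makes it sort between '06:00' and '12:00' as intended.
padded = cat('9:00'.rjust(5, '0'))
print(unpadded, padded)  # Night Morning
```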

Is there some way to remove the memory location from the output file?

Let me know if you need more information.

Thanks for reading.

2 answers:

Answer 0 (score: 0)

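Problem 1: The parentheses are missing after to_dict, which is what causes this "memory address". Without them, pt.to_dict is the method object itself, so str() writes its repr (<bound method DataFrame.to_dict of ...>) instead of a dictionary.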
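A minimal sketch of the difference, using a tiny hypothetical DataFrame as a stand-in for the question's pivot table pt:

```python
import pandas as pd

# Hypothetical stand-in for the pivot table `pt` from the question.
pt = pd.DataFrame({'Friday': [0, 20]}, index=['Afternoon', 'Evening'])

# Without parentheses: str() of the bound method object, not the data.
without_parens = str(pt.to_dict)
print(without_parens.startswith('<bound method'))  # True

# With parentheses: the method runs and returns {column: {index: value}}.
with_parens = pt.to_dict()
print(with_parens == {'Friday': {'Afternoon': 0, 'Evening': 20}})  # True
```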

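Problem 2: To produce valid JSON, you also need to wrap the output in an array, because several top-level objects written back to back do not form a single valid JSON document.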
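A small sketch of why the array matters, using two hypothetical per-line results: concatenated objects fail to parse, while dumping a list yields one parseable array.

```python
import json

# Two hypothetical per-line results.
records = [{'a': 1}, {'b': 2}]

# Writing each dict back to back gives '{"a": 1}{"b": 2}', which is not one JSON document.
concatenated = ''.join(json.dumps(r) for r in records)
try:
    json.loads(concatenated)
    concatenated_is_valid = True
except json.JSONDecodeError:
    concatenated_is_valid = False

# Dumping the list produces a single JSON array that parses back cleanly.
as_array = json.dumps(records)
print(as_array)  # [{"a": 1}, {"b": 2}]
print(json.loads(as_array) == records)  # True
```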

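Problem 3: Converting JSON to and from strings with str or eval is unsafe; str() produces a Python repr, not JSON. Use json.loads() and json.dumps() instead.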
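To illustrate, here is a sketch with one hypothetical input line (its structure is assumed from the question's code). json.loads() already returns a dict, so the question's ast.literal_eval(str(data)) only rebuilds the same dict, and str() of a dict is not valid JSON:

```python
import json

# Hypothetical input line shaped like a check-in record (structure assumed).
line = '{"business_id": "dwQEZBFen2GdihLLfWeexA", "time": {"Friday": {"20:00": 2}}}'

data = json.loads(line)        # JSON text -> dict
python_repr = str(data)        # Python repr with single quotes, not JSON
try:
    json.loads(python_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False
print(repr_is_json)  # False

json_text = json.dumps(data)   # dict -> valid JSON text
print(json.loads(json_text) == data)  # True
```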

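import json

    ...
    line_chunks = []  # collect one result per input line
    while line:
        ...
        jsondict = json.loads(line)  # problem 3: parse the raw line, no str()/literal_eval round trip
        ...
        jsonoutput = pt.to_dict()  # problem 1: call the method
        line_chunks.append({jsondict['business_id']: jsonoutput})  # keep the id with its counts
        ...
    outfile.write(json.dumps(line_chunks))  # problems 2 and 3: dumps() of a list already writes the [ ]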
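Filling in the elided parts, a self-contained sketch of the whole fix might look like this. Assumptions: each input line is a JSON object shaped like {"business_id": ..., "time": {day: {"H:MM": count}}}, as the question's code implies; summarize_checkins and the "checkins" key are hypothetical names; pivot_table is swapped for a roughly equivalent groupby on the category labels; and counts are cast to plain int because some pandas versions may return NumPy integer types from to_dict(), which json.dumps cannot serialize.

```python
import json
import pandas as pd

def cat(time_slot):
    # Same bins as the question; expects zero-padded 'HH:MM' strings.
    if '06:00' <= time_slot < '12:00':
        return 'Morning'
    elif '12:00' <= time_slot < '17:00':
        return 'Afternoon'
    elif '17:00' <= time_slot < '23:00':
        return 'Evening'
    else:
        return 'Night'

def summarize_checkins(lines):
    """Hypothetical helper: JSON Lines check-in records -> list of summary dicts."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        data = json.loads(line)  # parse JSON directly
        if not data.get('time'):
            results.append({'business_id': data['business_id'], 'checkins': {}})
            continue
        # Columns are days, index is "H:MM"; missing day/hour pairs become NaN -> 0.
        df = pd.DataFrame(data['time']).fillna(0).astype(int)
        df.index = df.index.str.rjust(5, '0')  # "9:00" -> "09:00"
        pt = df.groupby(df.index.map(cat)).sum()  # one row per category
        # Call to_dict() WITH parentheses, then force native ints for json.dumps.
        counts = {day: {slot: int(n) for slot, n in col.items()}
                  for day, col in pt.to_dict().items()}
        results.append({'business_id': data['business_id'], 'checkins': counts})
    return results

# Hypothetical one-record sample.
sample = ['{"business_id": "dwQEZBFen2GdihLLfWeexA", '
          '"time": {"Friday": {"20:00": 2, "9:00": 1}, "Monday": {"13:00": 4}}}']
result = summarize_checkins(sample)
print(json.dumps(result))
```

To write a real file, something like `with open('yelp_checkin.JSON') as f, open('checkin.json', 'w') as out: json.dump(summarize_checkins(f), out)` should work, since iterating a file object yields its lines and json.dump of a list writes a single JSON array.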

Answer 1 (score: 0)

The way you are using JSON looks like streaming (reading and processing it piece by piece), which is an unpleasant problem to deal with.

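If you are not working with very large JSON files, you are better off loading the whole file at once inside a with open(...) block (for example into a variable json_data), then extracting specific entries from json_data as needed (just remember it is a dictionary), manipulating them, and populating the output dictionary you want to save.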
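One caveat when loading "the whole file": json.load() expects exactly one JSON document, while the question's loop, which calls json.loads() on each line, suggests yelp_checkin.JSON holds one object per line (JSON Lines). A sketch of both cases, with in-memory stand-ins for real files and hypothetical helper names:

```python
import io
import json

def load_single_document(fileobj):
    """Parse a file that contains exactly one JSON document (object or array)."""
    return json.load(fileobj)

def load_json_lines(fileobj):
    """Parse a JSON Lines file (one JSON object per line) into a list of dicts."""
    return [json.loads(line) for line in fileobj if line.strip()]

# Hypothetical single-document file: the whole thing is one dictionary.
single = io.StringIO('{"dwQEZBFen2GdihLLfWeexA": {"Friday": {"20:00": 2}}}')
json_data = load_single_document(single)
print(json_data['dwQEZBFen2GdihLLfWeexA'])  # {'Friday': {'20:00': 2}}

# Hypothetical JSON Lines file: json.load() on this text would raise "Extra data".
lines = io.StringIO('{"business_id": "a", "time": {}}\n{"business_id": "b", "time": {}}\n')
records = load_json_lines(lines)
print([r['business_id'] for r in records])  # ['a', 'b']
```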

Also, in Python, if you use the with open(...) syntax, you don't need to close the file afterwards.
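A short sketch of that behaviour, opening an input and an output file in one with statement the way the question's code could (the temporary paths are hypothetical stand-ins for yelp_checkin.JSON and checkin.txt):

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'yelp_checkin.JSON')  # hypothetical input path
dst = os.path.join(tmpdir, 'checkin.txt')        # hypothetical output path

with open(src, 'w') as tmp:
    tmp.write('{"business_id": "a"}\n')

# Both files are closed automatically when the block exits,
# even if an exception is raised inside it.
with open(src) as f, open(dst, 'w') as outfile:
    for line in f:
        outfile.write(line)
    closed_inside = f.closed or outfile.closed   # False: still open here

closed_after = f.closed and outfile.closed        # True: closed on exit
print(closed_inside, closed_after)  # False True
```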