Best way to merge JSON data into a pandas DataFrame

Time: 2019-03-08 18:36:03

Tags: python json dataframe

I have multiple JSON files that store responses from Requests. Each file looks like this, with every row/list containing 5 records:

[{"Record1": "1", "Record2": "2", "Record3": "3", "Record4": "4", "Record5": "5"}]

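Should I save the response with resp.content, where the returned content is not wrapped in an array, or with resp.json(), which is nested in an array? What is the best practice?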

What is the best way to combine them (about 10k files) so that I can put them into a pandas DataFrame for further analysis? I tried combining them as follows and then loading the result with json.load(), but it raised an error: Extra data

import json
import codecs
import glob

files = glob.glob('./results/*.json')

with codecs.open('combined_results.json', 'w', encoding='utf-8') as outfile:
    for file in files:
        f = open(file, 'r')
        data = json.load(f)
        json.dump(data, outfile, ensure_ascii=False, indent=None)
        outfile.write("\n")

Output (combined_results.json):

[{"Record1": "1", "Record2": "2", "Record3": "3", "Record4": "4", "Record5": "5"}]
[{"Record1": "1", "Record2": "2", "Record3": "3", "Record4": "4", "Record5": "5"}]
[{"Record1": "1", "Record2": "2", "Record3": "3", "Record4": "4", "Record5": "5"}]

Loading the combined file into an object with json.load() then fails (error: Extra data).

2 answers:

Answer 0: (score: 2)

You can change your code so that the files are merged into a single valid JSON object:

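combined_results = []
for file in files:  # files is the glob list from the question
    # A with block closes each input file; with ~10k files, leaked handles add up.
    with open(file, 'r', encoding='utf-8') as f:
        # Each file holds a one-element list, so [0] extracts its record.
        combined_results.append(json.load(f)[0])

# Dump one list so the combined file is a single valid JSON document.
with open('combined_results.json', 'w', encoding='utf-8') as outfile:
    json.dump(combined_results, outfile)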

To read this file into a DataFrame, use pd.read_json:

import pandas as pd

pd.read_json('combined_results.json')
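
For context on the error in the question: json.load() accepts exactly one JSON document, while the file written in the question holds one array per line. A minimal sketch of that behavior follows. It uses an in-memory string (combined_text, a name made up here, with shortened records) in place of the real file, and parses it line by line as one possible workaround:

```python
import json

# Three JSON documents, one per line, mimicking the combined file
# from the question (records shortened for the illustration).
combined_text = "\n".join(['[{"Record1": "1", "Record2": "2"}]'] * 3) + "\n"

# json.loads, like json.load, expects a single document, so parsing
# the whole text at once fails after the first array.
try:
    json.loads(combined_text)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

# Each line on its own is a complete document, so parsing line by
# line works; extend flattens the per-line lists into one list.
records = []
for line in combined_text.splitlines():
    if line.strip():
        records.extend(json.loads(line))
```

Here error_message ends up holding the "Extra data" message, and records is a flat list of dicts that could be passed to pd.DataFrame.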

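Update:

You don't actually need the combined_results.json file at all. Unless you want to merge the files into a single file for later use, you can convert the combined_results list directly into a DataFrame.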

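import pandas as pd

combined_results = []
for file in files:  # files is the glob list from the question
    # A with block closes each file as soon as it has been parsed.
    with open(file, 'r', encoding='utf-8') as f:
        combined_results.append(json.load(f)[0])

pd.DataFrame(combined_results)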
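
If some files contain more than one record per list, [0] would silently keep only the first. A possible variant, sketched here under that assumption, uses extend instead. The helper name load_records and the temporary demo files are made up for illustration and are not part of the original answer:

```python
import glob
import json
import os
import tempfile


def load_records(pattern):
    """Collect every record from every JSON file matching pattern."""
    records = []
    # sorted() makes the order deterministic; glob alone gives no guarantee.
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Each file holds a list of dicts; extend keeps all of them.
        records.extend(data)
    return records


# Demo with throwaway files standing in for ./results/*.json.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        path = os.path.join(tmp, f"result_{i}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump([{"Record1": str(i), "Record2": "2"}], f)
    records = load_records(os.path.join(tmp, "*.json"))
```

The resulting list can then be passed to pd.DataFrame(records) exactly as above.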

Answer 1: (score: 0)

Try:

function truncateString(yourString, maxLength) {
  while (maxLength < yourString.length && yourString[maxLength] != ' ') {
    maxLength++;
  }
  return yourString.substr(0, maxLength);
}

console.log(truncateString('The quick brown fox jumps over the lazy dog', 6))