Split data (from a text/JSON file) by platform using Python

Time: 2018-08-31 08:58:03

Tags: python json split

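Here is the sample data (a JSON file). The file is filled with lines of exactly this form, one JSON object per line, because this "JSON" file is prepared for upload to BigQuery. I am looking for a way to split it by platform.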

{"origin": {"detailed": "instagram", "source": "instagram", "platform": "instagram"}.....}
{"origin": {"detailed": "website", "source": "website", "platform": "website"}.....}
{"origin": {"detailed": "forum", "source": "forum", "platform": "forum"}.....}
{"origin": {"detailed": "twitter", "source": "twitter", "platform": "twitter"}.....}
{"origin": {"detailed": "facebook", "source": "facebook", "platform": "facebook"}.....}

I want to split this data into separate files based on the platform.

if platform = instagram (but somehow it should be: if line contains "platform": "instagram")
    write to post_instagram.json
if platform = facebook
    write to post_facebook.json
..............
    ...................
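Checking whether the raw line contains the text `"platform": "instagram"` only works if every line is serialized with exactly that spacing; a line written as `"platform":"instagram"` would not match. A minimal sketch that parses each line and reads the `origin.platform` field instead, assuming each line is one complete JSON object (`platform_of` is a hypothetical helper name):

```python
import json


def platform_of(line):
    """Return origin.platform for one line of newline-delimited JSON (hypothetical helper)."""
    record = json.loads(line)
    return record["origin"]["platform"]


line = '{"origin": {"detailed": "forum", "source": "forum", "platform": "forum"}}'
compact = '{"origin":{"platform":"forum"}}'  # same data, no spaces after the colons

print(platform_of(line))                 # forum
print('"platform": "forum"' in compact)  # False: the substring test misses it
print(platform_of(compact))              # forum: parsing does not care about spacing
```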

What is a clean way to do this in Python?

Example:

with open(FILE_NAME, "r") as infile:
    data = infile.read()
    # if statements?
    # while statement?
    # ...

with open(FILE_NAME, "w+") as outfile:
    outfile.write(data)

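Reason: I want to split the data because I cannot create a single schema that accepts all the platforms. Even if I create a schema containing every column from every platform, different platforms have extra repeated columns that break consistency. Since each platform's data has a different schema, the solution needs to split the data by platform.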

3 Answers:

Answer 0 (score: 1)

Maybe something like this:

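import json

# Assumes FILE_NAME is the question's input file, one JSON object per line
with open(FILE_NAME) as infile:
    data = [json.loads(line) for line in infile if line.strip()]

with open("post_instagram.json", "w") as f:
    json.dump([x for x in data if "instagram" in x["origin"]["platform"]], f)

with open("post_facebook.json", "w") as f:
    json.dump([x for x in data if "facebook" in x["origin"]["platform"]], f)

# other platforms ...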
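Note that `json.dump(some_list, f)` writes a single JSON array, while BigQuery's JSON loader expects newline-delimited JSON (one object per line), which is also the format of the question's input. A minimal sketch of writing each record on its own line instead; `dump_ndjson` and the two sample records are hypothetical:

```python
import json


def dump_ndjson(records, path):
    """Write each record as one JSON object per line (newline-delimited JSON)."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Hypothetical sample records in the question's shape
records = [
    {"origin": {"detailed": "instagram", "source": "instagram", "platform": "instagram"}},
    {"origin": {"detailed": "facebook", "source": "facebook", "platform": "facebook"}},
]

dump_ndjson([r for r in records if r["origin"]["platform"] == "instagram"],
            "post_instagram.json")
```

Each output file then has the same one-object-per-line layout as the input.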

If the data is very large, loop over it once instead of iterating over all of it for every platform:

instagram = []
facebook = []

for d in data:
    if "instagram" in d["origin"]["platform"]:
        instagram.append(d)
    elif "facebook" in d["origin"]["platform"]:
        facebook.append(d)

# "with" closes each output file as soon as it is written
with open("post_instagram.json", "w") as f:
    json.dump(instagram, f)
with open("post_facebook.json", "w") as f:
    json.dump(facebook, f)
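The single-loop version still keeps every record in memory. For a very large file, one alternative is to stream it: read one line at a time and write that line, unchanged, to the file for its platform, keeping one open handle per platform. A minimal sketch, assuming the input is newline-delimited JSON as in the question; `split_by_platform`, its `prefix` parameter and the `posts.json` sample file are hypothetical:

```python
import json
from contextlib import ExitStack


def split_by_platform(in_path, prefix="post_"):
    """Stream in_path line by line, writing each line to <prefix><platform>.json.

    Returns the sorted list of platforms that were found.
    """
    with ExitStack() as stack, open(in_path) as infile:
        outfiles = {}  # platform -> open file handle, opened on first use
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            platform = json.loads(line)["origin"]["platform"]
            if platform not in outfiles:
                outfiles[platform] = stack.enter_context(
                    open("{}{}.json".format(prefix, platform), "w"))
            # Copy the original line so the output stays newline-delimited
            outfiles[platform].write(line if line.endswith("\n") else line + "\n")
    return sorted(outfiles)


# Hypothetical input with three records
with open("posts.json", "w") as f:
    for p in ["instagram", "forum", "instagram"]:
        f.write(json.dumps({"origin": {"platform": p}}) + "\n")

print(split_by_platform("posts.json"))  # ['forum', 'instagram']
```

`ExitStack` is used so that every per-platform file is closed when the block exits, even if a line fails to parse partway through; the set of platforms does not have to be known in advance.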

Answer 1 (score: 0)

You can use Python's json module.

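Then, since the file holds one JSON object per line, parse each line with json.loads to get a dict (json.load on the whole file would fail), read your_dict['origin']['platform'], and write the line to a file named 'post_' + platform + '.json'.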
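The reason per-line parsing matters: `json.load` expects exactly one JSON value, so on the question's one-object-per-line file it raises `json.JSONDecodeError` with the message "Extra data". A small sketch showing the difference, using an in-memory `io.StringIO` in place of the real file (an assumption for illustration):

```python
import io
import json

ndjson = (
    '{"origin": {"detailed": "forum", "source": "forum", "platform": "forum"}}\n'
    '{"origin": {"detailed": "twitter", "source": "twitter", "platform": "twitter"}}\n'
)

# json.load reads the first object, then rejects the second line as extra data
try:
    json.load(io.StringIO(ndjson))
except json.JSONDecodeError as err:
    print("json.load failed:", err.msg)  # json.load failed: Extra data

# Parsing line by line gives one dict per record
records = [json.loads(line) for line in io.StringIO(ndjson) if line.strip()]
platforms = [r["origin"]["platform"] for r in records]
print(platforms)  # ['forum', 'twitter']
```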

Answer 2 (score: 0)

You can use the json module.

For example:

import json
from collections import defaultdict

with open(filename) as infile:
    data = json.load(infile)  # read the whole JSON document

# Group the records by their origin.platform value
res = defaultdict(list)
for record in data["data"]:
    res[record["origin"]["platform"]].append(record)

# Write each group to post_<platform>.json
for platform, records in res.items():
    with open("post_{}.json".format(platform), "w") as outfile:
        json.dump(records, outfile)

The JSON used in this example:

{
    "data": [
        {"origin": {"detailed": "instagram", "source": "instagram", "platform": "instagram"}},
        {"origin": {"detailed": "website", "source": "website", "platform": "website"}},
        {"origin": {"detailed": "forum", "source": "forum", "platform": "forum"}},
        {"origin": {"detailed": "twitter", "source": "twitter", "platform": "twitter"}},
        {"origin": {"detailed": "facebook", "source": "facebook", "platform": "facebook"}}
    ]
}