Question

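I want to load a JSON file with Pandas, but it isn't working the way I expected! I have already looked at this Stack Overflow answer, but my problem is a different one. The JSON file looks like this: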

Code to load the file:

import pandas as pd
df = pd.read_json("BrowserHistory.json")
print(df)

Output:

[Image: output pandas DataFrame]

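But I don't want a single column that holds each JSON element. I want 6 columns, namely 'favicon_url', 'page_transition', 'title', 'url', 'client_id' and 'time_usec', as shown in the 'JSON file' photo above, and each column should contain its value from each element.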

Like this:

favicon_url   page_transition   title   url   client_id   time_usec
    .                .            .      .        .           .
    .                .            .      .        .           .
    .                .            .      .        .           .
    .                .            .      .        .           .

JSON file:

{
    "Browser History": [
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386529857946
        },
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386514845201
        },
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386499014063
        },
        {
            "favicon_url": "https://ssl.gstatic.com/ui/v1/icons/mail/rfr/gmail.ico",
            "page_transition": "LINK",
            "title": "Gmail",
            "url": "https://mail.google.com/mail/u/0/#inbox",
            "client_id": "cliendid",
            "time_usec": 1620386492788783
        }
    ]
}

Answer 1

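The problem is caused by the {} wrapped around your file: pandas treats the first level of the JSON as the columns, so it uses only 'Browser History' as a column. You can use this code to solve your problem: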

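import json
import pandas as pd

# encoding='cp850' is what this answer used; adjust it to match your file (e.g. 'utf-8')
df = pd.DataFrame(json.load(open('BrowserHistory.json', encoding='cp850'))['Browser History'])
print(df)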
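
To see why read_json ends up with a single column here, this is a minimal sketch that uses a made-up two-record sample in the same shape as the file. The sample data and the names raw, nested and flat are only for illustration and are not from the question:

```python
import io
import json

import pandas as pd

# Hypothetical two-record sample shaped like BrowserHistory.json (illustration only).
raw = json.dumps({
    "Browser History": [
        {"title": "Google Takeout", "url": "https://takeout.google.com/", "time_usec": 1620386529857946},
        {"title": "Gmail", "url": "https://mail.google.com/mail/u/0/#inbox", "time_usec": 1620386492788783},
    ]
})

# read_json maps the top-level key to a column, so there is exactly one column.
nested = pd.read_json(io.StringIO(raw))
print(nested.columns.tolist())

# Selecting the list under 'Browser History' first gives one column per field.
flat = pd.DataFrame(json.loads(raw)["Browser History"])
print(flat.columns.tolist())
```

The only difference between the two calls is whether the outer {"Browser History": ...} wrapper is still present when the DataFrame is built.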

Answer 2

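Because your objects sit in a list at the second level of the JSON, you can't read the file directly into a DataFrame with read_json. Instead, you can read the JSON into a variable and then create the DataFrame from that: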

import pandas as pd
import json

f = open("BrowserHistory.json")
js = json.load(f)
df = pd.DataFrame(js['Browser History'])
df
#                                          favicon_url page_transition  ... client_id         time_usec
# 0                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386529857946
# 1                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386514845201
# 2                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386499014063
# 3  https://ssl.gstatic.com/ui/v1/icons/mail/rfr/g...            LINK  ...  cliendid  1620386492788783

Note that you may need to specify the file encoding in the open call, e.g.

f = open("BrowserHistory.json", encoding="utf8")
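
As a possible alternative to indexing into the dict yourself, pandas also provides pd.json_normalize, which takes a record_path naming the key whose list holds the records. The sketch below assumes a small made-up sample with the same structure as the file; the data values and the variable name data are illustrative only:

```python
import pandas as pd

# Hypothetical sample mirroring the structure of BrowserHistory.json (illustration only).
data = {
    "Browser History": [
        {"page_transition": "LINK", "title": "Google Takeout", "time_usec": 1620386529857946},
        {"page_transition": "LINK", "title": "Gmail", "time_usec": 1620386492788783},
    ]
}

# record_path points at the list of records nested under the top-level key.
df = pd.json_normalize(data, record_path="Browser History")
print(df.columns.tolist())
print(len(df))
```

With the real file you would pass the result of json.load(f) in place of data; for flat records like these, the result should match pd.DataFrame(js['Browser History']).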

Pandas "read_json" not working as expected

2 answers: