Question

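Several Python libraries have started reading and writing JSON files in the JSON Lines format.

I am trying to use the read_json(...) function to convert a JSON file that follows the JSON Lines specification into a pandas DataFrame.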

My file "input.json" looks like this, with one JSON object per line:

{"A": {"page": 1, "name": "foo", "url": "xxx"}, "B": {"page": 1, "name": "bar", "url": "http://xxx"}, "C": {"page": 3, "name": "foo", "url": "http://xxx"}}
{"D": {"page": 2, "name": "bar", "url": "xxx"}, "E": {"page": 2, "name": "bar", "url": "http://xxx"}, "F": {"page": 3, "name": "foo", "url": "http://xxx"}}

The output I want:

  page name url
A 1    foo  http://xxx
B 1    bar  http://xxx
C 3    foo  http://xxx
D 2    bar  http://xxx
E 2    bar  http://xxx
F 3    foo  http://xxx

As a first attempt I tried the following, but the result is not correct:

print(pd.read_json("file:///input.json", orient='index', lines=True))

I saw in the pandas documentation that orient='index' expects the following layout, but I do not understand the result I get:

{index -> {column -> value}}

Answer 1

You could consider a combination of stack(), reset_index() and apply() to get what you want. You need two lines:

import pandas as pd

df = pd.read_json("file:///input.json", orient='index', lines=True).stack().reset_index(level=1, drop=True)

# Here the .stack() basically flattens your extraneous columns into one.
# .reset_index() is to remove the extra index level that was added by stack()
#
# df
#
# A           {'page': 1, 'name': 'foo', 'url': 'xxx'}
# B    {'page': 1, 'name': 'bar', 'url': 'http://xxx'}
# C    {'page': 3, 'name': 'foo', 'url': 'http://xxx'}
# D           {'page': 2, 'name': 'bar', 'url': 'xxx'}
# E    {'page': 2, 'name': 'bar', 'url': 'http://xxx'}
# F    {'page': 3, 'name': 'foo', 'url': 'http://xxx'}
# dtype: object

df = df.apply(pd.Series, index=df.iloc[0].keys())

# Here .apply() expands each dictionary into columns by turning it into a Series.
# The index keyword orders the columns by the keys of the first dictionary in df
# (.iloc[0] takes the first element by position, since the labels are strings).
#
# df
#
#        page name         url
#  A        1  foo         xxx
#  B        1  bar  http://xxx
#  C        3  foo  http://xxx
#  D        2  bar         xxx
#  E        2  bar  http://xxx
#  F        3  foo  http://xxx

It is a bit of a hack, but it lets you parse the JSON Lines correctly without writing a loop.
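For the second step, a possibly simpler alternative to apply(pd.Series) is to hand the dictionaries straight to the DataFrame constructor. This sketch is not part of the answer above: it assumes you already have a Series of dictionaries like the intermediate result shown, and it rebuilds one in memory from the question's sample data. The name `records` is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the intermediate Series of dicts
# (values copied from the sample data in the question).
records = pd.Series(
    [
        {"page": 1, "name": "foo", "url": "xxx"},
        {"page": 1, "name": "bar", "url": "http://xxx"},
        {"page": 3, "name": "foo", "url": "http://xxx"},
        {"page": 2, "name": "bar", "url": "xxx"},
        {"page": 2, "name": "bar", "url": "http://xxx"},
        {"page": 3, "name": "foo", "url": "http://xxx"},
    ],
    index=list("ABCDEF"),
)

# Build the frame from the list of dicts and reuse the Series index as row labels.
df = pd.DataFrame(records.tolist(), index=records.index)
print(df)
```

Building the frame in one constructor call avoids creating a temporary Series per row, which may matter for larger files.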

Answer 2

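When you work with JSON Lines, you need to:

read the file line by line,
convert each line to a dictionary,
create a DataFrame from that dictionary,
append it to a list of DataFrames,
and finally combine them with pandas concat.

And voilà: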

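import json
import pandas as pd

line_list = []
with open('sample.json') as f:
    for line in f:
        # Each line is a standalone JSON object: {label: {column: value}}
        a_dict = json.loads(line)
        # Labels become columns, so transpose to get one row per label
        df = pd.DataFrame(a_dict).T
        line_list.append(df)

# Stack the per-line frames into a single DataFrame
df = pd.concat(line_list)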

This produces the desired output:

  name  page         url
A  foo     1         xxx
B  bar     1  http://xxx
C  foo     3  http://xxx
D  bar     2         xxx
E  bar     2  http://xxx
F  foo     3  http://xxx
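A variation on the same loop, offered only as a sketch: instead of building one DataFrame per line, merge the parsed lines into a single dictionary and build the frame once with pd.DataFrame.from_dict(..., orient="index"). This assumes the row labels are unique across lines, because a repeated label would silently overwrite the earlier one. The io.StringIO object stands in for the file so the example runs on its own, and the names `sample` and `merged` are illustrative.

```python
import io
import json

import pandas as pd

# Stand-in for open('sample.json'); the two lines are the sample from the question.
sample = io.StringIO(
    '{"A": {"page": 1, "name": "foo", "url": "xxx"}, '
    '"B": {"page": 1, "name": "bar", "url": "http://xxx"}, '
    '"C": {"page": 3, "name": "foo", "url": "http://xxx"}}\n'
    '{"D": {"page": 2, "name": "bar", "url": "xxx"}, '
    '"E": {"page": 2, "name": "bar", "url": "http://xxx"}, '
    '"F": {"page": 3, "name": "foo", "url": "http://xxx"}}\n'
)

merged = {}
for line in sample:
    line = line.strip()
    if not line:
        continue  # tolerate blank lines
    # Assumes labels are unique across lines; duplicates would be overwritten.
    merged.update(json.loads(line))

# orient="index" treats the outer keys (A..F) as row labels.
df = pd.DataFrame.from_dict(merged, orient="index")
print(df)
```

With a real file, replacing `sample` with the handle returned by open() should be the only change needed.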

Convert JSON Lines to pandas using the read_json function?
