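Some Python libraries have started writing and reading JSON files in the JSON Lines format.
I am trying to use the read_json(...) function to convert a JSON file that follows the JSON Lines specification into a pandas DataFrame.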
My file "input.json" looks like this (excerpt, one JSON object per line):
{"A": {"page": 1, "name": "foo", "url": "xxx"}, "B": {"page": 1, "name": "bar", "url": "http://xxx"}, "C": {"page": 3, "name": "foo", "url": "http://xxx"}}
{"D": {"page": 2, "name": "bar", "url": "xxx"}, "E": {"page": 2, "name": "bar", "url": "http://xxx"}, "F": {"page": 3, "name": "foo", "url": "http://xxx"}}
The output I want:
   page name         url
A     1  foo         xxx
B     1  bar  http://xxx
C     3  foo  http://xxx
D     2  bar         xxx
E     2  bar  http://xxx
F     3  foo  http://xxx
As a first attempt I tried this, but the result is not correct:
print(pd.read_json("file:///input.json", orient='index', lines=True))
I saw in the pandas docs that orient='index' uses this specification:
{index -> {column -> value}}
But I do not understand the result it displays.
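For reference, the `{index -> {column -> value}}` shape describes a single JSON object whose outer keys become row labels and whose inner keys become columns. The following is a minimal sketch of that shape for one line only, assuming the line is already held in a string; the names `one_line` and `single` are illustrative, and `DataFrame.from_dict` stands in for `read_json` here so the example does not depend on a file.

```python
import json

import pandas as pd

# One line of the JSON Lines file: a single object shaped like
# {index -> {column -> value}}. (Shortened to two keys for illustration.)
one_line = (
    '{"A": {"page": 1, "name": "foo", "url": "xxx"}, '
    '"B": {"page": 1, "name": "bar", "url": "http://xxx"}}'
)

# Outer keys (A, B) become the row index; inner keys become the columns.
single = pd.DataFrame.from_dict(json.loads(one_line), orient="index")
print(single)
```

A file with several such lines holds several of these objects, which is why each line has to be parsed and the pieces combined.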
Answer 0 (score: 4)
You can consider using a combination of stack(), reset_index() and apply() to get what you want. You need two lines:
import pandas as pd
df = pd.read_json("file:///input.json", orient='index', lines=True).stack().reset_index(level=1, drop=True)
# Here the .stack() basically flattens your extraneous columns into one.
# .reset_index() is to remove the extra index level that was added by stack()
#
# df
#
# A {'page': 1, 'name': 'foo', 'url': 'xxx'}
# B {'page': 1, 'name': 'bar', 'url': 'http://xxx'}
# C {'page': 3, 'name': 'foo', 'url': 'http://xxx'}
# D {'page': 2, 'name': 'bar', 'url': 'xxx'}
# E {'page': 2, 'name': 'bar', 'url': 'http://xxx'}
# F {'page': 3, 'name': 'foo', 'url': 'http://xxx'}
# dtype: object
df = df.apply(pd.Series, index=df.iloc[0].keys())
# Here you use .apply() to extract the dictionary into columns by applying them as a Series.
# the index keyword is to sort it per the keys of first dictionary in the df.
#
# df
#
#    page name         url
# A     1  foo         xxx
# B     1  bar  http://xxx
# C     3  foo  http://xxx
# D     2  bar         xxx
# E     2  bar  http://xxx
# F     3  foo  http://xxx
It is a bit of a hack, but it lets you parse the JSON Lines file correctly without writing an explicit loop.
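The stack-and-flatten idea can also be tried on in-memory data, which makes each intermediate step easier to inspect. This is only a sketch under assumptions: the two records are inlined rather than read from "input.json", the names `lines`, `wide`, `stacked` and `result` are illustrative, the frame is built with rows per line (so the level dropped is 0 rather than 1), and the final expansion uses the `pd.DataFrame` constructor instead of `apply(pd.Series)`.

```python
import json

import pandas as pd

# Two JSON Lines records, inlined so the sketch does not depend on a file.
lines = [
    '{"A": {"page": 1, "name": "foo", "url": "xxx"}, '
    '"B": {"page": 1, "name": "bar", "url": "http://xxx"}}',
    '{"D": {"page": 2, "name": "bar", "url": "xxx"}, '
    '"E": {"page": 2, "name": "bar", "url": "http://xxx"}}',
]

# One row per line, one column per outer key; a key that is absent
# from a line shows up as NaN in that row.
wide = pd.DataFrame([json.loads(line) for line in lines])

# stack() moves the column labels into an inner index level, leaving one
# dict per (line number, key) pair. dropna() removes any NaN placeholders.
stacked = wide.stack().dropna()

# Drop the line-number level so only the keys A, B, D, E remain.
stacked = stacked.reset_index(level=0, drop=True)

# Expand each dict into its own columns.
result = pd.DataFrame(stacked.tolist(), index=stacked.index)
print(result)
```

Printing `wide` and `stacked` along the way shows where the extra columns and the extra index level come from.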
Answer 1 (score: 2)
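Since you are using JSON Lines, you can parse the file one line at a time and concatenate the pieces:
import json
import pandas as pd

line_list = []
with open('sample.json') as f:
    for line in f:
        # Each line is one JSON object: {index -> {column -> value}}
        a_dict = json.loads(line)
        # Transpose so the outer keys become rows
        df = pd.DataFrame(a_dict).T
        line_list.append(df)
df = pd.concat(line_list)
And voilà, this is the desired output: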
  name page         url
A  foo    1         xxx
B  bar    1  http://xxx
C  foo    3  http://xxx
D  bar    2         xxx
E  bar    2  http://xxx
F  foo    3  http://xxx
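A possible variant of the line-by-line approach merges every line into a single dict first and builds the DataFrame once, rather than creating one frame per line. This is a sketch, not a tested replacement: `frame_from_lines` is a hypothetical helper name, and it assumes the top-level keys are unique across lines (a repeated key would silently overwrite the earlier entry).

```python
import json

import pandas as pd


def frame_from_lines(lines):
    """Hypothetical helper: build one DataFrame from JSON Lines text.

    `lines` is any iterable of strings, for example an open file object.
    """
    merged = {}
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            # Assumes top-level keys are unique across the whole file.
            merged.update(json.loads(line))
    # Outer keys become the index, inner keys become the columns.
    return pd.DataFrame.from_dict(merged, orient="index")


sample = [
    '{"A": {"page": 1, "name": "foo", "url": "xxx"}}\n',
    '{"D": {"page": 2, "name": "bar", "url": "xxx"}}\n',
]
print(frame_from_lines(sample))
```

With a real file this could be called as `with open('sample.json') as f: df = frame_from_lines(f)`.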