I want to convert JSON-formatted data into a pandas DataFrame. A sample of the JSON looks like this:
{'asin': '0615208479', 'description': "By now we all know the benefits of exercise for the body. It's the only real fountain of youth! The same is true for the brain. Take your brain to the gym several times a week and you can improve, regain and prevent memory loss. Discover the world of brain fitness through BrainAerboics.\nThe program was designed by a medical team and is backed with mounting research proving it works. It is believed to be the only one that combines the three crucial elements required for optimal brain fitness.", 'title': 'Brain Fitness Exercises Software', 'imUrl': 'http://ecx.images-amazon.com/images/I/41kbZB047NL._SY300_.jpg', 'salesRank': {'Health & Personal Care': 1346973}, 'categories': [['Health & Personal Care', 'Personal Care']]}
I tried:
df = pd.read_json('test.json',lines=True)
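This does not work because my JSON uses single quotes, which is not standard JSON.
So I also tried a simple shell command to convert every single quote to a double quote: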
cat test.json|sed "s/'/\"/g"
That does not work either, because the review text inside the JSON contains "It's the only real", so we cannot blindly convert every single quote to a double quote.
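To make the failure concrete, here is a minimal sketch of the blind quote replacement. It uses a shortened, hypothetical record rather than the full sample: the apostrophe in "It's" becomes a double quote, which ends the string early and leaves invalid JSON.

```python
import json

# Shortened, hypothetical record in the same style as the sample above.
record = """{'title': 'Brain Fitness', 'description': "It's the only real fountain of youth!"}"""

# Blindly swap every single quote for a double quote (what the sed command does).
naive = record.replace("'", '"')

try:
    json.loads(naive)
    parsed_ok = True
except json.JSONDecodeError:
    # The apostrophe in "It's" is now a stray double quote inside the string.
    parsed_ok = False

print(naive)
print("parsed:", parsed_ok)
```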
Then I tried parsing the data directly, leaving the single quotes as they are:
import ast

with open('test.json') as f:
    s = f.read()
print(ast.literal_eval(s))
But I got this error:
SyntaxError: invalid syntax
at 'categories': [['Health & Personal Care', 'Personal Care']]
Answer 0 (score: 3)
This question should not use "JSON" anywhere in its title or tags, because this data is not JSON.
That said, ast.literal_eval() works fine if you escape the literal newline.
s='''{'asin': '0615208479', 'description': "By now we all know the benefits of exercise for the body. It's the only real fountain of youth! The same is true for the brain. Take your brain to the gym several times a week and you can improve, regain and prevent memory loss. Discover the world of brain fitness through BrainAerboics.\nThe program was designed by a medical team and is backed with mounting research proving it works. It is believed to be the only one that combines the three crucial elements required for optimal brain fitness.", 'title': 'Brain Fitness Exercises Software', 'imUrl': 'http://ecx.images-amazon.com/images/I/41kbZB047NL._SY300_.jpg', 'salesRank': {'Health & Personal Care': 1346973}, 'categories': [['Health & Personal Care', 'Personal Care']]}'''
import ast
ast.literal_eval(s.replace('\n', '\\n'))
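The SyntaxError in the question is reported at the end of the first record, which suggests (this is an assumption, not something the question confirms) that test.json holds one dict literal per line. Under that assumption, a minimal sketch is to parse each line separately with ast.literal_eval() and hand the resulting list of dicts to pandas. The helper names parse_records and load_dataframe are made up for illustration.

```python
import ast


def parse_records(lines):
    """Parse an iterable of text lines, each holding one Python dict literal."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        records.append(ast.literal_eval(line))
    return records


def load_dataframe(path):
    """Build a DataFrame from a file with one dict literal per line (assumed layout)."""
    import pandas as pd  # imported here so parse_records can be used without pandas

    with open(path) as f:
        return pd.DataFrame(parse_records(f))
```

Calling load_dataframe('test.json') would then give one row per record. Nested values such as salesRank and categories stay as Python dicts and lists inside their columns.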