I have a pandas DataFrame (raw csv file here) in which some columns (d1 & d2) are stored as JSON. How can I parse these columns to get the desired output:
2015-02-12,user1,05:15 | 20,16:30 | 20.0,22:00 | 10.0
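I realize I will have to transpose the output once it parses successfully, but I am having trouble reading the JSON data contained in the DataFrame columns. Any help is appreciated. Thanks!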
>>> test = pd.read_csv('schedsample.csv',sep=',', header=0)
>>> test.head()
date username d1 \
0 2015-02-12 user1 {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...
1 2015-02-12 user1 {"d2":[{"tm":"06:15","t":"20.0"},{"tm":"08:00"...
2 2015-02-12 user1 {"d3":[{"tm":"07:15","t":"20.0"},{"tm":"09:00"...
3 2015-02-12 user1 {"d4":[{"tm":"08:15","t":"20.0"},{"tm":"07:00"...
d2
0 {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...
1 {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...
2 {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...
3 {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...
>>> import json as js
>>> js.loads(test['d1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/khurampervez/anaconda/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Users/khurampervez/anaconda/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
Answer 0 (score: 0)
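Your test.d1 column holds all of the d1 through d4 objects, one JSON string per row, so calling json.loads(test['d1']) on the whole column raises an error. json.loads expects a single string, not a Series. If you instead run json_normalize(json.loads(test['d1'][0])['d1']), you get the d1 DataFrame you want. So I think that rather than reading in only d1 and d2 columns, you need d3 and d4 columns as well, which will leave some cells empty.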
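As a rough sketch of the row-by-row approach described above, the snippet below parses each cell on its own and joins the entries into the "tm | t" form from the question. The sample DataFrame is a hypothetical stand-in for schedsample.csv: the first row is reconstructed from the desired output in the question, and the second row's truncated values are made up. format_schedule is an illustrative helper name, not part of pandas. It also assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize; on older versions it lives in pandas.io.json.

```python
import json

import pandas as pd

# Hypothetical stand-in for schedsample.csv. Row 0 follows the desired output
# in the question; the values in row 1 are invented for illustration.
df = pd.DataFrame({
    "date": ["2015-02-12", "2015-02-12"],
    "username": ["user1", "user1"],
    "d1": [
        '{"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30","t":"20.0"},'
        '{"tm":"22:00","t":"10.0"}]}',
        '{"d2":[{"tm":"06:15","t":"20.0"},{"tm":"08:00","t":"18.0"}]}',
    ],
})


def format_schedule(cell):
    """Parse one JSON string and join its entries as 'tm | t' pairs."""
    parsed = json.loads(cell)  # json.loads takes a single string, not a Series
    # Assumes each cell has exactly one top-level key ("d1", "d2", ...).
    entries = next(iter(parsed.values()))
    return ",".join("{} | {}".format(e["tm"], e["t"]) for e in entries)


# Apply the parser cell by cell instead of passing the whole column at once.
df["schedule"] = df["d1"].apply(format_schedule)

# Expanding a single cell into its own table, as the answer suggests.
first = pd.json_normalize(json.loads(df["d1"][0])["d1"])

print(df[["date", "username", "schedule"]].to_csv(index=False, header=False))
```

If the real file has empty cells in these columns, format_schedule would need a guard (for example, returning an empty string when pd.isna(cell) is true) before calling json.loads.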