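Hello, I am working with a JSON file that contains several conversations. Each complete conversation runs from an opening bracket to its closing bracket, in the following format: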
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
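I want to parse this JSON so that the user's questions go in one column and the answers provided by the bank are matched to them in a second column. For example, the first interaction would be:
All user comments:
"Good afternoon, I have issues to make payments from the app, the sms with the corresponding key and token has not arrived, Could someone please help me?, I am callig to CC and they don't answer"
All answers:
"Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
The output I want is the whole JSON parsed into these two columns. Note that the messages can be ordered by their date and time. To get this, I tried: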
import json
import numpy as np
import pandas as pd

with open('/home/adolfo/Desktop/CONVERSATIONS/test2.json') as json_data:
    d = json.load(json_data)
df = pd.DataFrame.from_records(np.concatenate(d))
print(df)
However, I got:
created from \
0 2017-02-02T11:57:41+0000 Bank
1 2017-02-01T22:19:58+0000 Alex
2 2017-02-01T22:19:42+0000 Alex
3 2017-02-01T22:19:28+0000 Alex
4 2017-02-01T22:19:18+0000 Alex
5 2017-02-02T11:57:41+0000 Bank
6 2017-02-01T22:19:58+0000 Alex
7 2017-02-01T22:19:42+0000 Alex
8 2017-02-01T22:19:28+0000 Alex
9 2017-02-01T22:19:18+0000 Alex
10 2017-02-01T22:19:12+0000 Bank
11 2017-02-01T16:22:30+0000 Alex
message
0 Hi Alex, if you have not perform the modificat...
1 Could someone please help me?, I am callig to ...
2 the sms with the corresponding key and token h...
3 I have issues to make payments from the app
4 Good afternoon
5 Hi Alex, if you have not perform the modificat...
6 Could someone please help me?, I am callig to ...
7 the sms with the corresponding key and token h...
8 I have issues to make payments from the app
9 Good afternoon
10 Hello Alexander, the money is available to be...
11 hello they have deposited the money into my ac...
So I would really appreciate help with this task. Here is an example of the JSON:
[
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
[
{
"created": "2017-02-01T22:19:12+0000",
"from": "Bank",
"message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459"
},
{
"created": "2017-02-01T16:22:30+0000",
"from": "Alex",
"message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
}
]
]
After getting useful feedback here, I tried:
df = pd.read_json('/home/adolfo/Desktop/CONVERSATIONS/test2.json')
df.created = pd.to_datetime(df.created)
df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')
But I got:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-8881c5d91cd0> in <module>()
63 df = pd.read_json('/home/adolfo/Desktop/CONVERSATIONS/test2.json')
64
---> 65 df.created = pd.to_datetime(df.created)
66
67 df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')
/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py in __getattr__(self, name)
2742 if name in self._info_axis:
2743 return self[name]
-> 2744 return object.__getattribute__(self, name)
2745
2746 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'created'
Answer 0 (score: 1)
import json
import numpy as np
import pandas as pd

j = """[
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
[
{
"created": "2017-02-01T22:19:12+0000",
"from": "Bank",
"message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459"
},
{
"created": "2017-02-01T16:22:30+0000",
"from": "Alex",
"message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
}
]
]"""
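js = json.loads(j)
# One DataFrame per conversation, keyed by its position in the outer list
df = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(js)})
df['created'] = pd.to_datetime(df['created'])
# Label each message as a question or an answer, then pivot into two columns
out = df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')
print(out)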
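
The pivot above gives one row per timestamp. If the goal is one row per conversation, with all of the user's messages joined into a single string next to the bank's replies, a grouping step may be closer to what the question asks for. The following is only a sketch under some assumptions: the sample data is shortened and partly made up, the names `conversation`, `role`, `questions` and `answers` are my own, and any sender other than "Bank" is treated as the user. The AttributeError in the question is consistent with `pd.read_json` reading the outer array as rows of nested lists, so there is no `created` column; flattening with `json` first sidesteps that.

```python
import json

import pandas as pd

# Hypothetical, shortened sample in the same shape as the question's JSON.
raw = """[
  [
    {"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex, please verify your DNI."},
    {"created": "2017-02-01T22:19:28+0000", "from": "Alex ", "message": "I have issues to make payments from the app"},
    {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"}
  ],
  [
    {"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Alexander, the money is available."},
    {"created": "2017-02-01T16:22:30+0000", "from": "Alex", "message": "Could I know if I can withdraw the money?"}
  ]
]"""

conversations = json.loads(raw)

# Flatten to one row per message, remembering which conversation it came from.
rows = [
    {"conversation": idx, **msg}
    for idx, conv in enumerate(conversations)
    for msg in conv
]
df = pd.DataFrame(rows)
df["created"] = pd.to_datetime(df["created"])

# Strip stray whitespace (e.g. "Alex ") before comparing sender names.
is_bank = df["from"].str.strip().eq("Bank")
df["role"] = is_bank.map({True: "answers", False: "questions"})

# Sort chronologically, then join each side's messages per conversation.
result = (
    df.sort_values("created")
      .groupby(["conversation", "role"])["message"]
      .agg(", ".join)
      .unstack(fill_value="")
      .reindex(columns=["questions", "answers"], fill_value="")
)
print(result)
```

With the real file, `raw` would be replaced by the contents read via `json.load`. If the file contains the same conversation twice, as the first output in the question suggests, dropping duplicates before grouping is probably worthwhile.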