Hello, I have the following JSON:
j = """[
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
[
{
"created": "2017-02-01T22:19:12+0000",
"from": "Bank",
"message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459"
},
{
"created": "2017-02-01T16:22:30+0000",
"from": "Alex",
"message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
}
]
]"""
Since I need a specific structure, I tried to parse it as follows:
import json
import numpy as np
import pandas as pd
js = json.loads(j)
df = pd.concat({i: pd.DataFrame(j) for i, j in enumerate(js)})
df.created = pd.to_datetime(df.created)
df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')
Everything works up to this point, but if I add another entry with a repeated date, I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-5652e92adbdc> in <module>()
69 df['from'] = df['from'].str.strip()
70 df = df.drop_duplicates()
---> 71 df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')) .set_index(['created', 'qna']) .unstack()
72
73
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in unstack(self, level, fill_value)
4034 """
4035 from pandas.core.reshape import unstack
-> 4036 return unstack(self, level, fill_value)
4037
4038 # ----------------------------------------------------------------------
/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in unstack(obj, level, fill_value)
406 if isinstance(obj, DataFrame):
407 if isinstance(obj.index, MultiIndex):
--> 408 return _unstack_frame(obj, level, fill_value=fill_value)
409 else:
410 return obj.T.stack(dropna=False)
/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _unstack_frame(obj, level, fill_value)
449 unstacker = _Unstacker(obj.values, obj.index, level=level,
450 value_columns=obj.columns,
--> 451 fill_value=fill_value)
452 return unstacker.get_result()
453
/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in __init__(self, values, index, level, value_columns, fill_value)
101
102 self._make_sorted_values_labels()
--> 103 self._make_selectors()
104
105 def _make_sorted_values_labels(self):
/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _make_selectors(self)
139
140 if mask.sum() < len(self.index):
--> 141 raise ValueError('Index contains duplicate entries, '
142 'cannot reshape')
143
ValueError: Index contains duplicate entries, cannot reshape
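A minimal sketch of how this error arises, assuming a made-up three-row frame (the name `toy` and its messages are hypothetical, not taken from the JSON above). Two rows share the same (created, qna) pair, so unstack cannot decide which message belongs in that cell:

```python
import pandas as pd

# Hypothetical toy data: the first two rows share timestamp AND qna label.
toy = pd.DataFrame({
    "created": pd.to_datetime([
        "2017-02-01 22:19:18",
        "2017-02-01 22:19:18",
        "2017-02-02 11:57:41",
    ]),
    "qna": ["Question", "Question", "Answer"],
    "message": ["Good afternoon", "Anyone there?", "Hi Alex"],
})

# List every row whose (created, qna) pair occurs more than once.
dupes = toy[toy.duplicated(["created", "qna"], keep=False)]
print(dupes)

# Reshaping fails because the index built from (created, qna) is not unique.
try:
    toy.set_index(["created", "qna"]).message.unstack(fill_value="")
    error_text = None
except ValueError as exc:
    error_text = str(exc)
print(error_text)
```

Running the `duplicated` check on the real frame before calling unstack should show which rows collide.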
I am trying to use this new JSON, but it fails on the dates, so I would appreciate some help getting past this.
This is the JSON that fails:
j = """[
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
[
{
"created": "2017-02-01T22:19:12+0000",
"from": "Bank",
"message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459"
},
{
"created": "2017-02-01T16:22:30+0000",
"from": "Alex",
"message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
}
],
[
{
"created": "2017-02-01T22:19:13+0000",
"from": "Bank",
"message": " Hello Adolfo, the money is available."
},
{
"created": "2017-02-01T16:22:33+0000",
"from": "Omar",
"message": "hello they have deposited the money into my account."
}
]
]"""
Answer 0 (score: 1)
It looks like you need to split the assign statement out into its own step. There is no need for append=True.
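import json
import numpy as np
import pandas as pd

js = json.loads(j)
# Flatten all conversations into one frame with a fresh 0..n-1 index
df = pd.concat([pd.DataFrame(j) for j in js], ignore_index=True)
# Remove stray whitespace such as "Alex " so the names compare cleanly
df['from'] = df['from'].str.strip()
df['created'] = pd.to_datetime(df.created)
# Label Bank messages as answers and everything else as questions
df['qna'] = np.where(df['from'] == 'Bank', 'Answer', 'Question')
df1 = df.set_index(['created', 'qna']).unstack(fill_value='')
with pd.option_context('display.max_colwidth', 30, 'display.expand_frame_repr', False):
    print(df1)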
Output
                       from                                  message
qna                  Answer  Question                         Answer                       Question
created
2017-02-01 16:22:30              Alex                                 hello they have deposited ...
2017-02-01 22:19:12    Bank             Hello Alexander, the mone...
2017-02-01 22:19:18              Alex                                                Good afternoon
2017-02-01 22:19:28              Alex                                 I have issues to make paym...
2017-02-01 22:19:42              Alex                                 the sms with the correspon...
2017-02-01 22:19:58              Alex                                 Could someone please help ...
2017-02-02 11:57:41    Bank            Hi Alex, if you have not p...
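If the data really does contain two messages with the same timestamp and the same qna label, unstack will still raise. One possible workaround, which is not part of the code above, is to combine such messages before reshaping. A minimal sketch, assuming a hypothetical `toy` frame and an arbitrarily chosen ' | ' separator:

```python
import pandas as pd

# Hypothetical toy data: the first two rows collide on (created, qna).
toy = pd.DataFrame({
    "created": pd.to_datetime([
        "2017-02-01 22:19:18",
        "2017-02-01 22:19:18",
        "2017-02-02 11:57:41",
    ]),
    "qna": ["Question", "Question", "Answer"],
    "message": ["Good afternoon", "Anyone there?", "Hi Alex"],
})

# Join colliding messages into one string so each (created, qna) pair is
# unique, then reshape qna into columns.
wide = (
    toy.groupby(["created", "qna"])["message"]
       .agg(" | ".join)
       .unstack(fill_value="")
)
print(wide)
```

Joining keeps every message but merges them into a single cell. If each message has to stay on its own row, an extra index level (for example a per-group counter from `groupby(...).cumcount()`) would be another option.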