How can I get past the following problem when parsing a JSON file?

Date: 2017-03-22 19:15:40

Tags: json pandas

Hello, I have the following JSON:

j = """[
    [
        {
            "created": "2017-02-02T11:57:41+0000",
            "from": "Bank",
            "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
        },
        {
            "created": "2017-02-01T22:19:58+0000",
            "from": "Alex ",
            "message": "Could someone please help me?, I am callig to CC and they don't answer"
        },
        {
            "created": "2017-02-01T22:19:42+0000",
            "from": "Alex ",
            "message": "the sms with the corresponding key and token has not arrived"
        },
        {
            "created": "2017-02-01T22:19:28+0000",
            "from": "Alex ",
            "message": "I have issues to make payments from the app"
        },
        {
            "created": "2017-02-01T22:19:18+0000",
            "from": "Alex ",
            "message": "Good afternoon"
        }
    ],
    [
        {
            "created": "2017-02-01T22:19:12+0000",
            "from": "Bank",
            "message": " Hello Alexander, the money is available to be  withdrawn, you could go to any store the number is 70307002459"
        }, 
        {            
            "created": "2017-02-01T16:22:30+0000",
            "from": "Alex",
            "message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
        }

    ]


]"""

Since I need a specific structure, I tried to parse it as follows:

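import json
import numpy as np
import pandas as pd

js = json.loads(j)
# one DataFrame per conversation; the dict keys become the outer index level
df = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(js)})
df.created = pd.to_datetime(df.created)
# Bank messages are answers, everything else is a question
df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')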
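To make the structure of the intermediate frame visible, here is a small sketch of what passing a dict to pd.concat builds. The input `conversations` is a hypothetical miniature of the JSON above, not the real data.

```python
import pandas as pd

# Hypothetical miniature input: two conversations as nested lists of dicts
conversations = [
    [{"created": "2017-02-01T22:19:18+0000", "from": "Alex", "message": "Good afternoon"},
     {"created": "2017-02-01T22:19:28+0000", "from": "Alex", "message": "I have issues"}],
    [{"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello"}],
]

# With a dict, pd.concat uses the keys as the outer index level, so every
# row keeps (conversation number, row number within that conversation).
df = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(conversations)})

print(df.index.tolist())
```

Note that set_index(['created', 'qna']) later throws this two-level index away, so the conversation number no longer helps keep rows apart.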

Everything works up to this point, but if I add another record with a repeated date, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-5652e92adbdc> in <module>()
     69 df['from'] = df['from'].str.strip()
     70 df = df.drop_duplicates()
---> 71 df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question'))  .set_index(['created', 'qna'])  .unstack()
     72 
     73 

/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in unstack(self, level, fill_value)
   4034         """
   4035         from pandas.core.reshape import unstack
-> 4036         return unstack(self, level, fill_value)
   4037 
   4038     # ----------------------------------------------------------------------

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in unstack(obj, level, fill_value)
    406     if isinstance(obj, DataFrame):
    407         if isinstance(obj.index, MultiIndex):
--> 408             return _unstack_frame(obj, level, fill_value=fill_value)
    409         else:
    410             return obj.T.stack(dropna=False)

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _unstack_frame(obj, level, fill_value)
    449         unstacker = _Unstacker(obj.values, obj.index, level=level,
    450                                value_columns=obj.columns,
--> 451                                fill_value=fill_value)
    452         return unstacker.get_result()
    453 

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in __init__(self, values, index, level, value_columns, fill_value)
    101 
    102         self._make_sorted_values_labels()
--> 103         self._make_selectors()
    104 
    105     def _make_sorted_values_labels(self):

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _make_selectors(self)
    139 
    140         if mask.sum() < len(self.index):
--> 141             raise ValueError('Index contains duplicate entries, '
    142                              'cannot reshape')
    143 

ValueError: Index contains duplicate entries, cannot reshape
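The error means that at least two rows share the same (created, qna) pair, so unstack has no single cell to put them in. A minimal sketch that reproduces it and lists the offending rows, assuming (hypothetically) that two Bank replies carry the same timestamp:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: two Bank replies with an identical timestamp
df = pd.DataFrame({
    "created": pd.to_datetime(["2017-02-01 22:19:12", "2017-02-01 22:19:12",
                               "2017-02-01 16:22:30"]),
    "from": ["Bank", "Bank", "Alex"],
    "message": ["Hello Alexander", "Hello Adolfo", "hello"],
})
df["qna"] = np.where(df["from"] == "Bank", "Answer", "Question")
indexed = df.set_index(["created", "qna"])

# unstack requires every (created, qna) pair to be unique
try:
    indexed["message"].unstack(fill_value="")
    failed = False
except ValueError as err:
    failed = True
    print(err)

# keep=False marks every occurrence of a repeated pair, not just the later ones
dupes = indexed[indexed.index.duplicated(keep=False)]
print(dupes)
```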

I am trying to use this new JSON, but it fails on the dates, so I would appreciate some help getting past this.

Here is the JSON that fails:

j = """[
    [
        {
            "created": "2017-02-02T11:57:41+0000",
            "from": "Bank",
            "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
        },
        {
            "created": "2017-02-01T22:19:58+0000",
            "from": "Alex ",
            "message": "Could someone please help me?, I am callig to CC and they don't answer"
        },
        {
            "created": "2017-02-01T22:19:42+0000",
            "from": "Alex ",
            "message": "the sms with the corresponding key and token has not arrived"
        },
        {
            "created": "2017-02-01T22:19:28+0000",
            "from": "Alex ",
            "message": "I have issues to make payments from the app"
        },
        {
            "created": "2017-02-01T22:19:18+0000",
            "from": "Alex ",
            "message": "Good afternoon"
        }
    ],
    [
        {
            "created": "2017-02-01T22:19:12+0000",
            "from": "Bank",
            "message": " Hello Alexander, the money is available to be  withdrawn, you could go to any store the number is 70307002459"
        }, 
        {            
            "created": "2017-02-01T16:22:30+0000",
            "from": "Alex",
            "message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
        }

    ],
    [
        {
            "created": "2017-02-01T22:19:13+0000",
            "from": "Bank",
            "message": " Hello Adolfo, the money is available."
        }, 
        {            
            "created": "2017-02-01T16:22:33+0000",
            "from": "Omar",
            "message": "hello they have deposited the money into my account."
        }

    ]



]"""
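If two messages really do share a timestamp and sender type, one option (a swapped-in technique, not taken from the question or the answer) is to aggregate them before reshaping, so the index is unique by the time unstack runs. A sketch on the same hypothetical rows as above; the " | " separator is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: two Bank replies with an identical timestamp
df = pd.DataFrame({
    "created": pd.to_datetime(["2017-02-01 22:19:12", "2017-02-01 22:19:12",
                               "2017-02-01 16:22:30"]),
    "from": ["Bank", "Bank", "Alex"],
    "message": ["Hello Alexander", "Hello Adolfo", "hello"],
})
df["qna"] = np.where(df["from"] == "Bank", "Answer", "Question")

# Collapse each (created, qna) group into one string, then reshape;
# empty cells are filled with "" as in the original code
wide = (
    df.groupby(["created", "qna"])["message"]
      .agg(" | ".join)
      .unstack(fill_value="")
)
print(wide)
```

This keeps one row per timestamp at the cost of merging messages; whether that is acceptable depends on how the table is used downstream.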


Answer (score: 1)

It looks like you need to split the assign step out into its own statement. There is also no need for append=True.

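import json
import numpy as np
import pandas as pd

js = json.loads(j)
# stack every conversation into one frame with a fresh 0..n-1 index
df = pd.concat([pd.DataFrame(conv) for conv in js], ignore_index=True)
# drop the trailing spaces in names such as "Alex "
df['from'] = df['from'].str.strip()
df['created'] = pd.to_datetime(df.created)
# create the Answer/Question label as a column first, then index on it
df['qna'] = np.where(df['from'] == 'Bank', 'Answer', 'Question')
df1 = df.set_index(['created', 'qna']).unstack(fill_value='')

with pd.option_context('display.max_colwidth', 30, 'display.expand_frame_repr', False):
    print(df1)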

Output

                       from                                 message                               
qna                 Answer Question                         Answer                       Question
created                                                                                          
2017-02-01 16:22:30            Alex                                 hello they have deposited ...
2017-02-01 22:19:12   Bank            Hello Alexander, the mone...                               
2017-02-01 22:19:18            Alex                                                Good afternoon
2017-02-01 22:19:28            Alex                                 I have issues to make paym...
2017-02-01 22:19:42            Alex                                 the sms with the correspon...
2017-02-01 22:19:58            Alex                                 Could someone please help ...
2017-02-02 11:57:41   Bank           Hi Alex, if you have not p...
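If the repeated timestamps come from different conversations, another option (an extension, not part of the answer above) is to keep the conversation number in the index so that (conv, created, qna) stays unique. A sketch on a hypothetical two-conversation input whose Bank replies share a timestamp; the level names "conv" and "row" are made up for illustration:

```python
import json
import numpy as np
import pandas as pd

# Hypothetical input: both Bank replies are stamped 22:19:12
j = """[
  [{"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Alexander"},
   {"created": "2017-02-01T16:22:30+0000", "from": "Alex ", "message": "hello"}],
  [{"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Adolfo"},
   {"created": "2017-02-01T16:22:33+0000", "from": "Omar", "message": "hi"}]
]"""

js = json.loads(j)
# name the index levels, move the conversation number into a column,
# and give the frame a clean 0..n-1 index
df = (
    pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(js)}, names=["conv", "row"])
      .reset_index(level="conv")
      .reset_index(drop=True)
)
df["from"] = df["from"].str.strip()
df["created"] = pd.to_datetime(df["created"])
df["qna"] = np.where(df["from"] == "Bank", "Answer", "Question")

# created alone repeats, but (conv, created, qna) does not
df1 = df.set_index(["conv", "created", "qna"])["message"].unstack(fill_value="")
print(df1)
```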