Pandas cannot create a DataFrame from a NumPy array of Timestamps

Posted: 2016-05-25 18:47:47

Tags: python arrays numpy pandas dataframe

I have a large NumPy array (dtype=object) of pandas Timestamps:

array([[Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T')],
       [Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T')],
       [Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T'),
        Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T')]], dtype=object)

I can't create a DataFrame from this array, because the attempt raises the following error:

AssertionError: Number of Block dimensions (1) must equal number of axes (2)

The array is clearly two-dimensional; I confirmed this with ndim.
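A quick way to check what the question describes is to inspect `ndim`, `shape`, and `dtype` on the array. The sketch below rebuilds a small equivalent array; it assumes the same three UTC timestamps and drops the old `offset='5T'` argument, which newer pandas versions no longer accept in `pd.Timestamp`.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's array: three rows, each
# repeating one tz-aware UTC Timestamp three times (offset='5T' omitted).
times = ['2016-05-02 15:50:00', '2016-05-02 17:10:00', '2016-05-02 20:25:00']
a = np.array([[pd.Timestamp(t, tz='UTC')] * 3 for t in times], dtype=object)

print(a.ndim)   # 2
print(a.shape)  # (3, 3)
print(a.dtype)  # object
# Each element is a pandas Timestamp object, not a NumPy datetime64 value.
print(type(a[0, 0]).__name__)  # Timestamp
```

So the array really is 2-D; what is unusual is that it is an object array holding tz-aware Timestamp objects.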

Why can't I create a DataFrame?

1 answer:

Answer 0 (score: 1)

I think you can use a list comprehension:

import pandas as pd
import numpy as np

# The original passed offset='5T' to pd.Timestamp; newer pandas versions
# reject that keyword, so it is omitted here to keep the example runnable.
a = np.array([[pd.Timestamp('2016-05-02 15:50:00+0000', tz='UTC'),
               pd.Timestamp('2016-05-02 15:50:00+0000', tz='UTC'),
               pd.Timestamp('2016-05-02 15:50:00+0000', tz='UTC')],
              [pd.Timestamp('2016-05-02 17:10:00+0000', tz='UTC'),
               pd.Timestamp('2016-05-02 17:10:00+0000', tz='UTC'),
               pd.Timestamp('2016-05-02 17:10:00+0000', tz='UTC')],
              [pd.Timestamp('2016-05-02 20:25:00+0000', tz='UTC'),
               pd.Timestamp('2016-05-02 20:25:00+0000', tz='UTC'),
               pd.Timestamp('2016-05-02 20:25:00+0000', tz='UTC')]], dtype=object)

# Iterating over a 2-D array yields its rows, so each x is one 1-D row array
df = pd.DataFrame([x for x in a], columns=['a','b','c'])
print (df)
                          a                         b  \
0 2016-05-02 15:50:00+00:00 2016-05-02 15:50:00+00:00   
1 2016-05-02 17:10:00+00:00 2016-05-02 17:10:00+00:00   
2 2016-05-02 20:25:00+00:00 2016-05-02 20:25:00+00:00   

                          c  
0 2016-05-02 15:50:00+00:00  
1 2016-05-02 17:10:00+00:00  
2 2016-05-02 20:25:00+00:00  
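Why the comprehension helps: iterating over a 2-D NumPy array yields its rows as 1-D arrays, so `[x for x in a]` hands pandas a list of row records instead of one 2-D object array, which takes a different constructor code path from the one that raised the error (at least in the pandas version the question used). `list(a)` is an equivalent, shorter spelling. A sketch, rebuilding the array without the old `offset` argument:

```python
import numpy as np
import pandas as pd

times = ['2016-05-02 15:50:00', '2016-05-02 17:10:00', '2016-05-02 20:25:00']
a = np.array([[pd.Timestamp(t, tz='UTC')] * 3 for t in times], dtype=object)

# Iterating a 2-D array yields 1-D row arrays.
rows = list(a)
print(len(rows), rows[0].shape)  # 3 (3,)

df_comp = pd.DataFrame([x for x in a], columns=['a', 'b', 'c'])
df_list = pd.DataFrame(rows, columns=['a', 'b', 'c'])
print(df_comp.equals(df_list))  # True
```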

Another solution is DataFrame.from_records:

print (pd.DataFrame.from_records(a, columns=['a','b','c']))
                          a                         b  \
0 2016-05-02 15:50:00+00:00 2016-05-02 15:50:00+00:00   
1 2016-05-02 17:10:00+00:00 2016-05-02 17:10:00+00:00   
2 2016-05-02 20:25:00+00:00 2016-05-02 20:25:00+00:00   

                          c  
0 2016-05-02 15:50:00+00:00  
1 2016-05-02 17:10:00+00:00  
2 2016-05-02 20:25:00+00:00  

See the alternate constructors of DataFrame in the pandas documentation.
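If you specifically want every column to end up with a tz-aware datetime dtype, one more option (not from the original answer) is to build the frame column by column with `pd.to_datetime(..., utc=True)`. A sketch, assuming the same hypothetical column names `a`, `b`, `c` and an array rebuilt without `offset`:

```python
import numpy as np
import pandas as pd

times = ['2016-05-02 15:50:00', '2016-05-02 17:10:00', '2016-05-02 20:25:00']
a = np.array([[pd.Timestamp(t, tz='UTC')] * 3 for t in times], dtype=object)

columns = ['a', 'b', 'c']
# Convert each column (a 1-D object array of Timestamps) to a UTC DatetimeIndex,
# then let the dict constructor use each one as a column.
df = pd.DataFrame({name: pd.to_datetime(a[:, i], utc=True)
                   for i, name in enumerate(columns)})
print(df.dtypes)
```

This avoids relying on pandas to infer the dtype from an object array, at the cost of one conversion call per column.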