Question

I have three dataframes: timestamp (with timestamps), dataSun (with the timestamps of sunrise and sunset), and dataData (with different climate data). Dataframe timestamp has datatype

"int64"

timestamp.head()
           timestamp
    0  1521681600000
    1  1521681900000
    2  1521682200000
    3  1521682500000
    4  1521682800000

Dataframe dataSun also has datatype

"int64"

dataSun.head()
             sunrise         sunset
    0  1521696105000  1521740761000
    1  1521696105000  1521740761000
    2  1521696105000  1521740761000
    3  1521696105000  1521740761000
    4  1521696105000  1521740761000

Dataframe dataData, with the climate data, has datatype

"float64"

I want to concatenate these three dataframes into one.

dataData.head()
           temperature     pressure  humidity
    0     2.490000  1018.000000      99.0
    1     2.408333  1017.833333      99.0
    2     2.326667  1017.666667      99.0
    3     2.245000  1017.500000      99.0
    4     2.163333  1017.333333      99.0
    5     2.081667  1017.166667      99.0

dataResult = pd.concat((timestamp, dataSun, dataData), axis = 1)
dataResult.head()
           timestamp       sunrise        sunset  temperature     pressure
    0  1521681600000  1.521696e+12  1.521741e+12     2.490000  1018.000000
    1  1521681900000  1.521696e+12  1.521741e+12     2.408333  1017.833333
    2  1521682200000  1.521696e+12  1.521741e+12     2.326667  1017.666667
    3  1521682500000  1.521696e+12  1.521741e+12     2.245000  1017.500000
    4  1521682800000  1.521696e+12  1.521741e+12     2.163333  1017.333333
    5  1521683100000  1.521696e+12  1.521741e+12     2.081667  1017.166667

weatherMeasurements.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 7188 entries, 0 to 7187
    Data columns (total 6 columns):
    timestamp      7188 non-null int64
    sunrise        7176 non-null float64
    sunset         7176 non-null float64
    temperature    7176 non-null float64
    pressure       7176 non-null float64
    humidity       7176 non-null float64
    dtypes: float64(5), int64(1)

Why did pd.concat change the data type of the values? I have tried different ways of concatenating the dataframes. For example, I first concatenated only dataSun and timestamp into one dataframe, and then concatenated the resulting dataframe with dataData. But the result was the same. How can I concatenate the three dataframes and preserve the data types?
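A minimal sketch that reproduces the behaviour. The frames below are hypothetical toy stand-ins (shortened data, made-up sizes), built on the assumption suggested by the info() output that timestamp has more rows than dataSun and dataData:

```python
import pandas as pd

# Hypothetical stand-ins: timestamp has 3 rows, the other two frames only 2
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
dataSun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000],
                        "sunset": [1521740761000, 1521740761000]})
dataData = pd.DataFrame({"temperature": [2.49, 2.408333],
                         "pressure": [1018.0, 1017.833333],
                         "humidity": [99.0, 99.0]})

# With axis=1, rows are aligned on the index (outer join by default),
# so row 2 has no sunrise/sunset/climate values and gets NaN there
dataResult = pd.concat((timestamp, dataSun, dataData), axis=1)
print(dataResult.dtypes)        # sunrise and sunset are no longer int64
print(dataResult.isna().sum())  # one missing value per shorter column
```

Only timestamp, which covers every row label, keeps its int64 dtype; this matches the info() output in the question.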

Answer 1

Because of this:

timestamp      7188 non-null int64
sunrise        7176 non-null float64
...

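timestamp has 7188 non-null values, while sunrise and the columns after it have only 7176. It goes without saying that 12 values in each of those columns are not non-null, meaning they are NaNs. They come from index alignment: with axis=1, pd.concat performs an outer join on the row index by default, so the 12 row labels of timestamp that (presumably) have no counterpart in dataSun and dataData are filled with NaN in those columns.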

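Since NaN is a float (dtype=float), every other value in such a column is automatically upcast to float, and floats of this magnitude are displayed in scientific notation.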
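The upcast itself can be shown on a single Series. This is an illustrative sketch: reindex is used here simply as a convenient way to introduce a missing label, which produces a NaN in the same way the outer join does:

```python
import pandas as pd

s = pd.Series([1521696105000, 1521696105000], name="sunrise")
print(s.dtype)  # int64

# Label 2 does not exist, so reindex inserts NaN; int64 cannot hold it,
# so the whole Series is upcast to float64
s2 = s.reindex([0, 1, 2])
print(s2.dtype)
print(s2.isna().sum())

# These timestamps are far below 2**53, the limit under which every integer
# is exactly representable as float64, so casting back loses nothing
print((s2.iloc[:2].astype("int64") == s).all())
```

This is also why downcasting back to int after removing or filling the NaNs is safe for millisecond timestamps like these.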

That explains why, but it doesn't really solve your problem. Your options at this point are:

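1. Drop the rows containing NaN with dropna.
2. Fill the NaNs with fillna.

(After either of these you can downcast the affected columns back to int.)

3. Alternatively, pass join='inner' to pd.concat. This introduces no NaNs, so the dtypes are preserved.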

pd.concat((timestamp, dataSun, dataData), axis=1, join='inner')

       timestamp        sunrise         sunset  temperature     pressure  \    
0  1521681600000  1521696105000  1521740761000     2.490000  1018.000000   
1  1521681900000  1521696105000  1521740761000     2.408333  1017.833333   
2  1521682200000  1521696105000  1521740761000     2.326667  1017.666667   
3  1521682500000  1521696105000  1521740761000     2.245000  1017.500000   
4  1521682800000  1521696105000  1521740761000     2.163333  1017.333333   

   humidity  
0      99.0  
1      99.0  
2      99.0  
3      99.0  
4      99.0

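With option 3, an inner join is performed on the index of each dataframe, so only the row labels present in all three dataframes are kept.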
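The three options can be compared side by side in a short sketch. The toy frames are hypothetical stand-ins for the real data, and the fill value 0 in option 2 is an arbitrary placeholder (for timestamps it would mean 1970-01-01, so pick a sentinel that makes sense for your data):

```python
import pandas as pd

# Hypothetical toy frames: timestamp has one more row than the others
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
dataSun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000],
                        "sunset": [1521740761000, 1521740761000]})
dataData = pd.DataFrame({"temperature": [2.49, 2.408333],
                         "pressure": [1018.0, 1017.833333],
                         "humidity": [99.0, 99.0]})
frames = (timestamp, dataSun, dataData)
to_int = {"sunrise": "int64", "sunset": "int64"}

# Option 1: drop the rows that contain NaN, then downcast back to int64
opt1 = pd.concat(frames, axis=1).dropna().astype(to_int)

# Option 2: fill the NaNs in the integer columns with a placeholder, then downcast
opt2 = pd.concat(frames, axis=1).fillna({"sunrise": 0, "sunset": 0}).astype(to_int)

# Option 3: inner join on the index, so no NaN is introduced in the first place
opt3 = pd.concat(frames, axis=1, join="inner")

for name, df in [("dropna", opt1), ("fillna", opt2), ("inner", opt3)]:
    print(name, len(df), {c: str(df[c].dtype) for c in to_int})
```

Options 1 and 3 end up with the same rows here; option 2 keeps every timestamp but leaves the climate columns of the extra rows as NaN.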

Answer 2

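As of pandas 1.0.0, I believe you have another option: call convert_dtypes first. This converts the DataFrame columns to dtypes that support pd.NA, which avoids the NaN problem discussed in this answer.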
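A minimal sketch of this approach, assuming pandas 1.0 or later and using hypothetical toy frames: convert_dtypes turns the int64 columns into the nullable Int64 extension dtype, so the cells the outer join cannot fill become pd.NA instead of forcing a cast to float:

```python
import pandas as pd

# Hypothetical toy frames: timestamp has one more row than dataSun
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
dataSun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000],
                        "sunset": [1521740761000, 1521740761000]})

# Convert to nullable dtypes before concatenating
result = pd.concat((timestamp.convert_dtypes(), dataSun.convert_dtypes()), axis=1)
print(result.dtypes)  # nullable Int64 instead of float64
print(result)         # the missing cells are shown as <NA>
```

Be aware that convert_dtypes may also turn a float column whose values are all whole numbers (such as humidity in dataData) into Int64.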

Why does pd.concat change the resulting datatype from int to float?
