pandas.DataFrame.replace changes the dtype of columns

Time: 2019-12-27 12:25:49

Tags: python pandas

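I was trying to replace the np.nan values in my dataframe with None and noticed that, in the process, the dtype of the float columns in the dataframe changed to object, even if they contain no missing data.

For example: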

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
data.replace(to_replace={np.nan:None}, inplace=True)

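Calling data.dtypes before and after the call to replace shows that the dtype of column B changed from float to object, whereas that of C stayed int. If I remove column A from the original data, this does not happen. I would like to know why this change happens and how to avoid it.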

2 answers:

Answer 0 (score: 0)

It works fine when you do the replacement per column, calling replace on pd.Series(...) instead of pd.DataFrame(...).

That said, as mentioned in the comments, None cannot be coerced to float (or int, or any numeric type; you would rather use NaN), so the column is automatically coerced to object.

NaN

Output:

object
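To illustrate the coercion described in this answer, here is a minimal sketch (the variable names as_float and as_object are made up for the illustration and are not from the original answer). It shows that a float column cannot hold a literal None, and that keeping None requires the object dtype:

```python
import numpy as np
import pandas as pd

# A float64 column has no slot for None: pandas stores it as NaN instead.
as_float = pd.Series([1.096, None])
print(as_float.dtype)

# Keeping a literal None requires the generic object dtype.
as_object = pd.Series([1.096, None], dtype=object)
print(as_object.dtype)
print(as_object.iloc[1] is None)
```

This is why any column that ends up containing None has to become object, whichever replace variant is used.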

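Answer 1 (score: 0)

I have run into this many times and have a workaround: call astype(object) before the replace, and the dtypes are preserved. I have had to use this for merge problems and the like. I am not sure why the types are preserved when it is used this way, but they are, and it is useful once you have found it.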

# dtypes of the original data from the question, before any replacement
data.info()

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null float64
#B    1 non-null float64
#C    1 non-null int64
#dtypes: float64(2), int64(1)
#memory usage: 32.0 bytes

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
# Plain replace on the whole frame
data.replace(to_replace={np.nan:None}, inplace=True)

data.info()

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null object
#B    1 non-null object
#C    1 non-null int64
#dtypes: int64(1), object(2)
#memory usage: 32.0+ bytes

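import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
# Note: astype(object) returns a new DataFrame, so this in-place replace
# modifies that temporary copy; data itself is left unchanged.
data.astype(object).replace(to_replace={np.nan:None}, inplace=True)

data.info()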

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null float64
#B    1 non-null float64
#C    1 non-null int64
#dtypes: float64(2), int64(1)
#memory usage: 32.0 bytes
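The following sketch looks more closely at the workaround above. It is an illustration under stated assumptions, not part of the original answer: the name tmp and the per-column loop are introduced here. The first part makes explicit that replacing on the result of astype(object) leaves data untouched; the second part shows one possible way to put None only into the columns that actually hold NaN, so that B and C keep their dtypes.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])

# astype(object) returns a new DataFrame; replacing in place on that copy
# does not touch `data`, so its dtypes (and its NaN) stay as they were.
tmp = data.astype(object)
tmp.replace(to_replace={np.nan: None}, inplace=True)
print(data.dtypes)

# Convert only the columns that really contain NaN. Each such column
# becomes object (it has to, in order to hold None); the rest are untouched.
for col in data.columns:
    if data[col].isna().any():
        data[col] = data[col].astype(object).where(data[col].notna(), None)
print(data.dtypes)
```

With this approach column A still becomes object, which is unavoidable once it holds None, but columns without missing values are never passed through the replacement at all.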