Python: UDF for changing data types and replacing values in a Pandas DataFrame

Date: 2019-02-05 19:20:45

Tags: python pandas

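I am trying to write a user-defined function that inspects the data type of each input column and changes it.

My input data types will be int64, float64, object, and datetime64[ns].

If a column is datetime64[ns], I replace blank (missing) dates with a custom date. The output data type is also datetime64[ns].

If a column is int64, float64, or object, I replace blank (missing) values with the string "FILLINGTHENAS" and convert the column to object.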

def Change_Data_Type_DataFrame (AnyPandasDataFrame):
    cr_date = datetime(1800,1,1,1,1,1)        
    for i in range(1, AnyPandasDataFrame.shape[1]):
        Required_Column_Name = (AnyPandasDataFrame.columns[i])
        Required_Data_Type = AnyPandasDataFrame[Required_Column_Name].dtype                                       
        if Required_Data_Type == 'datetime64[ns]':
            DateChecker = True
        else:
            DateChecker = contains_word(Required_Column_Name, "Date","of Death","Day of Work") 
        if DateChecker == False :
            if Required_Data_Type == 'int64':
                print("Yes")
                AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].fillna("FILLINGTHENAS")
                AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype(str)
                AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype(str).replace('\.0', '', regex=True)
            if Required_Data_Type == object:
                AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].fillna("FILLINGTHENAS")
                AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype(str)
                AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype(str).replace('\.0', '', regex=True)
            if Required_Data_Type == 'float64':
                    AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].fillna("FILLINGTHENAS")
                    AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype(str)
                    AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype(str).replace('\.0', '', regex=True)         
        else:
            AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].fillna(cr_date)
            AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype('datetime64[ns]')  
    return (AnyPandasDataFrame)

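I have a huge DataFrame with 100 columns, and my function fails: I still see int64 in the output DataFrame.

The print("Yes") call does not fire, yet my df definitely has int64 dtypes.

Where am I going wrong, and can my code be written better?

Please help me.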

1 Answer:

Answer 0: (score: 0)

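I made the following changes to the code.

The range started at 1, so the first column was skipped; I made it start at 0.

I removed the multiple ifs and merged them into a single if branch.

After the replace, I cast the data type again, just to make sure pandas had not changed it back.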
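To illustrate the first change, here is a minimal sketch showing that range(1, n) never reaches the first column, while iterating over df.columns does. The three-column frame and its column names are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical three-column frame; the column names are assumptions.
df = pd.DataFrame({"id": [1, 2], "amount": [1.5, None], "name": ["a", None]})

# range(1, n) starts at position 1, so the column at position 0 is never visited.
visited_from_one = [df.columns[i] for i in range(1, df.shape[1])]

# Iterating over df.columns visits every column and avoids index arithmetic.
visited_all = list(df.columns)

print(visited_from_one)  # ['amount', 'name']
print(visited_all)       # ['id', 'amount', 'name']
```

If the skipped first column happens to be int64, it comes back unchanged, which would match the symptom described in the question.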

from datetime import datetime

def Change_Data_Type_DataFrame(AnyPandasDataFrame):
    # contains_word is the asker's own helper; its definition is not shown in the post.
    cr_date = datetime(1800, 1, 1, 1, 1, 1)
    for i in range(0, AnyPandasDataFrame.shape[1]):  # start at 0 so the first column is included
        Required_Column_Name = AnyPandasDataFrame.columns[i]
        print(Required_Column_Name)
        Required_Data_Type = AnyPandasDataFrame[Required_Column_Name].dtype
        if Required_Data_Type == 'datetime64[ns]':
            DateChecker = True
        else:
            DateChecker = contains_word(Required_Column_Name, "Date", "of Death", "Day of Work")
        if not DateChecker:
            # Non-date columns: fill missing values, then store everything as strings.
            AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].fillna("FILLINGTHENAS")
            # Raw string, anchored so that only a trailing ".0" is stripped.
            AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype(str).replace(r'\.0$', '', regex=True)
            AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype(str)
        else:
            # Date columns: fill missing dates with the custom date and keep datetime64[ns].
            AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].fillna(cr_date)
            AnyPandasDataFrame[Required_Column_Name] = AnyPandasDataFrame[Required_Column_Name].astype('datetime64[ns]')
    return AnyPandasDataFrame
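As for whether the code can be written better, one possible, more compact shape is sketched below. It is only a sketch under assumptions: it detects date columns by dtype alone (the name-based contains_word check from the post is left out because that helper is not shown), it works on a copy instead of modifying the input, and the names change_dtypes, FILL_TEXT, FILL_DATE, sample, and result are made up for this example.

```python
from datetime import datetime

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

FILL_TEXT = "FILLINGTHENAS"                 # filler for non-date columns
FILL_DATE = datetime(1800, 1, 1, 1, 1, 1)   # filler for date columns


def change_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy: date columns get FILL_DATE, all other columns become strings."""
    out = df.copy()
    for col in out.columns:  # iterating over names visits every column, including the first
        if is_datetime64_any_dtype(out[col]):
            out[col] = out[col].fillna(FILL_DATE)
        else:
            # Convert value by value so missing entries become the filler text.
            as_text = out[col].map(lambda v: FILL_TEXT if pd.isna(v) else str(v))
            # Strip only a trailing ".0" (e.g. "2.0" -> "2"); "10.05" is left alone.
            out[col] = as_text.str.replace(r"\.0$", "", regex=True)
    return out


# Small made-up frame covering the four dtypes mentioned in the question.
sample = pd.DataFrame(
    {
        "n": [1, 2],
        "x": [1.0, None],
        "s": ["a", None],
        "d": pd.to_datetime(["2019-02-05", None]),
    }
)
result = change_dtypes(sample)
print(result)
print(result.dtypes)
```

Because the loop runs over column names rather than positions, there is no start index to get wrong, and the single else branch handles int64, float64, and object columns the same way.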