Question

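I created a DataFrame in Python from data pulled from AWS.

I will only use 3 of the 67 columns, and I noticed that the dtype of those columns is object.

I would like to know how to change these object columns to some other dtype.

I have tried several approaches, but none of them worked.

I read my data as follows: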

formation_tops = pd.read_csv("C:/Users/juan/Documents/revonos-ds-sandbox/formation_tops/regulatory_agency=COGCC/000000_0",
                             sep='\t', header = None, names= cols1, index_col = False, dtype='unicode')

Then I created a separate DataFrame with the 3 columns I want:

            formation_name log_bottom log_top
UWI                                           
05-001-05000      BENTONITE         \N    5118
05-001-05000         D SAND         \N    5211
05-001-05000      GREENHORN         \N    4908
05-001-05000         J SAND         \N    5260
05-001-05000       NIOBRARA         \N    4380
05-001-05001        CARLILE         \N    4720
05-001-05001         D SAND         \N    5131
05-001-05001      GREENHORN         \N    4821
05-001-05001         J SAND         \N    5179
05-001-05001          MOWRY         \N    5034
05-001-05001       NIOBRARA         \N    4227

I have tried different ways of changing the dtype, but I run into the following error:

File "pandas\_libs\src\inference.pyx", line 1047, in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56433)

ValueError: Unable to parse string "\N" at position 0

Also:

 cleaned_dataframe['log_bottom']=  cleaned_dataframe.log_bottom.str.replace('\N', '')
                                                                              ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: malformed \N character escape

Since this is a unicode error, I assume I need to encode the data into a readable format somehow.

Any help would be appreciated.

Answer 1

I was able to convert the DataFrame using df['column'].convert_objects(convert_numeric=True).
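
Note that convert_objects was deprecated in pandas 0.21 and removed in 1.0, so it only works on older installs. On newer versions, a roughly equivalent sketch uses pd.to_numeric with errors="coerce". The sample frame below is made up from the first rows of the table in the question; it is not the real dataset.

```python
import pandas as pd

# Made-up sample mirroring the question's table; every value is a string,
# and "\\N" is the two-character text backslash + N.
df = pd.DataFrame(
    {
        "formation_name": ["BENTONITE", "D SAND", "GREENHORN"],
        "log_bottom": ["\\N", "\\N", "\\N"],
        "log_top": ["5118", "5211", "4908"],
    },
    index=pd.Index(["05-001-05000"] * 3, name="UWI"),
)

# errors="coerce" turns anything that cannot be parsed (here "\N") into NaN
for col in ["log_bottom", "log_top"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# A bare df.dropna() would drop every row of this sample, because
# log_bottom is entirely NaN; restricting to a subset avoids that.
cleaned = df.dropna(subset=["log_top"])

print(df.dtypes)
print(cleaned)
```

Whether to drop rows on all columns or only on some depends on which columns you actually need to be non-missing.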

This makes the column show up as float64. It converts \N to NaN, and after calling df.dropna() my DataFrame is clean.
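
As for the SyntaxError in the question: it is not an encoding problem in the data. In a normal Python 3 string literal, \N starts a named-character escape such as "\N{BULLET}", so '\N' on its own is malformed. Writing the literal as a raw string (r'\N') or escaping the backslash ('\\N') avoids it. A minimal sketch, assuming a pandas version where str.replace accepts the regex keyword, and using a made-up Series:

```python
import pandas as pd

# Made-up sample column; "\\N" is the literal text backslash + N
s = pd.Series(["\\N", "5118", "\\N"], name="log_bottom")

# r"\N" is a raw string, so Python does not try to read it as an escape.
# regex=False makes pandas treat the pattern as plain text, not a regex.
replaced = s.str.replace(r"\N", "", regex=False)

# Empty strings cannot be parsed as numbers, so they become NaN
numeric = pd.to_numeric(replaced, errors="coerce")

print(numeric)
```

If the goal is only to get numbers, the replace step can be skipped, since pd.to_numeric with errors="coerce" already turns \N into NaN.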
