Extract numbers from a Pandas column (object dtype)

Posted: 2017-03-10 22:23:03

Tags: python pandas

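I'm having trouble removing non-digit characters from a df column. I've tried several approaches, but many of them still produce NaN values when the function runs over the column.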

I need the output to be only the numbers, as integers (no leading zeros).

Cust #
0   10726
2   11699
5   12963
8   z13307
9   13405
12  14831-001
16  16416
17  16917
18  18027
24  19233z
dtype('O')

I've tried:

Unique_Stores['Cust #2']=Unique_Stores['Cust #2'].str.extract('(\d+)',expand=True)

Unique_Stores['Cust #2'].str.replace(r'(\D+)','')

Unique_Stores['Cust #2'].replace(to_replace="([0-9]+)", value=r"\1", regex=True, inplace=True)

Unique_Stores['Cust #2'] = pd.to_numeric(Unique_Stores['Cust #2'].str.replace(r'\D+', ''), errors='coerce')
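The attempts above differ in how they treat values containing more than one run of digits, such as `14831-001`. The question does not say which result is wanted for such rows, so this sketch on the sample values from the question shows both: `str.extract` keeps only the first run of digits, while deleting every non-digit joins all runs together.

```python
import pandas as pd

# Sample values copied from the question's "Cust #" column
s = pd.Series(['10726', '11699', '12963', 'z13307', '13405',
               '14831-001', '16416', '16917', '18027', '19233z'],
              index=[0, 2, 5, 8, 9, 12, 16, 17, 18, 24])

# Option A: keep only the FIRST run of digits ("14831-001" -> 14831)
first_run = s.str.extract(r'(\d+)', expand=False).astype(int)

# Option B: delete every non-digit and join the rest ("14831-001" -> 14831001)
all_digits = s.str.replace(r'\D+', '', regex=True).astype(int)

print(pd.DataFrame({'raw': s, 'first_run': first_run, 'all_digits': all_digits}))
```

Either way, the final `astype(int)` drops leading zeros, because a string such as `'005611'` parses as the integer 5611.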

Thanks in advance, and let me know if you need more information.

But no matter what I do, the first 1000 or so rows return NaN values, even when the value is an integer.
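The thread never pins down why whole blocks of rows come back as NaN. One plausible cause, offered here only as an assumption: the object column holds a mix of real Python ints and strings (for example because `read_csv`, parsing a large file in chunks with the default `low_memory=True`, inferred numbers for some chunks and strings for others). The `.str` methods return NaN for any element that is not a string, so the purely numeric rows are exactly the ones that get lost. A minimal reproduction with hypothetical values:

```python
import pandas as pd

# Hypothetical object column mixing real ints with strings,
# as can happen when different CSV chunks are inferred differently
s = pd.Series([10726, 11699, 'z13307', '14831-001'], dtype=object)

# .str methods yield NaN for every element that is not a str
broken = s.str.replace(r'\D+', '', regex=True)
print(broken.tolist())

# Casting everything to str first lets the .str accessor handle every row
fixed = s.astype(str).str.replace(r'\D+', '', regex=True).astype(int)
print(fixed.tolist())
```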

Link to actual dataset

1 answer:

Answer 0 (score: 2)

UPDATE:

In [144]: df = pd.read_csv(r'D:\download\Customer_Numbers.csv', index_col=0)

In [145]: df['Cust #2'] = df['Cust #'].str.replace(r'\D+', '', regex=True).astype(int)

In [146]: df
Out[146]:
      State Zip Code      Cust #  Cust #2
0        PA    16505       10726    10726
2        MI    48103       11699    11699
5        NH     3253       12963    12963
8        PA    18951       13307    13307
9        MA     2360       13405    13405
12       NY    11940       14831    14831
16       OH    44278       16416    16416
17       OH    45459       16917    16917
18       MA     1748       18027    18027
24       NY    14226       19233    19233
...     ...      ...         ...      ...
54393    WA    99207  005611-99    561199
54394    WA    99006        7775     7775
54395    WA    99353        8006     8006
54399    WA    99206        8888     8888
54404    CA    92117      444202   444202
54408    CA    90019       30066    30066
54411    CA    90026      443607   443607
54414    CA    90094        9242     9242
54417    CA    90405        9245     9245
54420    CA    90038        9247     9247

[6492 rows x 4 columns]

In [147]: df.dtypes
Out[147]:
State       object
Zip Code    object
Cust #      object
Cust #2      int32
dtype: object
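A sketch of the same approach that also rules out mixed types at read time, assuming a small hypothetical CSV snippet in place of the real `Customer_Numbers.csv`: passing `dtype={'Cust #': str}` to `read_csv` keeps every customer number as a string (leading zeros included), so the `.str` accessor sees every row.

```python
import io
import pandas as pd

# Hypothetical CSV snippet shaped like the question's data (not the real file)
csv_text = """,State,Zip Code,Cust #
0,PA,16505,10726
8,PA,18951,z13307
12,NY,11940,14831-001
54393,WA,99207,005611-99
"""

# Force the customer-number column to str so no chunk is guessed as int
df = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype={'Cust #': str})

# Drop non-digits, then cast; the int conversion also removes leading zeros
df['Cust #2'] = df['Cust #'].str.replace(r'\D+', '', regex=True).astype('int64')
print(df)
```

Using an explicit `int64` rather than `int` also avoids the platform-dependent `int32` seen in the output above on Windows.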

OLD answer:

In [123]: df
Out[123]:
          val
0       10726
2       11699
5       12963
8      z13307
9       13405
12  14831-001
16      16416
17      16917
18      18027
24     19233z

In [124]: df['val'] = pd.to_numeric(df['val'].str.replace(r'\D+', '', regex=True), errors='coerce')

In [125]: df
Out[125]:
         val
0      10726
2      11699
5      12963
8      13307
9      13405
12  14831001
16     16416
17     16917
18     18027
24     19233
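Because `errors='coerce'` turns anything that cannot be parsed into NaN, a value with no digits at all becomes an empty string and then NaN, which forces the column to `float64`. If such rows can occur, the nullable `Int64` dtype is one way to keep integers while still marking them as missing. A sketch with one hypothetical digit-free entry (`'n/a'`) added to the sample:

```python
import pandas as pd

# Sample values plus one hypothetical entry that contains no digits
s = pd.Series(['10726', 'z13307', '14831-001', '19233z', 'n/a'], dtype=object)

digits = s.str.replace(r'\D+', '', regex=True)   # 'n/a' becomes ''
nums = pd.to_numeric(digits, errors='coerce')    # '' becomes NaN, so dtype is float64
print(nums.dtype)

# Nullable Int64 keeps whole numbers and shows the missing value as <NA>
nums_int = nums.astype('Int64')
print(nums_int)
```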