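I'm having trouble removing non-digit characters from a df column. I've tried a few approaches, but several of them still produce NaN values when they run over the column.
I need the output to contain only the digits, as integers (no leading zeros).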
Cust #
0 10726
2 11699
5 12963
8 z13307
9 13405
12 14831-001
16 16416
17 16917
18 18027
24 19233z
dtype('O')
I tried:
Unique_Stores['Cust #2']=Unique_Stores['Cust #2'].str.extract('(\d+)',expand=True)
Unique_Stores['Cust #2'].str.replace(r'(\D+)','')
Unique_Stores['Cust #2'].replace(to_replace="([0-9]+)", value=r"\1", regex=True, inplace=True)
Unique_Stores['Cust #2'] = pd.to_numeric(Unique_Stores['Cust #2'].str.replace(r'\D+', ''), errors='coerce')
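But no matter what I do, roughly the first 1000 rows come back as NaN, even when the value is already an integer.
Thanks in advance. Let me know if you need more information.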
Answer 0 (score: 2)
UPDATE:
In [144]: df = pd.read_csv(r'D:\download\Customer_Numbers.csv', index_col=0)
In [145]: df['Cust #2'] = df['Cust #'].str.replace(r'\D+', '', regex=True).astype(int)
In [146]: df
Out[146]:
State Zip Code Cust # Cust #2
0 PA 16505 10726 10726
2 MI 48103 11699 11699
5 NH 3253 12963 12963
8 PA 18951 13307 13307
9 MA 2360 13405 13405
12 NY 11940 14831 14831
16 OH 44278 16416 16416
17 OH 45459 16917 16917
18 MA 1748 18027 18027
24 NY 14226 19233 19233
... ... ... ... ...
54393 WA 99207 005611-99 561199
54394 WA 99006 7775 7775
54395 WA 99353 8006 8006
54399 WA 99206 8888 8888
54404 CA 92117 444202 444202
54408 CA 90019 30066 30066
54411 CA 90026 443607 443607
54414 CA 90094 9242 9242
54417 CA 90405 9245 9245
54420 CA 90038 9247 9247
[6492 rows x 4 columns]
In [147]: df.dtypes
Out[147]:
State object
Zip Code object
Cust # object
Cust #2 int32
dtype: object
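One possible cause of the NaN rows, which is an assumption since the original frame isn't shown here: if the object column holds a mix of real Python ints and strings, the `.str` methods return NaN for every element that is not a string. Reading the column as text (as `read_csv` does above) or casting with `.astype(str)` first avoids that. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: an object column mixing real ints with strings.
s = pd.Series([10726, 11699, "z13307", "14831-001", "19233z"], dtype=object)

# The .str accessor skips non-string elements and yields NaN for them.
broken = s.str.replace(r"\D+", "", regex=True)

# Casting to str first lets the regex see every value.
fixed = s.astype(str).str.replace(r"\D+", "", regex=True).astype(int)

print(broken)
print(fixed)
```

To check whether a column is mixed like this, `s.map(type).value_counts()` shows how many values of each Python type it contains.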
OLD answer:
In [123]: df
Out[123]:
val
0 10726
2 11699
5 12963
8 z13307
9 13405
12 14831-001
16 16416
17 16917
18 18027
24 19233z
In [124]: df['val'] = pd.to_numeric(df['val'].str.replace(r'\D+', '', regex=True), errors='coerce')
In [125]: df
Out[125]:
val
0 10726
2 11699
5 12963
8 13307
9 13405
12 14831001
16 16416
17 16917
18 18027
24 19233
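If some rows contain no digits at all, stripping leaves an empty string, which `pd.to_numeric(..., errors='coerce')` turns into NaN, and the column then becomes float. A sketch of one way to keep integers, assuming a pandas version that has the nullable `Int64` dtype; the sample values and the `val_num` column name are made up for illustration:

```python
import pandas as pd

# Hypothetical sample, including a leading-zero value and a row with no digits.
df = pd.DataFrame({"val": ["10726", "z13307", "14831-001", "005611-99", "n/a"]})

# Strip everything that is not a digit; "n/a" becomes an empty string.
digits = df["val"].astype(str).str.replace(r"\D+", "", regex=True)

# Empty strings are coerced to NaN; the nullable Int64 dtype keeps the rest as integers.
df["val_num"] = pd.to_numeric(digits, errors="coerce").astype("Int64")

print(df)
```

Converting to a number also drops the leading zeros, so `005611-99` ends up as `561199`.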