I'm currently using a Jupyter notebook to analyze company data. My first step is to clean and format the data. So far, my code is:
%matplotlib inline
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd
# We'll also import seaborn, a Python graphing library
import warnings # current version of seaborn generates a bunch of warnings that we'll ignore
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set(style="dark", color_codes=True)
Users = pd.read_csv("Users.csv", delimiter = ';', engine = 'python') # create a pandas DataFrame per file
Users['ContractHours'].fillna(0, inplace = True)
Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
Then I tried to replace the NaN values in the ContractHours column with zero and convert the column to float. Replacing NaN with 0 worked, but I get this error:
ValueError Traceback (most recent call last)
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56156)()
ValueError: Unable to parse string "32,5"
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-22-bcb66b8c06fb> in <module>()
20 #Users = Users['ContractHours'].replace(',', '.')
21 Users['ContractHours'].fillna(0, inplace = True)
---> 22 Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
23
24 #print(Customers.head(10))
C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
2353 else:
2354 values = self.asobject
-> 2355 mapped = lib.map_infer(values, f, convert=convert_dtype)
2356
2357 if len(mapped) and isinstance(mapped[0], Series):
pandas\_libs\src\inference.pyx in pandas._libs.lib.map_infer (pandas\_libs\lib.c:66645)()
C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
124 coerce_numeric = False if errors in ('ignore', 'raise') else True
125 values = lib.maybe_convert_numeric(values, set(),
--> 126 coerce_numeric=coerce_numeric)
127
128 except Exception:
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56638)()
ValueError: Unable to parse string "32,5" at position 0
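The failure can likely be reproduced without the CSV file. The sketch below assumes a small hypothetical Series of strings in which "32,5" uses a comma as the decimal separator, as in the ContractHours column; pd.to_numeric only understands "." as the decimal point, so it raises a ValueError that names the offending string.

```python
import pandas as pd

# Hypothetical sample values; "32,5" uses a comma as the decimal separator
hours = pd.Series(["34", "24", "32,5"])

error_message = None
try:
    pd.to_numeric(hours)
except ValueError as exc:
    # pandas cannot parse "32,5" as a number, so to_numeric raises
    error_message = str(exc)

print(error_message)
```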
How can I parse the string "32,5" in the ContractHours column as a float?
I also tried replacing ',' with '.' beforehand, but that made all the other columns disappear, and the comma was still a comma:
Users = Users['ContractHours'].replace(',', '.')
The result is:
0 34
1 24
2 40
3 35
4 40
5 24
6 32
7 32
8 32
9 24
10 24
11 24
12 24
13 0
14 32
15 28
16 32
17 32
18 28
19 24
20 40
21 40
22 36
23 24
24 32,5
25 36
26 36
27 24
28 40
29 40
30 28
31 32
32 32
33 40
34 32
35 24
36 24
37 40
38 25
39 24
Name: ContractHours, dtype: object
All the other columns are gone, and 32,5 should be 32.5.
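Both symptoms can be seen on a small made-up frame (the UserId column is hypothetical, standing in for the other columns of Users.csv). Without regex=True, Series.replace only replaces cells whose entire value equals ",", so "32,5" is left untouched; and assigning the result to the DataFrame's own name rebinds it to a single Series.

```python
import pandas as pd

# Hypothetical frame standing in for Users.csv
users = pd.DataFrame({"UserId": [1, 2, 3],
                      "ContractHours": ["34", "32,5", "24"]})

# Without regex=True, replace matches whole cell values only,
# so no cell is changed here
unchanged = users["ContractHours"].replace(",", ".")
print(unchanged.tolist())

# Assigning to `users` itself rebinds the name to one Series,
# which is why every other column seems to vanish
users = users["ContractHours"].replace(",", ".")
print(type(users).__name__)
```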
Answer 0 (score: 2)
Use the decimal parameter of read_csv to parse the floats correctly:
Users = pd.read_csv("Users.csv", sep = ';', decimal=',')
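A self-contained sketch of the decimal=',' approach, assuming hypothetical CSV content that mimics Users.csv (";" between fields, "," as the decimal separator, one empty ContractHours field). Because the column is parsed as float64 at read time, no string replacement or to_numeric call is needed afterwards.

```python
import io

import pandas as pd

# Hypothetical CSV content; the empty field in the last row becomes NaN
csv_text = "UserId;ContractHours\n1;34\n2;32,5\n3;\n"

users = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# Missing hours become 0.0; the column is already float64
users["ContractHours"] = users["ContractHours"].fillna(0)

print(users.dtypes)
print(users)
```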
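To fix your own approach, add regex=True so that substrings are replaced, and assign the result back to the column instead of to Users, so the other columns are kept:
Users['ContractHours'] = Users['ContractHours'].replace(',', '.', regex=True).astype(float)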
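A minimal, runnable sketch of the regex=True approach on a hypothetical frame where ContractHours was read as strings (with one missing value). The result is assigned back to users["ContractHours"] rather than to users itself, which is what keeps the other columns. With regex=True, non-string cells such as NaN are left as they are, so the NaN is only filled after the conversion to float.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; ContractHours holds strings because of the
# comma decimal separator, plus one missing value
users = pd.DataFrame({"UserId": [1, 2, 3, 4],
                      "ContractHours": ["34", "32,5", np.nan, "24"]})

users["ContractHours"] = (
    users["ContractHours"]
    .replace(",", ".", regex=True)  # substring replacement: "32,5" -> "32.5"
    .astype(float)                  # strings and NaN become float64
    .fillna(0)                      # missing hours become 0.0
)

print(users)
```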