Question

I'm currently using a Jupyter notebook to analyze company data. My first step is to clean and format the data. My code so far is:

%matplotlib inline
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd
# We'll also import seaborn, a Python graphing library
import warnings # current version of seaborn generates a bunch of warnings that we'll ignore
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set(style="dark", color_codes=True)

Users = pd.read_csv("Users.csv", delimiter = ';', engine = 'python') # create one pandas DataFrame per file
Users['ContractHours'].fillna(0, inplace = True)
Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)

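Next, I tried to replace the NaN values in the ContractHours column with zero and convert the column to float. Replacing NaN with 0 works, but the conversion fails with this error: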

ValueError                                Traceback (most recent call last)
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56156)()

ValueError: Unable to parse string "32,5"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-22-bcb66b8c06fb> in <module>()
     20 #Users = Users['ContractHours'].replace(',', '.')
     21 Users['ContractHours'].fillna(0, inplace = True)
---> 22 Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
     23 
     24 #print(Customers.head(10))

C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\src\inference.pyx in pandas._libs.lib.map_infer (pandas\_libs\lib.c:66645)()

C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
    124             coerce_numeric = False if errors in ('ignore', 'raise') else True
    125             values = lib.maybe_convert_numeric(values, set(),
--> 126                                                coerce_numeric=coerce_numeric)
    127 
    128     except Exception:

pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56638)()

ValueError: Unable to parse string "32,5" at position 0

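How can I parse the string "32,5" in the ContractHours column as a float?

I also tried replacing ',' with '.' beforehand, but that made all the other columns disappear, and the comma was still a comma.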

Users = Users['ContractHours'].replace(',', '.')

The result is:

0       34
1       24
2       40
3       35
4       40
5       24
6       32
7       32
8       32
9       24
10      24
11      24
12      24
13       0
14      32
15      28
16      32
17      32
18      28
19      24
20      40
21      40
22      36
23      24
24    32,5
25      36
26      36
27      24
28      40
29      40
30      28
31      32
32      32
33      40
34      32
35      24
36      24
37      40
38      25
39      24
Name: ContractHours, dtype: object

All the other columns are gone, and 32,5 should have been 32.5.

Answer 1

Use the decimal parameter of read_csv so that floats are parsed correctly:

Users = pd.read_csv("Users.csv", sep = ';', decimal=',')
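
As a minimal sketch of the decimal approach, the snippet below reads an in-memory CSV instead of Users.csv. The sample rows and the Name column are made up for illustration; only ContractHours comes from the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for Users.csv: semicolon-separated, comma as decimal mark
csv_text = "Name;ContractHours\nA;34\nB;32,5\nC;\n"

users = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# Empty cells are still NaN after parsing, so fill them afterwards
users["ContractHours"] = users["ContractHours"].fillna(0)

print(users["ContractHours"].dtype)
print(users["ContractHours"].tolist())
```

Because the column is already numeric when it is read, no separate to_numeric step should be needed.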

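Your own attempt needs regex=True so that replace matches substrings. Assign the result back to the column rather than to Users, otherwise the DataFrame is overwritten by a single Series and the other columns are lost:

Users['ContractHours'] = Users['ContractHours'].replace(',', '.', regex=True).astype(float)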
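
To illustrate the difference, here is a small sketch on a hypothetical frame that mimics the question's data (the Name column and the three rows are assumptions). It uses Series.str.replace as an alternative to replace(..., regex=True); both do substring replacement.

```python
import pandas as pd

# Hypothetical frame: ContractHours holds strings with a decimal comma
users = pd.DataFrame(
    {"Name": ["A", "B", "C"], "ContractHours": ["34", "32,5", None]}
)

# Without regex=True, replace only matches cells that equal "," exactly,
# so "32,5" is left untouched
unchanged = users["ContractHours"].replace(",", ".")

# Assign back to the column so the rest of the frame is kept
users["ContractHours"] = (
    users["ContractHours"]
    .fillna("0")                         # fill missing values first
    .str.replace(",", ".", regex=False)  # literal substring replacement
    .astype(float)
)

print(unchanged[1])
print(users)
```

Assigning to users["ContractHours"] instead of users is what keeps the Name column in place.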
