Question

I have a CSV file with 8 columns. Two of the 8 columns contain values with a comma in them, for example 2,134

To process the data, I need to convert it to numeric (float) values

import pandas as pd

df = pd.read_csv('data.csv')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90181 entries, 0 to 90180
Data columns (total 8 columns):
user_id                        90181 non-null object
location_id                    90181 non-null int64
is_shift_accepted              90181 non-null int64
shift_accepted_role            90179 non-null float64
shift_accepted_specialities    89973 non-null float64
distance                       90144 non-null object
years_of_experience            80604 non-null float64
shift_id                       90181 non-null object
dtypes: float64(3), int64(2), object(3)
memory usage: 5.5+ MB

Now let's convert to numeric

# note: convert_objects is deprecated and was removed in pandas 1.0
df = df.convert_objects(convert_numeric=True)
df.dtypes
user_id                        float64
location_id                      int64
is_shift_accepted                int64
shift_accepted_role            float64
shift_accepted_specialities    float64
distance                       float64
years_of_experience            float64
shift_id                       float64
dtype: object

Now check for null values:

# checking for missing values if any
df.isnull().sum()
user_id                        89943
location_id                        0
is_shift_accepted                  0
shift_accepted_role                2
shift_accepted_specialities      208
distance                         249
years_of_experience             9577
shift_id                       90042
dtype: int64

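Here user_id and shift_id, both of which contain values with a ,, have the highest null counts even though they had no null values before the conversion. Is this because of the , in them? What is the correct way to preprocess this data?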
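For reference, here is a minimal sketch of two ways this kind of column is often handled. It assumes the commas are thousands separators and that the affected fields are quoted in the file. The sample CSV text and its values are made up for illustration and are not taken from data.csv.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data.csv (values invented for illustration).
csv_text = '''user_id,distance,years_of_experience
a1b2,"2,134",3.0
c3d4,"15",
e5f6,,1.5
'''

# Option 1: let read_csv strip the thousands separator while parsing.
df = pd.read_csv(io.StringIO(csv_text), thousands=',')

# Option 2: read the column as text, remove the commas, then convert explicitly.
# errors='coerce' turns anything that still is not a number into NaN.
raw = pd.read_csv(io.StringIO(csv_text), dtype={'distance': str})
raw['distance'] = pd.to_numeric(
    raw['distance'].str.replace(',', '', regex=False),
    errors='coerce',
)

print(df['distance'].tolist())
print(raw['distance'].tolist())
```

In this sketch only the column that is meant to be numeric is converted. Identifier-like columns such as user_id stay as text, because forcing non-numeric strings to numbers is what produces NaN in the first place.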

This is what the data looks like

Handling values with commas in pandas

0 answers: