Question

I have a DataFrame:

, overall_score, industry_score
0, 15, -
1, 18, 12
2, - , 1
3, - , -
4, 12, 3

For some reason, when I run:

print(df.isnull().sum())

it does not treat the '-' entries (at index 0, 2 and 3) as NaN values. How can I fix this? The '-' really does mean a missing data point.

Result of df.to_dict():

{' overall_score': {0: ' 15', 1: ' 18', 2: ' - ', 3: ' - ', 4: ' 12'}, ' industry_score': {0: ' -', 1: ' 12', 2: ' 1', 3: ' -', 4: ' 3'}}

Answer 1

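You mention that your data was scraped. Still, at some point it was read into a DataFrame, and handling the placeholder during that read (for example na_values=['-'] together with dtype='float') would be more efficient.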

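But let's assume you inherited that DataFrame as it is. In that case, use df.apply(pd.to_numeric, errors='coerce') to convert the values to numbers (in the process, invalid entries such as ' - ' are replaced with NaN).

Full example: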

from io import StringIO

import pandas as pd

data = '''\
overall_score,industry_score
15,-
18,12
-,1
-,-
12,3'''

# pd.compat.StringIO was removed from pandas; use io.StringIO instead
df = pd.read_csv(StringIO(data), sep=',')
print(df.isnull().sum())

#overall_score     0
#industry_score    0
#dtype: int64

cols = ['overall_score', 'industry_score']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
print(df.isnull().sum())

#overall_score     2
#industry_score    2
#dtype: int64
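
The example above uses clean input. The df.to_dict() output in the question shows padded column names and values (' overall_score', ' - '), so a variant that strips the whitespace first may be closer to the real data. This is only a sketch: the names raw, cleaned and na_counts are illustrative, and the explicit strip is a cautious assumption rather than something the answer requires.

```python
import pandas as pd

# Data copied from the question's df.to_dict() output, padding included.
raw = {' overall_score': {0: ' 15', 1: ' 18', 2: ' - ', 3: ' - ', 4: ' 12'},
       ' industry_score': {0: ' -', 1: ' 12', 2: ' 1', 3: ' -', 4: ' 3'}}
df = pd.DataFrame(raw)

# Remove the leading spaces from the column names.
df.columns = df.columns.str.strip()

# Strip each string value, then coerce: anything non-numeric ('-') becomes NaN.
cleaned = df.apply(lambda col: pd.to_numeric(col.str.strip(), errors='coerce'))

na_counts = cleaned.isnull().sum()
print(na_counts)
```

After this, both columns are numeric and isnull() counts the former '-' entries.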

Answer 2

`read_csv`

Solve the problem at parse time by using the na_values parameter.

pd.read_csv('test.csv', na_values=['-'], index_col=0, sep=r'\s*,\s*', engine='python')

   overall_score  industry_score
0           15.0             NaN
1           18.0            12.0
2            NaN             1.0
3            NaN             NaN
4           12.0             3.0
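
To try the read_csv approach without a file on disk, the same call can be pointed at an in-memory buffer. This is a minimal sketch that assumes the file content looks like the table in the question; the variable names text and parsed are illustrative.

```python
from io import StringIO

import pandas as pd

# Assumed file content, mirroring the table shown in the question.
text = """, overall_score, industry_score
0, 15, -
1, 18, 12
2, - , 1
3, - , -
4, 12, 3"""

# The regex separator swallows the spaces around each comma, so every
# placeholder is read as a bare '-' and matches na_values.
parsed = pd.read_csv(StringIO(text), na_values=['-'], index_col=0,
                     sep=r'\s*,\s*', engine='python')

print(parsed.isnull().sum())
```

The regex separator is what requires engine='python'; the default C engine only supports single-character separators.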

`mask`

This is useful if the columns are of dtype object and you intend to keep them that way.

df.mask(lambda x: x == '-')

   overall_score  industry_score
0           15.0             NaN
1           18.0            12.0
2            NaN             1.0
3            NaN             NaN
4           12.0             3.0
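
A small self-contained sketch of the mask approach, assuming the values have already been stripped of whitespace and are stored as strings. The frame below is illustrative, and df.mask(df == '-') is used as an equivalent of the callable form.

```python
import pandas as pd

# String-valued columns with '-' as the missing-value placeholder.
df = pd.DataFrame({'overall_score': ['15', '18', '-', '-', '12'],
                   'industry_score': ['-', '12', '1', '-', '3']},
                  dtype=object)

# Wherever the condition is True, the value is replaced with NaN.
masked = df.mask(df == '-')

print(masked.isnull().sum())
```

The surviving values stay as strings ('15', not 15.0), so a later pd.to_numeric step would still be needed for arithmetic.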

Replace a certain value with NaN in a DataFrame
