String values are not converted to numeric values by the replace() method

Time: 2017-07-26 04:17:24

Tags: python pandas replace

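I am using replace() with regex=True to convert string values to numeric values for analysis. I get no error, but when I inspect the DataFrame afterwards the values are unchanged. I also tried

df['international plan'].replace(['no', 'yes'], [0, 1], inplace = True)
df['voice mail plan'].replace(['yes', 'no'], [1,0], inplace = True)
df['churn'].replace(['False', 'True'], [0, 1], inplace = True)

and ran into the same problem. Any help is greatly appreciated. A screenshot of my notebook is attached below, and the original code follows.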

{{1}}

Print screen from my Jupyter Notebook

2 answers:

Answer 0 (score: 0)

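Judging from your notebook screenshot, the column values are "yes", "no", "True." and "False." with whitespace around them. Because .replace() without a regex only matches whole values exactly, nothing is replaced. Strip the whitespace and map yes/no to 1/0, for example: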

df['international plan'] = df['international plan'].apply(lambda x: 1 if x.strip() == "yes" else 0)

df['voice mail plan'] = df['voice mail plan'].apply(lambda x: 1 if x.strip() == "yes" else 0)

df['churn'] = df['churn'].apply(lambda x: 1 if x.strip() == "True." else 0)
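
One caveat with the lambdas above: every value that is not exactly "yes" (or "True.") becomes 0, so a typo or an unexpected category would be silently counted as "no". A minimal sketch of a vectorized alternative, assuming small hypothetical sample data, strips the strings and then uses Series.map, which leaves unmatched values as NaN so they can be spotted:

```python
import pandas as pd

# Hypothetical sample data mimicking the padded strings described above.
df = pd.DataFrame({
    "international plan": [" yes", " no", " yes"],
    "churn": [" True.", " False.", " True."],
})

# Strip surrounding whitespace, then map each label to a number.
# Values missing from the mapping would become NaN instead of 0.
plan = df["international plan"].str.strip().map({"yes": 1, "no": 0})
churn = df["churn"].str.strip().map({"True.": 1, "False.": 0})

print(plan.tolist())
print(churn.tolist())
```

Checking plan.isna().any() afterwards is a cheap way to confirm that every label was covered by the mapping.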

Answer 1 (score: 0)

The values contain some whitespace:

import numpy as np
import pandas as pd

np.random.seed(789)
df = pd.DataFrame({'international plan': np.random.choice([' yes',' no'], size=5),
                  'voice mail plan': np.random.choice([' yes',' no'], size=5),
                  'churn': np.random.choice([' False.',' True.'], size=5),
                  'area code': np.random.choice([415,408], size=5)})
print (df)
   area code    churn international plan voice mail plan
0        408    True.                 no             yes
1        415   False.                yes             yes
2        408    True.                yes              no
3        408   False.                yes             yes
4        408   False.                 no             yes
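
The printed frame right-aligns strings, so a leading space is easy to miss. A small sketch of how the padding might be detected, using a hypothetical Series rather than the asker's real data:

```python
import pandas as pd

# Hypothetical column with a leading space in every value.
s = pd.Series([" yes", " no", " yes"], name="international plan")

# unique() returns the raw values, so the quotes make the padding visible.
raw_values = s.unique().tolist()
print(raw_values)

# Count the values that change when stripped, i.e. values carrying extra whitespace.
n_padded = int((s != s.str.strip()).sum())
print(n_padded)
```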

One solution uses apply to loop over the columns listed in cols, combining str.strip with Series.replace and a dict:

cols = ['international plan','voice mail plan','churn']
d = {'no':0,'yes':1, 'True.':1, 'False.':0}
df[cols] = df[cols].apply(lambda x: x.str.strip().replace(d))
print (df)
   area code  churn  international plan  voice mail plan
0        408      1                   0                1
1        415      0                   1                1
2        408      1                   1                0
3        408      0                   1                1
4        408      0                   0                1

Alternatively, add the whitespace to the keys of the dict and use DataFrame.replace:

cols = ['international plan','voice mail plan','churn']
d = {' no':0,' yes':1, ' True.':1, ' False.':0}
df[cols] = df[cols].replace(d)
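
This also explains why the original attempt raised no error yet changed nothing: without a regex, replace only substitutes values that equal a key exactly. A minimal sketch on a hypothetical two-value Series:

```python
import pandas as pd

# Hypothetical padded values, as in the example data above.
s = pd.Series([" yes", " no"])

# Keys without the leading space match nothing, so the Series comes back as-is.
unchanged = s.replace({"yes": 1, "no": 0})

# Keys that include the leading space match the values exactly.
converted = s.replace({" yes": 1, " no": 0})

print(unchanged.tolist())
print(converted.tolist())
```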

If you want to replace each column separately:

df['international plan'] = df['international plan'].str.strip().replace(['no','yes'],[0, 1])
df['voice mail plan'] = df['voice mail plan'].str.strip().replace(['yes','no'],[1,0])
df['churn'] = df['churn'].str.strip().replace(['False.','True.'],[0, 1])
print (df)
   area code  churn  international plan  voice mail plan
0        408      1                   0                1
1        415      0                   1                1
2        408      1                   1                0
3        408      0                   1                1
4        408      0                   0                1
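
If the data is loaded from a CSV file (an assumption; the question does not say where the frame comes from), the padding after each delimiter could also be dropped at load time with the skipinitialspace parameter of pandas.read_csv. The file contents below are hypothetical:

```python
import io
import pandas as pd

# Hypothetical CSV text with a space after each comma, standing in for the real file.
csv_text = (
    "area code,international plan,churn\n"
    "408, yes, False.\n"
    "415, no, True.\n"
)

# skipinitialspace=True drops the spaces that follow each delimiter.
loaded = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(loaded["international plan"].tolist())
print(loaded["churn"].tolist())
```

With the padding gone at load time, the plain replace calls from the question would find exact matches, provided the keys include the trailing dot in "True." and "False.".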