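I want to add a new column named '2016 Salary ($)' to the DataFrame income, containing the employee salaries from the Salary Paid column as numbers. I want to get the number by removing the '$' and ','.
But when I do this, I get the error:
"could not convert string to float"
I tried to follow the hint, but it doesn't work: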
income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$').astype(float)
income['2016 Salary ($)'].apply(lambda X:X['Salary Paid'])
income
Answer 0 (score: 2)
Try something like this:
Data:
import pandas as pd
dic = {'Name':['John','Peter'],'SalaryPaid':['$204,546,289.35','$500,231,289.35'],'Year':['2008','2009']}
df1 = pd.DataFrame(dic)
df1
    Name       SalaryPaid  Year
0   John  $204,546,289.35  2008
1  Peter  $500,231,289.35  2009
Code:
df1['SalaryPaid'] = df1['SalaryPaid'].str.replace(',', '')
# If you want the result as a string :
df1['2016 Salary ($)']= df1['SalaryPaid'].str.strip('$')
# if you want the result as float :
#df1['2016 Salary ($)']= df1['SalaryPaid'].str.strip('$').astype(float)
df1
Result:
    Name     SalaryPaid  Year 2016 Salary ($)
0   John  $204546289.35  2008    204546289.35
1  Peter  $500231289.35  2009    500231289.35
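The two steps above can also be combined into a single pass. This is only a sketch, assuming a pandas version whose Series.str.replace accepts the regex keyword; the character class [$,] matches either symbol, and the original SalaryPaid column is left untouched:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['John', 'Peter'],
    'SalaryPaid': ['$204,546,289.35', '$500,231,289.35'],
    'Year': ['2008', '2009'],
})

# Remove every '$' and ',' in one regex pass, then convert to float.
# Inside a character class, '$' is a literal character, not an anchor.
df1['2016 Salary ($)'] = (
    df1['SalaryPaid']
    .str.replace(r'[$,]', '', regex=True)
    .astype(float)
)
print(df1)
```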
Answer 1 (score: 2)
First add Series.str.replace:
# Parentheses let the method chain span several lines
income['2016 Salary ($)'] = (income['SalaryPaid'].str.replace(',', '')
                                                 .str.strip('$')
                                                 .astype(float))
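To see why the replace step is needed, here is a small plain-Python sketch (the sample value is made up for illustration): stripping only the '$' leaves the thousands separators in place, and float() rejects them.

```python
raw = '$204,546,289.35'  # hypothetical sample value

stripped = raw.strip('$')  # '204,546,289.35' still contains commas
try:
    float(stripped)
    failed = False
except ValueError:
    # This is the "could not convert string to float" error from the question
    failed = True

# Removing the commas first makes the conversion succeed
value = float(stripped.replace(',', ''))
print(failed, value)
```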
If the DataFrame is created from a file, a better solution is to use the thousands parameter of read_csv:
income = pd.read_csv(file, thousands=',')
income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$').astype(float)
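One caveat, as far as I can tell: thousands only affects values that pandas parses as numbers, so a column whose values start with '$' may still come back as strings with the commas intact. A possible alternative is the converters parameter of read_csv. The sketch below assumes an in-memory CSV standing in for the real file, and parse_money is a hypothetical helper name:

```python
import io

import pandas as pd


def parse_money(text):
    """Hypothetical helper: turn a string like '$1,234.50' into a float."""
    return float(text.replace('$', '').replace(',', ''))


# In-memory stand-in for the real file; values with commas must be quoted
csv_data = io.StringIO(
    'Name,SalaryPaid\n'
    'John,"$204,546,289.35"\n'
    'Peter,"$500,231,289.35"\n'
)

# converters applies parse_money to every value of the SalaryPaid column
income = pd.read_csv(csv_data, converters={'SalaryPaid': parse_money})
income['2016 Salary ($)'] = income['SalaryPaid']
print(income)
```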
Answer 2 (score: 1)
I created a dummy DataFrame based on your requirements and performed the same operation you mentioned above, and it works fine for me.
import pandas as pd
df = pd.DataFrame(columns=['AA','BB'])
df['AA'] = ['$12,20','$13,30']
df['BB'] = ['X','Y']
print(df)
Output:
       AA BB
0  $12,20  X
1  $13,30  Y
# regex=False treats '$' as a literal character rather than a regex anchor
df['AA'] = df['AA'].str.replace('$','',regex=False).str.replace(',','').astype(float)
print(df)
Output:
       AA BB
0  1220.0  X
1  1330.0  Y
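As I see it, the error is in the second line of your code, where you apply the lambda: instead of "income['2016 Salary ($)'].apply(lambda X:X['Salary Paid'])" it should be "income['2016 Salary ($)'].apply(lambda X:X['SalaryPaid'])". I think the column name SalaryPaid was mistyped.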
Answer 3 (score: 0)
You can also do:
def convert(x):
    return float(x.replace('$','').replace(',',''))

income['2016 Salary ($)'] = income['Salary Paid'].apply(convert)
or
import re

def convert(x):
    # Keep only the digits and the decimal point, then convert
    return float(''.join(re.findall(r'[\d.]', x)))
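A quick way to check the regex variant without a DataFrame is to run it over a plain list. This is only a sketch; the sample salaries are invented for illustration:

```python
import re


def convert(x):
    # Keep only the digits and the decimal point, then convert
    return float(''.join(re.findall(r'[\d.]', x)))


salaries = ['$204,546,289.35', '$500,231,289.35']  # hypothetical sample values
converted = [convert(s) for s in salaries]
print(converted)
```

Note that this version silently drops any other characters too, including a minus sign, so it is best suited to columns known to hold non-negative amounts.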