Question

My dataset has many columns containing dollar values with commas, e.g. $ 150,000.50. After importing the dataset:


Because many of the values in these columns are $ values, the imputer object fails. How can I fix this in my Python program?

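This is my dataset. Except for School Type, all the other columns contain $ values with commas. Is there a generic way to remove the $ signs and commas from these column values?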

datasets = pd.read_csv('salaries-by-college-type.csv')

Here is an overview of my dataset:

School Type                          269 non-null object
Starting Median Salary               269 non-null float64
Mid-Career Median Salary             269 non-null float64
Mid-Career 10th Percentile Salary    231 non-null float64
Mid-Career 25th Percentile Salary    269 non-null float64
Mid-Career 75th Percentile Salary    269 non-null float64
Mid-Career 90th Percentile Salary    231 non-null float64

Answer 1

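Suppose you have a CSV that looks like this. Note: I don't actually know what your CSV looks like, so be sure to adjust the read_csv parameters accordingly, most importantly the sep parameter.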

h1|h2
a|$1,000.99
b|$500,000.00

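Use the converters parameter of pd.read_csv: pass a dictionary whose keys are the names of the columns to convert and whose values are the functions that perform the conversion.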

import pandas as pd

pd.read_csv(
    'salaries-by-college-type.csv', sep='|',
    converters=dict(h2=lambda x: float(x.strip('$').replace(',', '')))
)

  h1         h2
0  a    1000.99
1  b  500000.00
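To try the converters approach without the real file, here is a minimal sketch that feeds the sample CSV above through io.StringIO. The helper name dollars_to_float is hypothetical. It also removes spaces (the question writes values as "$ 150,000.50") and returns NaN for an empty string, on the assumption that the real file may have blank cells where the column listing shows fewer than 269 non-null values.

```python
import io

import pandas as pd

# The sample CSV from above; io.StringIO stands in for the real file.
csv_text = "h1|h2\na|$1,000.99\nb|$500,000.00\n"


def dollars_to_float(text):
    """Hypothetical helper: drop '$', thousands commas and spaces, then parse."""
    cleaned = text.replace('$', '').replace(',', '').strip()
    # Guard for blank cells: float('') would raise ValueError.
    return float(cleaned) if cleaned else float('nan')


df = pd.read_csv(io.StringIO(csv_text), sep='|', converters={'h2': dollars_to_float})
print(df)
```

Using a named function instead of a lambda keeps the cleaning rule in one place, so the same function can be reused as the value for every salary column in the converters dictionary.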

Or, assuming you have already imported the dataframe:

df = pd.read_csv(
    'salaries-by-college-type.csv', sep='|'
)

then use pd.Series.str.replace:

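# regex=True is needed on pandas >= 2.0, where str.replace no longer treats the pattern as a regex by default
df.h2 = df.h2.str.replace(r'[^\d.]', '', regex=True).astype(float)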

df

  h1         h2
0  a    1000.99
1  b  500000.00
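The [^\d.] pattern turns any non-numeric placeholder into an empty string, and astype(float) then raises ValueError on it. A hedged alternative, assuming the real columns might contain such placeholders (the 'N/A' below is invented for illustration), is to strip only the dollar sign, commas and spaces and let pd.to_numeric(errors='coerce') turn anything still unparseable into NaN, which an imputer can then fill.

```python
import pandas as pd

# Hypothetical values: 'N/A' is an assumed placeholder, not taken from the real dataset.
s = pd.Series(['$1,000.99', '$ 500,000.00', 'N/A', None])

# Remove only '$', ',' and whitespace, leaving digits, the decimal point and any minus sign.
stripped = s.str.replace(r'[$,\s]', '', regex=True)

# errors='coerce' turns entries that still aren't numbers ('N/A', None) into NaN.
numeric = pd.to_numeric(stripped, errors='coerce')
print(numeric)
```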

or pd.DataFrame.replace:

df.replace(dict(h2=r'[^\d.]'), '', regex=True).astype(dict(h2=float))

  h1         h2
0  a    1000.99
1  b  500000.00
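The question asks for a generic way to clean every column except School Type. A minimal sketch, assuming all the other columns hold dollar strings: build the column list by dropping School Type from df.columns, then apply the same regex and pd.to_numeric to all of those columns at once. The column names come from the question; the row values are made up.

```python
import pandas as pd

# Hypothetical rows; column names are from the question, the values are invented.
df = pd.DataFrame({
    'School Type': ['Engineering', 'State'],
    'Starting Median Salary': ['$60,000.00', '$45,000.00'],
    'Mid-Career 10th Percentile Salary': ['$70,000.00', None],
})

# Every column except the text column 'School Type' is assumed to hold dollar strings.
money_cols = df.columns.drop('School Type')

# DataFrame.replace with regex=True strips '$', ',' and spaces cell by cell;
# apply() then runs pd.to_numeric on each column, coercing leftovers to NaN.
df[money_cols] = (
    df[money_cols]
    .replace(r'[$,\s]', '', regex=True)
    .apply(pd.to_numeric, errors='coerce')
)
print(df.dtypes)
```

After this the salary columns are float64, with NaN for missing entries. scikit-learn's SimpleImputer treats np.nan as missing by default, so it should no longer trip over the dollar strings.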

How to get rid of the $ sign in column values in Python
