Question

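I'm importing a dataset with Python's Pandas, and unfortunately it needs some cleaning. After importing, I need to remove all quotes and spaces from two columns (alpha2 and alpha3). This is how I'm doing it right now: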

import pandas as pd

# Add alpha2 country codes to custom dataset to normalize data
country_codes = pd.read_csv('datasets/country_codes.csv').rename(columns={'Alpha-2 code': 'alpha2', 'Alpha-3 code': 'alpha3'})
# Remove quotes and spaces from dataset
country_codes['alpha2'] = country_codes['alpha2'].str.replace('"', '')
country_codes['alpha2'] = country_codes['alpha2'].str.replace(' ', '')
country_codes['alpha3'] = country_codes['alpha3'].str.replace('"', '')
country_codes['alpha3'] = country_codes['alpha3'].str.replace(' ', '')
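For comparison, here is a minimal sketch of the same per-column approach in condensed form: a single regex character class, ["\s], removes both double quotes and whitespace in one Series.str.replace call per column, and a loop covers both columns. The sample DataFrame is hypothetical and stands in for datasets/country_codes.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for datasets/country_codes.csv
country_codes = pd.DataFrame({
    'alpha2': ['"AF"', ' "AL"', '"DZ" '],
    'alpha3': ['"AFG"', ' "ALB"', '"DZA" '],
})

# One regex per column: drop every double quote and every whitespace character
for col in ['alpha2', 'alpha3']:
    country_codes[col] = country_codes[col].str.replace(r'["\s]', '', regex=True)

print(country_codes)
```

Passing regex=True explicitly matters, because recent pandas versions treat the pattern as a literal string by default.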

This looks a bit ugly to me, since I need 5 lines of code for a few simple operations. Can this be done more efficiently with less code?

Answer 1

You can use df.replace with a regex, like this:

country_codes.replace({'alpha2': r'"|\s', 'alpha3': r'"|\s'}, '',
                      regex=True,
                      inplace=True)

The full code looks like this:

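import pandas as pd

country_codes = pd.read_csv('datasets/country_codes.csv').rename(columns={'Alpha-2 code': 'alpha2', 'Alpha-3 code': 'alpha3'})
# Strip every double quote and whitespace character, in alpha2 and alpha3 only
country_codes.replace({'alpha2': r'"|\s', 'alpha3': r'"|\s'}, '',
                      regex=True,
                      inplace=True)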
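A minimal runnable sketch of the regex replacement follows. It uses a small hypothetical in-memory DataFrame in place of datasets/country_codes.csv. The {column: pattern} dict passed to DataFrame.replace restricts the pattern to alpha2 and alpha3, so other columns (here a hypothetical Country column containing a space) stay as they are. Because replace is called on country_codes itself, inplace=True updates that frame directly.

```python
import pandas as pd

# Hypothetical rows standing in for datasets/country_codes.csv
raw = pd.DataFrame({
    'Country': ['Afghanistan', 'Aland Islands'],
    'Alpha-2 code': [' "AF"', ' "AX"'],
    'Alpha-3 code': [' "AFG"', ' "ALA"'],
})
country_codes = raw.rename(columns={'Alpha-2 code': 'alpha2', 'Alpha-3 code': 'alpha3'})

# The dict keys limit the regex to these two columns; every match is replaced by ''
pattern = r'"|\s'
country_codes.replace({'alpha2': pattern, 'alpha3': pattern}, '',
                      regex=True, inplace=True)

print(country_codes)
```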

However, as @Jeff pointed out in the comments below, it's better not to use inplace=True. Instead, you can assign the result back:

country_codes[['alpha2', 'alpha3']] = country_codes[['alpha2', 'alpha3']].replace(r'"|\s', '',
                                                                                  regex=True)
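To see why assigning back matters, the sketch below contrasts an in-place replace on a column selection with the assignment form, using hypothetical sample data. A list-of-columns selection such as country_codes[cols] returns a new DataFrame, so changing it in place modifies only that new object and leaves country_codes untouched.

```python
import pandas as pd

# Hypothetical sample data standing in for the imported CSV
country_codes = pd.DataFrame({
    'alpha2': [' "AF"', ' "AL"'],
    'alpha3': [' "AFG"', ' "ALB"'],
})
cols = ['alpha2', 'alpha3']

# Selecting a list of columns builds a separate DataFrame,
# so an in-place replace on it never reaches country_codes
subset = country_codes[cols]
subset.replace(r'"|\s', '', regex=True, inplace=True)
print(subset['alpha2'].tolist())         # ['AF', 'AL']
print(country_codes['alpha2'].tolist())  # [' "AF"', ' "AL"']  (unchanged)

# Assigning the result back is what actually updates country_codes
country_codes[cols] = country_codes[cols].replace(r'"|\s', '', regex=True)
print(country_codes['alpha2'].tolist())  # ['AF', 'AL']
```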

For more details, see the DataFrame.replace documentation.
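An alternative sketch cleans the two columns while the file is parsed, using read_csv's converters argument, which applies a function to each cell of the listed columns. csv_text, _junk, strip_quotes_and_spaces and the sample rows are hypothetical, and io.StringIO stands in for the real file. The approach assumes the quotes survive parsing as literal characters (as they do when a space precedes them); the function removes them either way.

```python
import io
import re
import pandas as pd

# Hypothetical CSV text standing in for datasets/country_codes.csv
csv_text = (
    'Country,Alpha-2 code,Alpha-3 code\n'
    'Afghanistan, "AF", "AFG"\n'
    'Albania, "AL", "ALB"\n'
)

# Hypothetical helper: matches any double quote or whitespace character
_junk = re.compile(r'["\s]')

def strip_quotes_and_spaces(value: str) -> str:
    """Remove every double quote and whitespace character from one cell."""
    return _junk.sub('', value)

# converters runs the helper on each cell of the named columns during parsing
country_codes = pd.read_csv(
    io.StringIO(csv_text),
    converters={'Alpha-2 code': strip_quotes_and_spaces,
                'Alpha-3 code': strip_quotes_and_spaces},
).rename(columns={'Alpha-2 code': 'alpha2', 'Alpha-3 code': 'alpha3'})

print(country_codes)
```

This avoids a separate cleanup pass, at the cost of calling a Python function once per cell, which can be slower than a vectorized replace on large files.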

Efficiently cleaning data when importing a CSV file with pandas
