Question

I am cleaning up column names with a chain of string replace calls.

df.columns=df.columns.str.replace("#$%./- ","").str.replace(' ', '_').str.replace('.', '_').str.replace('(','').str.replace(')','').str.replace('.','').str.lower()

It works, but it certainly doesn't look Pythonic. Any suggestions? For column names I only need A-Za-z and the underscore _.

Update

I tried using a regular expression in the first replace call, but I still have to chain the remaining replacements like this:

terms.columns=terms.columns.str.replace(r"^[^a-zA-Z1-9]*", '').str.replace(' ', '_').str.replace('(','').str.replace(')','').str.replace('.', '').str.replace(',', '')

Update with test data:

Original string (tab-separated):

[Sr.No. Course  Terms   Besic of Education  Degree Course   Course Approving Authority (i.e Medical Council, etc.)  Full form of Course 1 year Duration 2nd year    3rd year Duration   4 th year Duration]

Renaming the columns:

terms.columns=terms.columns.str.replace(r"^[^a-zA-Z1-9]*", '').str.replace(' ', '_').str.replace('(','').str.replace(')','').str.replace('.', '').str.replace(',', '').str.lower()

Output:

['srno', 'course', 'terms', 'besic_of_education', 'degree_course',
       'course_approving_authority_ie_medical_council_etc',
       'full_form_of_course', '1_year_duration', '2nd_year_',
       '3rd_year_duration', '4_th_year_duration']

The output above is correct. Question: is there a way to get the same result other than the one I used?

Answer 1

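You can get by with fewer .replace operations: first replace the non-word characters with an empty string, then replace each run of whitespace with an underscore. Pass regex=True explicitly, because pandas 2.0 and later treat the pattern as a literal string by default.

df.columns = df.columns.str.replace(r"[^\w\s]+", "", regex=True).str.replace(r"\s+", "_", regex=True).str.lower()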
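As a rough sketch, here is the two-step regex approach applied to the column names from the question's sample data. The helper name clean_columns is hypothetical, and the leading str.strip() is an added assumption so that stray surrounding whitespace does not turn into a leading or trailing underscore. Note that \w also matches digits, the underscore, and non-ASCII letters, so this is slightly more permissive than "A-Za-z and underscore only".

```python
import pandas as pd

# Column names taken from the question's sample data.
cols = [
    "Sr.No.", "Course", "Terms", "Besic of Education", "Degree Course",
    "Course Approving Authority (i.e Medical Council, etc.)",
    "Full form of Course", "1 year Duration", "2nd year",
    "3rd year Duration", "4 th year Duration",
]
df = pd.DataFrame(columns=cols)


def clean_columns(columns: pd.Index) -> pd.Index:
    """Hypothetical helper: normalize column names with two regex passes."""
    return (
        columns.str.strip()                              # assumption: trim outer whitespace first
        .str.replace(r"[^\w\s]+", "", regex=True)        # drop punctuation such as . , ( )
        .str.replace(r"\s+", "_", regex=True)            # collapse whitespace runs into one underscore
        .str.lower()
    )


df.columns = clean_columns(df.columns)
print(list(df.columns))
```

If digits or non-ASCII letters should also be removed, the first pattern could be swapped for something stricter such as r"[^A-Za-z_\s]+", at the cost of changing names like "1 year Duration".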

I hope this helps.
