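Re: Removing parenthesized strings and their whitespace

Time: 2019-08-15 09:30:23

Tags: regex pandas apply data-cleaning

What is the best re approach for removing parentheses and their contents, along with any trailing whitespace, from a string? Note that not every string has the same format.

Script: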

import pandas as pd
import re

df = pd.DataFrame({'name':
          ['University of Southampton (UK)', 
          'The College of William and Mary', 
          'University of Reading (UK)', 
          'Queensland University (Australia)']})

def cleaning(text):
    cleaned = re.findall(re.compile('^([^,]+).+'), text)
    cleaned = re.findall(re.compile('\(.*\)'), str(cleaned)) # Why do I have to str() here btw?
    return cleaned

df['name'].apply(lambda x: cleaning(x))

Returns:

0    []
1    []
2    []
3    []

Desired output (no trailing whitespace):

0    University of Southampton
1    The College of William and Mary
2    University of Reading
3    Queensland University

Thanks for your help!

1 answer:

Answer 0: (score: 1)

This only works for this particular case, but you could do:

df.name.str.split(r'\(', expand=True)[0].str.strip()
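
If the parenthesized part can appear anywhere in the string rather than only at the end, a substitution may be more robust than a split. The sketch below is one possible approach, not the asker's or the answerer's method: strip_parens and PAREN are hypothetical names introduced for illustration, and it assumes the parentheses are not nested. It also touches on the two puzzles in the question: re.findall returns a list, which is why the second call needed str(), and the trailing .+ in the first pattern pushes the closing ")" out of the captured group, so the second pattern never finds a match.

```python
import re

# Optional whitespace, an opening "(", anything that is not ")", then ")".
# Assumption: parentheses are not nested.
PAREN = re.compile(r'\s*\([^)]*\)')

def strip_parens(text):
    """Remove every parenthesized chunk, then trim surrounding whitespace."""
    return PAREN.sub('', text).strip()

names = [
    'University of Southampton (UK)',
    'The College of William and Mary',
    'University of Reading (UK)',
    'Queensland University (Australia)',
]

for name in names:
    print(strip_parens(name))
```

With the DataFrame from the question, this could be applied as df['name'].apply(strip_parens), or vectorised as df['name'].str.replace(r'\s*\([^)]*\)', '', regex=True).str.strip().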