Question

I have the following DataFrame as an example:

import pandas as pd

test = pd.DataFrame({'type':['fruit-of the-loom (sometimes-never)', 'yes', 'ok (not-possible) I will try', 'vegetable', 'poultry', 'poultry'],
                 'item':['apple', 'orange', 'spinach', 'potato', 'chicken', 'turkey']})

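I have found plenty of posts from people who want to remove parentheses from strings or something similar, but in my case I would like to keep the string exactly as it is, except that I want to remove the hyphens that sit inside the parentheses.

Does anyone have a suggestion on how I could achieve this?

str.split() would take care of the hyphen if it were leading, and str.rsplit() if it were trailing, but I can't think of a way to handle this case.

In this case, the ideal result for the values in this hypothetical column would be: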

'fruit-of the-loom (sometimes never)',
'yes', 
'ok (not possible) I will try', 
'vegetable', 
'poultry', 
'poultry'

Answer 1

One way could be to use str.replace with a pattern that finds whatever sits between parentheses; the repl parameter can then be a lambda that calls replace on the match object. Pass regex=True explicitly, because a callable repl only works in regex mode:

print(test['type'].str.replace(pat=r'\((.*?)\)', repl=lambda x: x.group(0).replace('-', ' '), regex=True))
0    fruit-of the-loom (sometimes never)
1                                    yes
2           ok (not possible) I will try
3                              vegetable
4                                poultry
5                                poultry
Name: type, dtype: object
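To see what the callable does to a single value, the same idea can be sketched with the standard-library re module, without pandas. This is only an illustration: fix_hyphens and PAREN_GROUP are hypothetical names that do not appear in the answer.

```python
import re

# Matches one parenthesised group, non-greedily, e.g. "(sometimes-never)".
PAREN_GROUP = re.compile(r'\(.*?\)')

def fix_hyphens(value):
    """Replace hyphens with spaces, but only inside parentheses."""
    # The callable receives each match object and returns its replacement text.
    return PAREN_GROUP.sub(lambda m: m.group(0).replace('-', ' '), value)

print(fix_hyphens('fruit-of the-loom (sometimes-never)'))
print(fix_hyphens('ok (not-possible) I will try'))
print(fix_hyphens('vegetable'))
```

Because the substitution runs once per match, every parenthesised group in a value is handled, and hyphens outside the parentheses are left alone.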

Answer 2

test.type = (test.type.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
             .dropna()
             .sum(axis=1)
             .combine_first(test.type))

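Explanation:

Extract regex groups for "beginning until parenthesis and then hyphen", "after hyphen until parenthesis", and any optional additional stuff after that
Concatenate them together again with sum
Rows where the pattern did not match fall back to the values from the original column (combine_first)

This way the hyphen is dropped rather than replaced by a space. If you need the space, you can use apply instead of sum: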

test.type = (test.type.str.extract(r'(.*?\(.*?)-(.*?\))(.*)')
             .dropna()
             .apply(lambda row: f'{row[0]} {row[1]}{row[2]}', axis=1)
             .combine_first(test.type))

Either way, this will not work for more than one set of parentheses.
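The limitation can be seen on a single string. The following is a minimal sketch that applies the same pattern with the standard-library re module instead of pandas; drop_first_hyphen is a hypothetical helper written for this illustration only.

```python
import re

# Same pattern as in the answer: text up to a hyphen inside parentheses,
# the rest of that parenthesised group, then everything that follows.
PATTERN = re.compile(r'(.*?\(.*?)-(.*?\))(.*)')

def drop_first_hyphen(value):
    """Glue the three groups back together, or return the value unchanged."""
    m = PATTERN.search(value)
    if m is None:
        return value
    return ''.join(m.groups())

print(drop_first_hyphen('ok (not-possible) I will try'))  # ok (notpossible) I will try
print(drop_first_hyphen('a (b-c) d (e-f)'))               # a (bc) d (e-f)
```

The pattern consumes the whole string in one match, so only the first hyphen in the first parenthesised group is removed; the second group ends up in the trailing catch-all group and keeps its hyphen.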

Answer 3

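I should have taken a bit more time to think about this.

This is the solution I came up with:

Count the parentheses, and replace only while the count shows we are inside them.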

def inside_parens(string):
    """Replace hyphens with spaces, but only inside parentheses."""
    parens_count = 0  # current nesting depth
    return_string = ""
    for a in string:
        if a == "(":
            parens_count += 1
        elif a == ")":
            parens_count -= 1
        if parens_count > 0:
            # inside at least one open parenthesis: swap the hyphen
            return_string += a.replace('-', ' ')
        else:
            return_string += a
    return return_string

Once this is done, apply it to the intended column:

df['col_1'] = df['col_1'].apply(inside_parens)

If you want to generalize the function, you can pass in what to replace, which makes it more versatile.
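One possible generalization is sketched below. The function name replace_inside and its parameters old, new, opener and closer are assumptions made for this example, not part of the answer, and old is assumed to be a single character.

```python
def replace_inside(string, old='-', new=' ', opener='(', closer=')'):
    """Replace the character `old` with `new` only between opener and closer."""
    depth = 0  # how many openers are currently unclosed
    out = []
    for ch in string:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
        # Swap the character only while inside at least one open bracket.
        out.append(new if depth > 0 and ch == old else ch)
    return ''.join(out)

print(replace_inside('ok (not-possible) I will try'))
print(replace_inside('a_b [c_d]', old='_', new='-', opener='[', closer=']'))
```

Since Series.apply forwards extra keyword arguments to the function, a call such as df['col_1'].apply(replace_inside, old='_') should work for a column as well.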
