Question

What I need to do is something like this:

df[col].str.split(my_regexp, re.IGNORECASE, expand=True)

However, the pandas Series.str.split method does not accept regular-expression flags.

Since I need the result expanded into columns, I can't do something like

df.apply(lambda x: re.split(my_regexp, x[col], flags=re.IGNORECASE), axis=1, result_type='expand')

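because the lists have different lengths.

What I need is either a way to make re.split return lists that all have the same length, or a way to pass re.IGNORECASE to Series.str.split. Or is there a better approach?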

Thanks, everyone!

Edit: here is some sample data to explain this better

series = pd.Series([
    'First paRt foo second part FOO third part',
    'test1 FoO test2', 
    'hi1 bar HI2',
    'This is a Test',
    'first baR second BAr third',
    'final'
])

split case-insensitively on the regular expression r'foo|bar', this should

return


    0               1               2
0   First paRt      second part     third part
1   test1           test2           None
2   hi1             HI2             None
3   This is a Test  None            None
4   first           second          third
5   final           None            None

Answer 1

Method 1: if you need to preserve lowercase/uppercase:

series.apply(lambda x: ', '.join(re.split(r'foo|bar', x, flags=re.IGNORECASE)))\
      .str.split(', ', expand=True)

Output

                0              1            2
0     First paRt    second part    third part
1          test1           test2         None
2            hi1             HI2         None
3  This is a Test           None         None
4          first         second         third
5           final           None         None
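
Method 1 joins the parts with ', ' and splits again, so it depends on ', ' never appearing in the data. A possible variation that avoids this, offered only as a sketch and not part of the original answer: build the DataFrame directly from the lists that re.split returns. The names pattern and parts are illustrative, and the final strip step is optional.

```python
import re

import pandas as pd

series = pd.Series([
    'First paRt foo second part FOO third part',
    'test1 FoO test2',
    'hi1 bar HI2',
    'This is a Test',
    'first baR second BAr third',
    'final',
])

# Compile once so the IGNORECASE flag applies to every split.
pattern = re.compile(r'foo|bar', flags=re.IGNORECASE)

# Each row becomes a list of parts; the DataFrame constructor pads
# shorter rows with missing values, so no join/re-split step is needed.
parts = pd.DataFrame([pattern.split(s) for s in series], index=series.index)

# Optional: trim the whitespace left around each delimiter.
parts = parts.apply(lambda col: col.str.strip())
print(parts)
```

The original casing is kept, and rows with fewer parts are padded with missing values in the trailing columns.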

Method 2: if lowercase/uppercase is not a concern:

As mentioned in the comments, convert your series to lowercase with str.lower() and then use str.split:

series.str.lower().str.split(r'foo|bar', expand=True)

Output

                0              1            2
0     first part    second part    third part
1          test1           test2         None
2            hi1             hi2         None
3  this is a test           None         None
4          first         second         third
5           final           None         None

Method 3: remove the unnecessary whitespace:

series.str.lower().str.split(r'foo|bar', expand=True).apply(lambda x: x.str.strip())

Output

                0            1           2
0      first part  second part  third part
1           test1        test2        None
2             hi1          hi2        None
3  this is a test         None        None
4           first       second       third
5           final         None        None
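
Methods 2 and 3 lowercase the data before splitting. If the original casing matters, one option to consider (a sketch, not part of the original answer) is to put the inline flag (?i) at the start of the pattern, which Python's re module treats as equivalent to re.IGNORECASE. This assumes str.split interprets the multi-character pattern as a regular expression, which is the default behaviour in the pandas versions I am aware of. The name result is illustrative.

```python
import pandas as pd

series = pd.Series([
    'First paRt foo second part FOO third part',
    'test1 FoO test2',
    'hi1 bar HI2',
    'This is a Test',
    'first baR second BAr third',
    'final',
])

# (?i) at the start of the pattern turns on case-insensitive matching
# for the whole expression, so no separate flags argument is needed.
result = series.str.split(r'(?i)foo|bar', expand=True)

# Trim the whitespace left around each delimiter, column by column.
result = result.apply(lambda col: col.str.strip())
print(result)
```

This keeps everything inside the .str accessor while leaving the casing of the data untouched.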
