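Python Pandas: parsing records

Time: 2018-08-09 23:54:07

Tags: python regex pandas

I need to parse a column in a dataframe, pull out everything inside the parentheses, and move it to a new column. Ideally the parentheses themselves would be stripped in the new column as well, but I think either result would give me what I need: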

current column                                  new column
/reports/industry(5315)/2018                    (5315)
/reports/limit/sector(139)/2017                 (139)
/reports/sector/region(147,189 and 132)/2018    (147,189 and 132)

Thanks, any direction you can give would be great!

4 answers:

Answer 0: (score: 2)

IIUC, use str.extract:

df.current.str.extract(r'.*\((.*)\).*', expand=True)
Out[785]: 
                 0
0             5315
1              139
2  147,189 and 132
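
For reference, a self-contained sketch of the same approach. The column name `current` and the new column name `new` are assumptions here (the question's table calls the column "current column"). With a single capture group and `expand=False`, `str.extract` returns a Series rather than a one-column DataFrame, which makes the assignment to a new column straightforward.

```python
import pandas as pd

# Sample rows copied from the question; the column name "current" is an assumption.
df = pd.DataFrame({
    "current": [
        "/reports/industry(5315)/2018",
        "/reports/limit/sector(139)/2017",
        "/reports/sector/region(147,189 and 132)/2018",
    ]
})

# One capture group + expand=False gives a Series of the captured text,
# without the surrounding parentheses.
df["new"] = df["current"].str.extract(r"\((.*)\)", expand=False)

# Rows that contain no parentheses would come back as NaN instead of raising.
print(df["new"].tolist())
```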

Answer 1: (score: 1)

You can do this with a regular expression:

import pandas as pd

old_col = ['/reports/industry(5315)/2018', '/reports/limit/sector(139)/2017', '/reports/sector/region(147,189 and 132)/2018']
df = pd.DataFrame(old_col, columns=['current_column'])
# expand=False returns a Series for the single capture group
df['new_column'] = df['current_column'].str.extract(r'\((.*)\)', expand=False)

The output looks like this:

current_column                                       new_column
0   /reports/industry(5315)/2018                        5315
1   /reports/limit/sector(139)/2017                      139
2   /reports/sector/region(147,189 and 132)/2018    147,189 and 132
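
The question also mentions removing the parenthesized part from the original column. One possible sketch of that, reusing the `current_column` and `new_column` names from this answer: it swaps the greedy `.*` for `[^)]*` (an assumption on my part, so that a match stops at the first closing parenthesis) and uses `str.replace` with `regex=True` to strip the parentheses and their contents from the source column.

```python
import pandas as pd

df = pd.DataFrame({
    "current_column": [
        "/reports/industry(5315)/2018",
        "/reports/limit/sector(139)/2017",
        "/reports/sector/region(147,189 and 132)/2018",
    ]
})

# Step 1: copy the text between the parentheses into a new column.
# [^)]* matches any run of characters that are not a closing parenthesis.
df["new_column"] = df["current_column"].str.extract(r"\(([^)]*)\)", expand=False)

# Step 2: delete the parentheses and their contents from the original column.
df["current_column"] = df["current_column"].str.replace(r"\([^)]*\)", "", regex=True)

print(df)
```

Whether the original column should be modified in place or written to a third column depends on how the data is used afterwards; the question does not say.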

Answer 2: (score: 0)

>>> import re
>>> re.sub(r'.*(\(.*\)).*', r'\1', '/reports/industry(5315)/2018')
'(5315)'

Full example

import pandas as pd
import re


old_col = ['/reports/industry(5315)/2018', '/reports/limit/sector(139)/2017', '/reports/sector/region(147,189 and 132)/2018']
df = pd.DataFrame(old_col, columns=['current_column'])


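def grab_dat(x):
    # Replace the whole string with just the parenthesized part (group 1).
    # Strings without parentheses are returned unchanged.
    return re.sub(r'.*(\(.*\)).*', r'\1', x)


df['new_col'] = df['current_column'].apply(grab_dat)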
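
One thing to keep in mind with the `re.sub` approach: when a string has no parentheses, nothing matches and the input comes back unchanged, so the new column would silently contain the full path. A minimal sketch of an alternative using `re.search`, which makes the "no match" case explicit. The helper name `grab_parens` is hypothetical and not part of the answer above.

```python
import re


def grab_parens(text):
    """Return the parenthesized part of text (parentheses included), or None."""
    match = re.search(r"\(.*\)", text)
    return match.group(0) if match else None


with_parens = grab_parens("/reports/industry(5315)/2018")
without_parens = grab_parens("/reports/industry/2018")

# For comparison, re.sub hands back the input untouched when nothing matches.
sub_result = re.sub(r".*(\(.*\)).*", r"\1", "/reports/industry/2018")

print(with_parens, without_parens, sub_result)
```

It can be applied the same way, e.g. `df['current_column'].apply(grab_parens)`, in which case rows without parentheses end up as missing values.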

Answer 3: (score: 0)

Use a regular expression with the pandas str functions.

df['new_column'] = df['col'].str.extract(r'(?P<new_column>(?<=\().*(?=\)))', expand=False)

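The regex says: match any text that is preceded by "(" and followed by ")", and put it in a named capture group called "new_column". Because the parentheses are checked with a lookbehind and a lookahead, they are not part of the match itself.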
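
To see what the lookbehind and lookahead do, the same pattern can be tried with the standard `re` module on one of the question's sample strings. This is only an illustrative sketch; the variable names are my own.

```python
import re

# Same pattern as in the answer: a named group wrapped around
# "(?<=\()" (preceded by an opening parenthesis), ".*" (any text),
# and "(?=\))" (followed by a closing parenthesis).
pattern = re.compile(r"(?P<new_column>(?<=\().*(?=\)))")

match = pattern.search("/reports/sector/region(147,189 and 132)/2018")

# The named group holds only the text between the parentheses.
inside = match.group("new_column")

# The overall match is identical: lookarounds assert, but do not consume,
# the parentheses.
whole = match.group(0)

print(inside)
```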