Splitting a DataFrame column using a regular expression

Date: 2018-12-12 06:50:04

Tags: python regex pandas

I am trying to split a column on a specific delimiter, for example '|'.

My data looks like this. I have a single column named "ID" that contains the strings I want to split on the delimiter "|":

    ID accountsummary            | Name: Report Suite Totals
    ID activity                  | Name: Activity

I tried two different approaches:

  1. dataframe_elements_int[['ID', 'Name']] = \
         dataframe_elements_int['ID'].str.rsplit('|', expand=True, n=1)

which gives me the following error:

    ValueError: Columns must be same length as key

  2. dataframe_final[['Id','Name']] = \
         dataframe_elements_int['ID'].str.extract('(\w*)\|(\w*)', expand=True)

which also fails with an error.

2 Answers:

Answer 0 (score: 1)

You can try the following:

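df = dataframe_elements_int
# Split the column on the first '|' into two new columns
df[['new_ID', 'Name']] = df['ID'].str.split('|', n=1, expand=True)
# Keep only the text after the "Name:" and "ID" prefixes, trimming whitespace
df['Name'] = df['Name'].str.extract(r'(?<=Name:)(.*)$', expand=False).str.strip()
df['new_ID'] = df['new_ID'].str.extract(r'(?<=ID)(.*)$', expand=False).str.strip()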
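
One common cause of "Columns must be same length as key" is that the split returns a different number of columns than the number of names on the left-hand side, for example when no row contains the delimiter. The sketch below is only an illustration of that case, not a diagnosis of the asker's data. The row "ID orphan" and the names `parts`, `no_delim`, and `bad_parts` are hypothetical.

```python
import pandas as pd

# Two rows copied from the question, plus one hypothetical row without '|'
df = pd.DataFrame({"ID": [
    "ID accountsummary            | Name: Report Suite Totals",
    "ID activity                  | Name: Activity",
    "ID orphan",  # hypothetical: no delimiter present
]})

# Split on the first '|' only; rows without it get a missing second part
parts = df["ID"].str.split("|", n=1, expand=True)

# Assign only when the split really produced two columns
if parts.shape[1] == 2:
    df[["new_ID", "Name"]] = parts

# Hypothetical series where no row contains the delimiter at all
no_delim = pd.Series(["ID orphan", "ID other"])
bad_parts = no_delim.str.split("|", n=1, expand=True)
# bad_parts has a single column, so assigning it to two names would raise ValueError
```

Checking `parts.shape[1]` before the assignment makes the mismatch visible instead of failing inside pandas.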

Answer 1 (score: 1)

You can use the following regular expression:

ID\s+(\w+)\s+\|\s+Name:\s+(.*)$
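
An unescaped `|` in a regular expression means alternation, so the literal delimiter has to be written as `\|`. Here is a minimal sketch using the standard `re` module on one sample row from the question. The variable names `row`, `new_id`, and `name` are illustrative only.

```python
import re

# The pipe is escaped so that it matches a literal '|' rather than acting as "or"
pattern = re.compile(r"ID\s+(\w+)\s+\|\s+Name:\s+(.*)$")

row = "ID accountsummary            | Name: Report Suite Totals"
match = pattern.match(row)
if match:
    # Group 1 is the identifier, group 2 is everything after "Name:"
    new_id, name = match.groups()
```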

If you want to use extract, you can do the following:

import pandas as pd

df = pd.DataFrame(data=["ID accountsummary            | Name: Report Suite Totals",
                        "ID activity                  | Name: Activity"], columns=["ID"])
# Extract the identifier that follows "ID"; expand=False returns a Series
df["NewId"] = df["ID"].str.extract(r"ID\s+(?P<IDnew>\w+)", expand=False)
# Extract everything that follows "Name:"
df["Name"] = df["ID"].str.extract(r"Name:\s+(?P<Name>.*)$", expand=False)
# Replace the original column with the extracted identifier
df = df.drop(columns=["ID"]).rename(columns={"NewId": "ID"})
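
Both values can also be pulled out in one `str.extract` call, since named groups become the column names of the result. This is a sketch under the assumption that every row follows the "ID ... | Name: ..." layout shown in the question. The name `result` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"ID": [
    "ID accountsummary            | Name: Report Suite Totals",
    "ID activity                  | Name: Activity",
]})

# One pattern with two named groups; the literal pipe is escaped
pattern = r"ID\s+(?P<ID>\w+)\s*\|\s*Name:\s*(?P<Name>.*)$"

# With several groups, str.extract returns a DataFrame with one column per group
result = df["ID"].str.extract(pattern)
```

Rows that do not match the pattern come back as missing values in both columns, which makes malformed input easy to spot.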