Extract strings by pattern in Python and write them to pandas DataFrame columns

Time: 2018-11-05 05:02:04

Tags: regex python-3.x pandas pattern-matching

My dataset contains text data like the following:

 Record    Note 1
  1       Amount: $43,385.23
          Mode: Air 
          LSP: Panalpina           
  2      Amount: $1,149.32
         Mode: Ocean  
         LSP: BDP
  3     Amount: $1,149.32
         LSP: BDP
         Mode: Road
  4     Amount: U$ 3,234.01
        Mode: Air   
  5     No details

I need to extract each detail from the text and write it to a new column, as shown below. How can I do this in Python?

Expected output

Record   Amount         Mode   LSP
1         $43,385.23    Air    Panalpina 
2         $1,149.32     Ocean  BDP
3         $1,149.32     Road   BDP
4         $3,234.01     Air       
5

Is this possible? How should I go about it?

2 answers:

Answer 0: (score: 0)

Write a custom function, then apply it row by row with DataFrame.apply():

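def parse_rec(x):
    # x is one row of the DataFrame; the note is assumed to be a
    # newline-separated string stored in the 'Note' column.
    note = x['Note']
    details = note.split('\n')
    x['Amount'] = None
    x['Mode'] = None
    x['LSP'] = None
    # A single-line note such as "No details" has no fields to extract.
    if len(details) > 1:
        for detail in details:
            # Split on the first colon only, so a value containing ':' stays whole.
            if 'Amount' in detail:
                x['Amount'] = detail.split(':', 1)[1].strip()
            if 'Mode' in detail:
                x['Mode'] = detail.split(':', 1)[1].strip()
            if 'LSP' in detail:
                x['LSP'] = detail.split(':', 1)[1].strip()
    return x

# axis=1 passes each row to parse_rec.
df = df.apply(parse_rec, axis=1)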
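
A possible variant of the same idea, offered only as a sketch: parse each note into a dict and build the new columns in one step, which avoids mutating every row inside apply. The names parse_note and FIELDS are hypothetical, the three-row sample is adapted from the question, and the notes are assumed to be newline-separated strings in a column called Note.

```python
import pandas as pd

# Hypothetical helper names; the field labels come from the question's data.
FIELDS = ("Amount", "Mode", "LSP")

def parse_note(note):
    """Turn one multi-line note into a dict with one entry per known field."""
    parsed = dict.fromkeys(FIELDS)  # every field starts as None
    for line in str(note).splitlines():
        key, sep, value = line.partition(":")
        # Keep only "Label: value" lines whose label is a known field.
        if sep and key.strip() in FIELDS:
            parsed[key.strip()] = value.strip()
    return parsed

# Small sample adapted from the question (assumed layout).
df = pd.DataFrame({
    "Record": [1, 4, 5],
    "Note": [
        "Amount: $43,385.23\nMode: Air \nLSP: Panalpina",
        "Amount: U$ 3,234.01\nMode: Air",
        "No details",
    ],
})

# One dict per row becomes one DataFrame row; join it back on the index.
parsed = pd.DataFrame(df["Note"].map(parse_note).tolist(), index=df.index)
out = df[["Record"]].join(parsed)
```

Fields that are missing from a note stay empty (None) instead of raising, so a record like "No details" simply yields a row with no values.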

Answer 1: (score: 0)

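import re

# One list per output column, filled in row order.
Amount = []
Mode = []
LSP = []

def extract_info(txt):
    # "." does not match a newline, so each capture stops at the end of its line.
    Amount_lst = re.findall(r"amounts?\s*:\s*(.*)", txt, re.I)
    Mode_lst = re.findall(r"modes?\s*:\s*(.*)", txt, re.I)
    LSP_lst = re.findall(r"LSP\s*:\s*(.*)", txt, re.I)

    # Keep the first match, or a placeholder when the field is missing.
    Amount.append(Amount_lst[0].strip() if Amount_lst else "No details")
    Mode.append(Mode_lst[0].strip() if Mode_lst else "No details")
    LSP.append(LSP_lst[0].strip() if LSP_lst else "No details")


# Called only for its side effect of filling the three lists.
df["Note"].apply(extract_info)

# Assign the module-level lists; the *_lst names exist only inside extract_info.
df["Amount"] = Amount
df["Mode"] = Mode
df["LSP"] = LSP

df = df[["Record", "Amount", "Mode", "LSP"]]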

Using regular expressions, as in the code above, we can extract each piece of information and store it in its own column.
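
The same regex idea can probably be written without the global lists by using pandas' own Series.str.extract. The sketch below is not taken from either answer. It assumes the notes are newline-separated strings in a column named Note, and it assumes every amount is in dollars, so a prefix such as "U$ " can be normalised to "$" to match the expected output in the question. The patterns dict is a hypothetical name.

```python
import re

import pandas as pd

# Sample data rebuilt from the question (assumed layout).
df = pd.DataFrame({
    "Record": [1, 2, 3, 4, 5],
    "Note": [
        "Amount: $43,385.23\nMode: Air\nLSP: Panalpina",
        "Amount: $1,149.32\nMode: Ocean\nLSP: BDP",
        "Amount: $1,149.32\nLSP: BDP\nMode: Road",
        "Amount: U$ 3,234.01\nMode: Air",
        "No details",
    ],
})

# One capture group per field; "." stops at the end of the line.
patterns = {
    "Amount": r"amount\s*:\s*(.*)",
    "Mode": r"mode\s*:\s*(.*)",
    "LSP": r"lsp\s*:\s*(.*)",
}

for col, pat in patterns.items():
    # expand=False returns a Series; rows without a match become NaN.
    extracted = df["Note"].str.extract(pat, flags=re.IGNORECASE, expand=False)
    df[col] = extracted.str.strip()

# Assumption: all amounts are dollars, so any non-digit prefix becomes "$".
df["Amount"] = df["Amount"].str.replace(r"^\D+", "$", regex=True)

result = df[["Record", "Amount", "Mode", "LSP"]]
```

Here a missing field ends up as NaN rather than the string "No details", which is closer to the blank cells in the expected output. Call fillna on the result if a placeholder is preferred.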