Split a text column and create new columns from its contents

Posted: 2019-05-16 11:03:24

Tags: python-3.x string pandas dataframe split

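I have a dataframe with a text column. Each text string contains several variables, so I want to split it into multiple columns. For example: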
import pandas as pd

df = pd.read_csv('C:/Users/mydata.csv')
print(df)
  my_msg
1 "Acct:XXXXXX0000 Debit:NGN2,000.00 Details:ATM CASH WITHDRAWAL"
2 "Acct:XXXXXX0000 Credit:NGN135,000.00 Details:BY UE Date:03-05-2019 10:03 Available Bal:NGN135,454.78 Enquiries:123456"
3 "Prepaid Card Alert **** POS  : Dr  NGN 4,052.50 Desc: GTB/*******330/*****939 9000005600 NDate: 03-05-2019 09:36 Bal : NGN 506,265.00 FEEDBACK? Call 123456"
4 "Acct:XXXXXX0001 Debit:NGN300.00 Details:MOBILE BANKING300.00 IRTIME********7061 Date:03-05-2019 00:09 Available Bal:NGN373,358.56 Enquiries:12346"



I'm expecting output in the following form:

Acct    Debit   Credit  Bal     Pos
0000    2000    NA      NA      NA
0000    NA      135000  135454  NA
NA      NA      NA      506265  4052
0001    300     NA      373358  NA

1 Answer:

Answer 0 (score: 1)

Use regular expressions with pandas' Series.str.extract, one pattern per new column.

For example:

import pandas as pd

df = pd.DataFrame({"Col": ["Acct:XXXXXX0000 Debit:NGN2,000.00 Details:ATM CASH WITHDRAWAL",
                           "Acct:XXXXXX0000 Credit:NGN135,000.00 Details:BY UE Date:03-05-2019 10:03 Available Bal:NGN135,454.78 Enquiries:123456",
                           "Prepaid Card Alert **** POS  : Dr  NGN 4,052.50 Desc: GTB/*******330/*****939 9000005600 NDate: 03-05-2019 09:36 Bal : NGN 506,265.00 FEEDBACK? Call 123456",
                           "Acct:XXXXXX0001 Debit:NGN300.00 Details:MOBILE BANKING300.00 IRTIME********7061 Date:03-05-2019 00:09 Available Bal:NGN373,358.56 Enquiries:12346"
                           ]})

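# str.extract returns the first capture group of the first match in each row;
# rows without a match get NaN. expand=False returns a Series rather than a
# one-column DataFrame. No trailing \s+ is required, so a value at the very
# end of a message is still found.
df["Acct"] = df["Col"].str.extract(r"Acct\s*:\s*XXXXXX(\d+)", expand=False)
df["Debit"] = df["Col"].str.extract(r"Debit\s*:\s*NGN\s*([0-9,\.]+)", expand=False)
df["Credit"] = df["Col"].str.extract(r"Credit\s*:\s*NGN\s*([0-9,\.]+)", expand=False)
df["Bal"] = df["Col"].str.extract(r"Bal\s*:\s*NGN\s*([0-9,\.]+)", expand=False)
df["Pos"] = df["Col"].str.extract(r"POS\s*:\s*Dr\s*NGN\s*([0-9,\.]+)", expand=False)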

print(df)

Output:

                                                 Col  Acct     Debit  \
0  Acct:XXXXXX0000 Debit:NGN2,000.00 Details:ATM ...  0000  2,000.00   
1  Acct:XXXXXX0000 Credit:NGN135,000.00 Details:B...  0000       NaN   
2  Prepaid Card Alert **** POS  : Dr  NGN 4,052.5...   NaN       NaN   
3  Acct:XXXXXX0001 Debit:NGN300.00 Details:MOBILE...  0001    300.00   

       Credit         Bal       Pos  
0         NaN         NaN       NaN  
1  135,000.00  135,454.78       NaN  
2         NaN  506,265.00  4,052.50  
3         NaN  373,358.56       NaN
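A possible follow-up, sketched here as an assumption rather than part of the answer above: the extracted values are still strings such as "2,000.00", while the question's expected table shows numbers. The sketch below keeps the same patterns in a dict (the names patterns and out are illustrative), builds the result in one pass, then strips the thousands separators and converts the amount columns with pd.to_numeric. Acct stays a string so the leading zeros in "0000" survive.

```python
import pandas as pd

# The four sample messages from the question.
df = pd.DataFrame({"Col": [
    "Acct:XXXXXX0000 Debit:NGN2,000.00 Details:ATM CASH WITHDRAWAL",
    "Acct:XXXXXX0000 Credit:NGN135,000.00 Details:BY UE Date:03-05-2019 10:03 "
    "Available Bal:NGN135,454.78 Enquiries:123456",
    "Prepaid Card Alert **** POS  : Dr  NGN 4,052.50 Desc: GTB/*******330/*****939 "
    "9000005600 NDate: 03-05-2019 09:36 Bal : NGN 506,265.00 FEEDBACK? Call 123456",
    "Acct:XXXXXX0001 Debit:NGN300.00 Details:MOBILE BANKING300.00 IRTIME********7061 "
    "Date:03-05-2019 00:09 Available Bal:NGN373,358.56 Enquiries:12346",
]})

# Hypothetical mapping: output column name -> regex with exactly one capture group.
patterns = {
    "Acct": r"Acct\s*:\s*XXXXXX(\d+)",
    "Debit": r"Debit\s*:\s*NGN\s*([\d,.]+)",
    "Credit": r"Credit\s*:\s*NGN\s*([\d,.]+)",
    "Bal": r"Bal\s*:\s*NGN\s*([\d,.]+)",
    "Pos": r"POS\s*:\s*Dr\s*NGN\s*([\d,.]+)",
}

# One extracted Series per pattern; rows without a match become NaN.
out = pd.DataFrame({name: df["Col"].str.extract(pat, expand=False)
                    for name, pat in patterns.items()})

# Amount columns: remove the thousands separators, then parse as floats.
# Missing values stay NaN; Acct is left as a string to keep leading zeros.
for col in ["Debit", "Credit", "Bal", "Pos"]:
    out[col] = pd.to_numeric(out[col].str.replace(",", "", regex=False))

print(out)
```

The amounts come out as floats (for example 135454.78). The expected table in the question shows them without decimals, which you would only get by explicitly truncating or rounding, so decide which of the two you actually need before doing either.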