I have text data in my dataset, as shown below:
Record Note 1
1 Amount: $43,385.23
Mode: Air
LSP: Panalpina
2 Amount: $1,149.32
Mode: Ocean
LSP: BDP
3 Amount: $1,149.32
LSP: BDP
Mode: Road
4 Amount: U$ 3,234.01
Mode: Air
5 No details
I need to extract each detail from the text and write it to a new column, as shown below. How can I do this in Python?
Expected output
Record Amount Mode LSP
1 $43,385.23 Air Panalpina
2 $1,149.32 Ocean BDP
3 $1,149.32 Road BDP
4 $3,234.01 Air
5
Is this possible? How should I go about it?
Answer 0: (score: 0)
Write a custom function, then apply it to each row with df.apply(..., axis=1):
def parse_rec(x):
    # x is one row of the DataFrame; 'Note' holds the multi-line text
    note = x['Note']
    details = note.split('\n')
    # Default to None so rows without a given field still get the column
    x['Amount'] = None
    x['Mode'] = None
    x['LSP'] = None
    for detail in details:
        # Split on the first colon only, so values containing ':' stay intact
        if 'Amount' in detail:
            x['Amount'] = detail.split(':', 1)[1].strip()
        if 'Mode' in detail:
            x['Mode'] = detail.split(':', 1)[1].strip()
        if 'LSP' in detail:
            x['LSP'] = detail.split(':', 1)[1].strip()
    return x

df = df.apply(parse_rec, axis=1)
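As a variation on the function above, the parsing step can be pulled out into a standalone helper that takes a note string and returns a dict. This is only a sketch: the names `FIELDS` and `parse_note` are made up for illustration, and it assumes each detail sits on its own line in `key: value` form. Matching the key exactly (rather than with `in`) avoids a false hit when a field name happens to appear inside another field's value.

```python
FIELDS = ("Amount", "Mode", "LSP")  # hypothetical name; the fields from the question


def parse_note(note):
    """Return a dict with one entry per field; missing fields stay None."""
    result = dict.fromkeys(FIELDS)
    for line in note.splitlines():
        # partition splits on the first ':' only; sep is '' when there is no colon
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in FIELDS:
            result[key] = value.strip()
    return result


print(parse_note("Amount: $1,149.32\nLSP: BDP\nMode: Road"))
# {'Amount': '$1,149.32', 'Mode': 'Road', 'LSP': 'BDP'}
```

With a DataFrame, one way to use it would be `pd.DataFrame(df["Note"].apply(parse_note).tolist(), index=df.index)`, which expands the dicts into `Amount`, `Mode` and `LSP` columns that can then be joined back onto `df`.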
Answer 1: (score: 0)
import re

# Accumulators filled by extract_info, one entry per row
Amount = []
Mode = []
LSP = []

def extract_info(txt):
    # Capture everything after "<field>:" up to the end of that line
    Amount_lst = re.findall(r"amounts?\s*:\s*(.*)", txt, re.I)
    Mode_lst = re.findall(r"modes?\s*:\s*(.*)", txt, re.I)
    LSP_lst = re.findall(r"LSP\s*:\s*(.*)", txt, re.I)
    # Keep the first match, or a placeholder when the field is missing
    Amount.append(Amount_lst[0].strip() if Amount_lst else "No details")
    Mode.append(Mode_lst[0].strip() if Mode_lst else "No details")
    LSP.append(LSP_lst[0].strip() if LSP_lst else "No details")

df["Note"].apply(extract_info)
# Assign the accumulated lists, not the function-local *_lst variables
df["Amount"] = Amount
df["Mode"] = Mode
df["LSP"] = LSP
df = df[["Record", "Amount", "Mode", "LSP"]]
Using regular expressions, as in the code above, we can extract each piece of information and store it in its own column.
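The same regex idea can probably be written without the global lists by using pandas' `Series.str.extract`. The following is a minimal sketch, not a definitive solution: the sample DataFrame is reconstructed from the question on the assumption that the details in each note are newline-separated, and the `U$ ` to `$` clean-up is inferred from record 4 of the expected output.

```python
import pandas as pd

# Sample data reconstructed from the question (newline-separated notes are an assumption)
df = pd.DataFrame({
    "Record": [1, 2, 3, 4, 5],
    "Note": [
        "Amount: $43,385.23\nMode: Air\nLSP: Panalpina",
        "Amount: $1,149.32\nMode: Ocean\nLSP: BDP",
        "Amount: $1,149.32\nLSP: BDP\nMode: Road",
        "Amount: U$ 3,234.01\nMode: Air",
        "No details",
    ],
})

for field in ["Amount", "Mode", "LSP"]:
    # (?im): ignore case, and let ^ and $ match at the start/end of each line
    pattern = rf"(?im)^[ \t]*{field}[ \t]*:[ \t]*(.*?)[ \t]*$"
    # One capture group with expand=False yields a Series; no match gives a missing value
    df[field] = df["Note"].str.extract(pattern, expand=False)

# Normalise the "U$ 3,234.01" variant to "$3,234.01", as in the expected output
df["Amount"] = df["Amount"].str.replace(r"^U\$\s*", "$", regex=True)

df = df[["Record", "Amount", "Mode", "LSP"]]
print(df)
```

Fields that are absent from a note come back as missing values rather than the "No details" placeholder; `df.fillna("")` would turn them into the blank cells shown in the expected output.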