I'm trying to create a pandas DataFrame by extracting information from notes. I would like to get columns such as:
Note:
phonenumber | status | result | notation
(999) 555-9898 | Partial | Generic VM | VOICE MAIL LEFT
I would then make a second DataFrame, where I would try to pull out the words that appear in single quotes in the Procedure events.
Event Notation
Call Call to (Home) (999) 555-9898 ended. Partial – Generic VM --> - VOICE MAIL LEFT
Call Call to (Work) (999) 555-9898 ended. Partial - Voice Mail, No Message left -->
Call Call to (Work) (999) 555-9898 ended. Positive – Spoke to Receptionist -->
Call Call to (Mobile) (999) 555-9898 ended. Partial – Generic VM --> - Unable to reach customer, voice message left and text sent
Procedure Procedure 'Verify' is checked
Procedure Procedure 'Duplicate Check' is checked
Procedure Procedure 'Check Something' is checked
Procedure Procedure 'Scenario' is checked
Procedure Procedure 'Attempt' is checked
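Answer 0 (score: 2)
To give you an idea, here is something that might get you started (note, however, that this is my first time using regular expressions):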
import re

data = []
with open('notes.txt', 'r') as f:
    next(f)  # skip the "Event Notation" header line
    for line in f:
        data.append(line.strip('\n'))
data
['Call Call to (Home) (999) 555-9898 ended. Partial – Generic VM --> - VOICE MAIL LEFT ',
'Call Call to (Work) (999) 555-9898 ended. Partial - Voice Mail, No Message left -->',
'Call Call to (Work) (999) 555-9898 ended. Positive – Spoke to Receptionist --> ',
'Call Call to (Mobile) (999) 555-9898 ended. Partial – Generic VM --> - Unable to reach customer, voice message left and text sent',
"Procedure Procedure 'Verify' is checked",
"Procedure Procedure 'Duplicate Check' is checked",
"Procedure Procedure 'Check Something' is checked",
"Procedure Procedure 'Scenario' is checked",
"Procedure Procedure 'Attempt' is checked"]
phone = []
status = []
# Compile the patterns once, outside the loop; raw strings keep \d from being read as a string escape.
p_phone = re.compile(r'[(]\d{3}[)] \d{3}-\d{4}')
p_status = re.compile(r'Partial|Positive')
for line in data:
    tmp = line.split(' ')
    if tmp[0] == 'Call':
        phone.append(p_phone.findall(line))
        status.append(p_status.findall(line))
    elif tmp[0] == "Procedure":
        pass  # Procedure lines are not handled yet
print(phone)
print(status)
[['(999) 555-9898'], ['(999) 555-9898'], ['(999) 555-9898'], ['(999) 555-9898']]
[['Partial'], ['Partial'], ['Positive'], ['Partial']]
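Going one step further, the same idea could be extended to fill all four columns and to pull the quoted names out of the Procedure lines. What follows is only a rough sketch: the names `CALL_RE`, `PROC_RE` and `parse_notes` are made up for illustration, and the patterns assume every Call line looks like the samples above (phone number, "ended.", a status of Partial or Positive, a dash, the result, then "-->" and an optional notation).

```python
import re

# Sample lines copied from the notes above (header row already skipped).
notes = [
    "Call Call to (Home) (999) 555-9898 ended. Partial – Generic VM --> - VOICE MAIL LEFT ",
    "Call Call to (Work) (999) 555-9898 ended. Partial - Voice Mail, No Message left -->",
    "Call Call to (Work) (999) 555-9898 ended. Positive – Spoke to Receptionist --> ",
    "Call Call to (Mobile) (999) 555-9898 ended. Partial – Generic VM --> - Unable to reach customer, voice message left and text sent",
    "Procedure Procedure 'Verify' is checked",
    "Procedure Procedure 'Duplicate Check' is checked",
    "Procedure Procedure 'Check Something' is checked",
    "Procedure Procedure 'Scenario' is checked",
    "Procedure Procedure 'Attempt' is checked",
]

# Hypothetical pattern for Call lines; the separator after the status may be
# an en dash or a plain hyphen, as both occur in the sample data.
CALL_RE = re.compile(
    r"(?P<phonenumber>\(\d{3}\) \d{3}-\d{4}) ended\.\s*"
    r"(?P<status>Partial|Positive)\s*[–-]\s*"
    r"(?P<result>.*?)\s*-->\s*(?:-\s*)?"
    r"(?P<notation>.*)"
)

# Hypothetical pattern for Procedure lines: capture the text between single quotes.
PROC_RE = re.compile(r"Procedure '([^']+)' is checked")


def parse_notes(lines):
    """Split note lines into call records (dicts) and procedure names (strings)."""
    calls, procedures = [], []
    for line in lines:
        if line.startswith("Call"):
            m = CALL_RE.search(line)
            if m:
                # Strip stray whitespace left at the end of some lines.
                calls.append({k: v.strip() for k, v in m.groupdict().items()})
        elif line.startswith("Procedure"):
            m = PROC_RE.search(line)
            if m:
                procedures.append(m.group(1))
    return calls, procedures


calls, procedures = parse_notes(notes)
print(calls[0])
print(procedures)
```

If pandas is installed, `pd.DataFrame(calls)` and `pd.DataFrame({'procedure': procedures})` should then give the two DataFrames described in the question. Lines that do not fit the assumed layout are silently skipped here, so real notes may need a looser pattern.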