I have a log of customer service calls in an Excel worksheet. The data has the following format:
So# Comments
1 sjhsh QUOTE 234566
1 sdsds customer call QUote 239876 Call back
2 adsdfh unknown call from customer QUOTE 189067 sdkjsd woieweio
3 QUOTE 657894 customer called for service
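I am reading this data from Excel and need to get the 6-digit number that follows the text "QUOTE" in each row, then add the extracted numbers as a new column.
1. A row may contain more than one "QUOTE" mention.
2. Some rows may contain no "QUOTE" at all.
Can someone help me do this substring search in Python?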
import pandas as pd
import re

file = pd.read_excel("C:/Users/rkatta/Desktop/Book1.xlsx")
file.set_index('Index', inplace=True, drop=True)
comments = file['InternalComments']
quotenum = []
keyword = 'QUOTE'
for comment in comments:
    try:
        # partition returns (before, separator, after) and splits on the first match only
        before_keyword, _, after_keyword = comment.partition(keyword)
        # drop the whitespace after the keyword, then keep the next 6 characters
        num = after_keyword.lstrip()[:6]
        quotenum.append(num)
    except AttributeError:
        # non-string cells (e.g. empty cells read as NaN) have no partition method
        quotenum.append('')
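Answer 0 (score: 2)
(?i)(?<=QUOTE )\d+
will capture the numbers you are looking for.
(?i)
makes the rest of the pattern case-insensitive, so it matches "QUote" and any other capitalization of the word.
(?<=QUOTE )
is a lookbehind: it requires the digits to be preceded by "QUOTE" and a single space, without including them in the match.
\d+
matches the number itself (one or more digits).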
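To illustrate how this pattern behaves on the sample rows from the question, here is a minimal sketch using only the standard `re` module. The names `QUOTE_RE` and `quote_numbers` are hypothetical helpers introduced for illustration, and the handling of non-string cells is an assumption about how empty Excel cells might arrive.

```python
import re

# Pattern from this answer: case-insensitive, digits preceded by "QUOTE ".
QUOTE_RE = re.compile(r"(?i)(?<=QUOTE )\d+")

def quote_numbers(text):
    """Return every run of digits that directly follows 'QUOTE ' in text."""
    # Assumption: non-string cells (e.g. NaN from empty cells) yield no matches.
    if not isinstance(text, str):
        return []
    return QUOTE_RE.findall(text)

comments = [
    "sjhsh QUOTE 234566",
    "sdsds customer call QUote 239876 Call back",
    "QUOTE 657894 customer called QUOTE 189067 again",
    "no mention here",
]
results = [quote_numbers(c) for c in comments]
```

Because the lookbehind expects exactly one space, a comment such as "QUOTE  234566" (two spaces) or "QUOTE234566" would not match with this pattern.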
Answer 1 (score: 1)
You need to replace the column-manipulation part with the following line:
file['InternalComments'] = file['Comments'].str.findall(r'(?i)quote\s*(\d+)').apply(','.join)
See the regex demo.
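The regex matches:
(?i) - case-insensitive mode
quote - a "quote" substring
\s* - 0 or more whitespace characters
(\d+) - capturing group 1 (what findall returns): 1 or more digits.
See the Python code demo: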
import pandas as pd

l = [
    'sjhsh QUOTE 234566',
    'sdsds customer call QUote 239876 Call back',
    'adsdfh unknown call from customer QUOTE 189067 sdkjsd woieweio',
    'QUOTE 657894 customer called for service',
    'QUOTE 657894 customer called for service QUOTE 657894 customer called for service',
    'No qte',
]
file = pd.DataFrame(l, columns=['Comments'])
file['InternalComments'] = file['Comments'].str.findall(r'(?i)quote\s*(\d+)').apply(','.join)
file
   Comments                                            InternalComments
0  sjhsh QUOTE 234566                                  234566
1  sdsds customer call QUote 239876 Call back          239876
2  adsdfh unknown call from customer QUOTE 189067...   189067
3  QUOTE 657894 customer called for service            657894
4  QUOTE 657894 customer called for service QUOTE...   657894,657894
5  No qte
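The question asks for a 6-digit number, while (\d+) accepts a run of digits of any length. A minimal sketch of a stricter variant follows, assuming a quote number is always exactly six digits and that empty cells should produce an empty string. `SIX_DIGIT_QUOTE` and `join_quote_numbers` are hypothetical names introduced here, not part of the answer above.

```python
import re

# Assumption: a quote number is exactly six digits.
# (?!\d) rejects the match when a seventh digit follows.
SIX_DIGIT_QUOTE = re.compile(r"(?i)quote\s*(\d{6})(?!\d)")

def join_quote_numbers(comment):
    """Return the six-digit numbers after 'quote', comma-separated."""
    # Assumption: non-string cells (e.g. NaN) map to an empty string.
    if not isinstance(comment, str):
        return ""
    return ",".join(SIX_DIGIT_QUOTE.findall(comment))

one = join_quote_numbers("sdsds customer call QUote 239876 Call back")
two = join_quote_numbers("QUOTE 657894 customer called QUOTE 657894 again")
too_long = join_quote_numbers("QUOTE 1234567")
missing = join_quote_numbers(float("nan"))
```

A function like this could be applied per row with `file['Comments'].apply(join_quote_numbers)`, which also avoids calling `','.join` on a missing value.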