I have a dictionary of words with their frequencies, as follows.
mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
I have a string (with punctuation removed), as follows.
recipes_book = "For todays lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
From the string above, I need to output 'biscuit pudding', 'yummy tim tam' and 'milk' by looking them up in the dictionary, but not 'sugar', because it appears in the string only as part of 'rawsugar'.
However, the code I am currently using outputs 'sugar' as well.
How can I avoid matching substrings like this and consider only complete tokens such as 'milk'? Please help.
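The asker's code is not shown here. As an illustration only, a plain substring test like the one below would produce the behaviour described; `substring_hits` is a hypothetical name and this is an assumed reconstruction, not the original code.

```python
mydictionary = {'yummy tim tam': 3, 'milk': 2, 'chocolates': 5, 'biscuit pudding': 3, 'sugar': 2}
recipes_book = ("For todays lesson we will show you how to make biscuit pudding "
                "using yummy tim tam milk and rawsugar")

# Hypothetical reconstruction (not the asker's code): `in` tests for a
# substring, so 'sugar' is reported because it occurs inside 'rawsugar'.
substring_hits = [key for key in mydictionary if key in recipes_book]
print(substring_hits)
```

With this test, 'sugar' ends up in the result alongside the three wanted keys, which is the problem the answers address.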
Answer 0 (score: 1)
Use the word boundary '\b'. Put simply:
>>> import re
>>> recipes_book = "For todays lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
>>> re.findall(r'(?is)(\bchocolates\b|\bbiscuit pudding\b|\bsugar\b|\byummy tim tam\b|\bmilk\b)', recipes_book)
['biscuit pudding', 'yummy tim tam', 'milk']
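To see what the boundary changes, here is a minimal sketch on a shortened string (the variable names are made up for the illustration). `\b` matches only at a position where a word character (letter, digit or underscore) is next to a non-word character or the edge of the string, so it cannot match between the 'w' and the 's' of 'rawsugar'.

```python
import re

text = "milk and rawsugar"

# Without boundaries, 'sugar' is found inside 'rawsugar'.
unbounded = re.findall(r'sugar', text)
# With boundaries, there is no match: 'w' and 's' are both word characters.
bounded = re.findall(r'\bsugar\b', text)
# A standalone token still matches.
bounded_milk = re.findall(r'\bmilk\b', text)

print(unbounded)     # ['sugar']
print(bounded)       # []
print(bounded_milk)  # ['milk']
```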
Answer 1 (score: 0)
You can update your code to use regex word boundaries:
import re
mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
# Wrap each key in \b...\b and join the alternatives with |
searcher = re.compile("|".join(r'\b{}\b'.format(x) for x in mydictionary), flags=re.I | re.S)
for match in searcher.findall(recipes_book):
    print(match)
Output:
biscuit pudding
yummy tim tam
milk
Answer 2 (score: 0)
Another approach, using re.escape, which escapes any regex metacharacters in the keys so that each key is matched literally.
import re
mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
val_list = []
for key in mydictionary:
    # Escape the key and anchor it at a word boundary on both sides
    regex_tmp = r'\b' + re.escape(str(key)) + r'\b'
    val_list.extend(re.findall(regex_tmp, recipes_book))
print(val_list)
Output (from a Python 2.7 run; the results follow the dictionary's iteration order, so their order may differ on other versions):
['yummy tim tam', 'biscuit pudding', 'milk']
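The two ideas above can be combined. The following is a sketch rather than a definitive implementation: it builds one escaped alternation, tries longer keys first, and maps each match back to its frequency. The names `keys`, `pattern` and `found` are made up for the example, and the lookup assumes the dictionary keys are all lowercase and begin and end with a word character.

```python
import re

mydictionary = {'yummy tim tam': 3, 'milk': 2, 'chocolates': 5, 'biscuit pudding': 3, 'sugar': 2}
recipes_book = ("For today's lesson we will show you how to make biscuit pudding "
                "using yummy tim tam milk and rawsugar")

# Longest keys first, so a longer phrase is tried before a shorter key
# that could start at the same position.
keys = sorted(mydictionary, key=len, reverse=True)

# One pattern: escaped keys joined with |, with a word boundary on each side.
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b',
                     flags=re.IGNORECASE)

found = {}
for match in pattern.finditer(recipes_book):
    # Assumption: keys are lowercase, so lowercasing the match recovers the key.
    key = match.group(0).lower()
    found[key] = mydictionary[key]

print(found)  # {'biscuit pudding': 3, 'yummy tim tam': 3, 'milk': 2}
```

A single pass with `finditer` reports matches in the order they occur in the text, whereas the per-key loop reports them in dictionary order.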