Given a list of ingredients:
text = """Ingredients: organic cane sugar, whole-wheat flour,
mono & diglycerides. Manufactured in a facility that uses nuts."""
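How can I extract the ingredients from my Postgres database, or find them in my Elasticsearch index, without matching tokens such as "Ingredients:" or "nuts"?
The expected output is: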
ingredients = process(text)
# ['cane sugar', 'whole wheat flour', 'mono diglycerides']
Answer 0 (score: 0)
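This Python code gives me the following output: ['organic cane sugar', 'whole-wheat flour', 'mono & diglycerides']
It requires that the ingredients come right after "Ingredients: " and that all of them are listed before the first ".", as in your case. Note that it does not strip qualifiers such as "organic" or punctuation such as "-" and "&", so the result is close to, but not identical to, your expected output.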
import re
text = """Ingredients: organic cane sugar, whole-wheat flour,
mono & diglycerides. Manufactured in a facility that uses nuts."""
# Search everything that comes after 'Ingredients: ' and before '.'
m = re.search(r'(?<=Ingredients: ).+?(?=\.)', text, re.DOTALL) # Raw string so \. is a literal dot; DOTALL: make . match newlines too
items = m.group(0).replace('\n', ' ').split(',') # Turn newlines into spaces, make a list of items separated by ','
items = [i.strip() for i in items] # Strip leading and trailing whitespace from each item
print(items)
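
To get closer to the exact output asked for in the question, the same regex can be wrapped in a `process` function that also normalizes each item. This is only a minimal sketch under some assumptions: the `QUALIFIERS` set is a hypothetical list of words to drop (here just "organic"), and every character that is not a word character or whitespace (such as "-" and "&") is treated as a separator.

```python
import re

# Hypothetical set of qualifier words to drop; extend it for your own data.
QUALIFIERS = {"organic"}

def process(text):
    # Same idea as above: take everything after 'Ingredients: ' up to the first '.'
    m = re.search(r"(?<=Ingredients: ).+?(?=\.)", text, re.DOTALL)
    if m is None:
        return []  # No ingredient list found
    items = []
    for raw in m.group(0).split(","):
        # Replace every character that is neither a word character
        # (letter, digit, underscore) nor whitespace with a space
        cleaned = re.sub(r"[^\w\s]", " ", raw)
        # split() also collapses newlines and repeated spaces
        words = [w for w in cleaned.split() if w.lower() not in QUALIFIERS]
        if words:
            items.append(" ".join(words))
    return items

text = """Ingredients: organic cane sugar, whole-wheat flour,
mono & diglycerides. Manufactured in a facility that uses nuts."""
print(process(text))
# ['cane sugar', 'whole wheat flour', 'mono diglycerides']
```

Like the original snippet, this still assumes the list ends at the first ".", so an ingredient that contains a period would cut the list short.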