I am trying to classify articles based on a db table that has two columns, like this:
id keywords
1 cat, kitten, tiger
2 dog, puppy, jackal
Given an article, how can I determine which keywords appear in it, and which ID I should use to classify the article? This is my code so far:
cur.execute("SELECT keywords, id FROM Keywords")
keywords = cur.fetchall()
keywords = [k[0] for k in keywords]
if any(word in article for word in keywords):
    matched = [word for word in keywords if word in article]
    print("Matched keywords: %s" % ', '.join(matched))
Answer 0 (score: 1)
If keywords holds a comma-separated list of keywords, you will want to split that string. Try something like this:
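cur.execute("SELECT keywords, id FROM Keywords")
result = cur.fetchall()
keywords = []
for row in result:
    # Split the comma-separated string and strip the space left after each comma,
    # otherwise " kitten" (with a leading space) would be compared to the article
    keywords += [k.strip() for k in row[0].split(',')]
if any(word in article for word in keywords):
    matched = [word for word in keywords if word in article]
    print("Matched keywords: %s" % ', '.join(matched))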
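The snippet above prints the matched keywords but drops the id, so it does not yet say which row to classify the article under. One possible way to keep that link is sketched below. The helper names match_keywords and best_id are hypothetical, the rows are hard-coded in the shape that cur.fetchall() would return for the query above, and "the id with the most matched keywords wins" is only an assumed classification rule, not something stated in the question. The sketch also matches on word boundaries, so that "cat" does not match inside "category".

```python
import re

def match_keywords(article, rows):
    """Return {id: [matched keywords]} for rows of (keywords, id) tuples."""
    matches = {}
    for keyword_string, row_id in rows:
        for keyword in keyword_string.split(','):
            keyword = keyword.strip()  # drop the space left after each comma
            if not keyword:
                continue
            # \b keeps "cat" from matching inside "category"; matching ignores case
            pattern = r'\b' + re.escape(keyword) + r'\b'
            if re.search(pattern, article, re.IGNORECASE):
                matches.setdefault(row_id, []).append(keyword)
    return matches

def best_id(matches):
    """Assumed rule: pick the id with the most matched keywords, or None."""
    if not matches:
        return None
    return max(matches, key=lambda row_id: len(matches[row_id]))

# Hard-coded stand-in for cur.fetchall() on "SELECT keywords, id FROM Keywords"
rows = [("cat, kitten, tiger", 1), ("dog, puppy, jackal", 2)]
article = "The kitten chased a tiger while the dog slept."

matches = match_keywords(article, rows)
print(matches)           # {1: ['kitten', 'tiger'], 2: ['dog']}
print(best_id(matches))  # 1
```

With a real cursor, rows would simply be the result of cur.fetchall(). If two ids tie, max returns whichever it encounters first, so a different tie-breaking rule may be needed.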