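I have two tables in HDFS. One (Table 1) holds keywords, as shown below; the other (Table 2) has a text column. Each row of Table 1 can contain several keywords. I need to find every Table 1 keyword that appears in the text column of Table 2 and output, for each row of Table 2, the list of matching keywords.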
Example:
Table 1:
ID | Name | Age | City | Gender
---------------------------------
111 | Michael | 19 | NY | male
222 | George | 23 | CA | male
333 | Linda | 22 | LA | female
Table 2:
Text_Description
------------------------------------------------------------------------
1-Linda and my cousin left the house.
2-Michael who is 19 year old, and George are going to rock concert in CA.
3-Shopping card is ready at the NY for male persons.
Output: for each row of Table 2, the list of matching Table 1 keywords.
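
To make the requested matching concrete, here is a minimal sketch in plain Python, assuming the keyword rows and the Text_Description rows from the example are already loaded in memory. The names `keyword_rows`, `text_rows`, `build_keyword_set` and `match_keywords` are hypothetical, and two things are assumptions rather than stated requirements: every non-ID column value counts as a keyword, and matching is case-sensitive on whole words.

```python
import re

# Keyword rows copied from the example. Treating every non-ID column
# value as a keyword is an assumption, not something the post states.
# The name is spelled "Michael" here, as it appears in the text row.
keyword_rows = [
    {"ID": "111", "Name": "Michael", "Age": "19", "City": "NY", "Gender": "male"},
    {"ID": "222", "Name": "George", "Age": "23", "City": "CA", "Gender": "male"},
    {"ID": "333", "Name": "Linda", "Age": "22", "City": "LA", "Gender": "female"},
]

# Text rows copied from the example.
text_rows = [
    "1-Linda and my cousin left the house.",
    "2-Michael who is 19 year old, and George are going to rock concert in CA.",
    "3-Shopping card is ready at the NY for male persons.",
]


def build_keyword_set(rows, skip_columns=("ID",)):
    """Collect all cell values outside the skipped columns into one set."""
    return {
        value
        for row in rows
        for column, value in row.items()
        if column not in skip_columns
    }


def match_keywords(text, keywords):
    """Return keywords found in text as whole words, in order of first appearance."""
    matched = []
    for token in re.findall(r"\w+", text):
        if token in keywords and token not in matched:
            matched.append(token)
    return matched


keywords = build_keyword_set(keyword_rows)
results = [match_keywords(text, keywords) for text in text_rows]

for text, found in zip(text_rows, results):
    print(text, "->", found)
```

Splitting the text into word tokens keeps "male" from matching inside "female", but it only handles single-word keywords; multi-word keywords would need a different matching strategy. On data that actually lives in HDFS, the same per-row logic would presumably run inside whatever engine reads the tables, with the keyword set shared across workers, rather than in a local loop like this one.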