How to find matching keywords across two Hadoop tables using Spark?

Date: 2017-10-17 11:56:42

Tags: scala hadoop apache-spark hdfs

I have two tables in HDFS. One table (Table-1) holds keywords, as shown below; each row of Table-1 can contain more than one keyword. The other table (Table-2) has a text column. For each row of Table-2, I need to find all the Table-1 keywords that match its text, and output the list of matched keywords.

Example:

Table 1:

ID  | Name    | Age | City | Gender
-----------------------------------
111 | Micheal | 19  | NY   | male
222 | George  | 23  | CA   | male
333 | Linda   | 22  | LA   | female

Table 2:

Text_Description
------------------------------------------------------------------------
1-Linda and my cousin left the house.
2-Michael who is 19 year old, and George are going to rock concert in CA.
3-Shopping card is ready at the NY for male persons.

Output:

For each row of Table 2, the list of Table 1 keywords found in its text (e.g. the first row should yield Linda).

0 Answers:

There are no answers yet.
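
Since the question drew no answers, here is a minimal sketch of one possible approach in Spark (Scala, matching the question's tags), not taken from the original post: flatten Table 1's fields into a broadcast keyword set, then match each Table 2 row's tokens against it with a UDF. The HDFS paths, CSV input format, and the object name KeywordMatch are assumptions; the column names come from the sample tables above.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object KeywordMatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KeywordMatch").getOrCreate()
    import spark.implicits._

    // Assumed locations and format; replace with the real HDFS paths.
    val table1 = spark.read.option("header", "true").csv("hdfs:///data/table1")
    val table2 = spark.read.option("header", "true").csv("hdfs:///data/table2")

    // Flatten the keyword fields of Table 1 into one distinct list and
    // broadcast it; this assumes the keyword set fits in driver memory.
    val keywords: Array[String] = table1
      .select($"Name", $"Age", $"City", $"Gender")
      .as[(String, String, String, String)]
      .flatMap { case (n, a, c, g) => Seq(n, a, c, g) }
      .filter(_ != null)
      .distinct()
      .collect()
    val kw = spark.sparkContext.broadcast(keywords.toSet)

    // Tokenize each text row on non-word characters and keep the tokens
    // that are keywords; this does exact, whole-word matching only.
    val findKeywords = udf { text: String =>
      if (text == null) Seq.empty[String]
      else {
        val tokens = text.split("\\W+").toSet
        kw.value.intersect(tokens).toSeq
      }
    }

    table2
      .withColumn("Keyword_list", findKeywords($"Text_Description"))
      .show(truncate = false)

    spark.stop()
  }
}

With the sample rows above, exact matching would return Linda for the first text; 19, George, and CA for the second (Michael in the text does not match the keyword Micheal exactly); and NY and male for the third. If the keyword set is too large to broadcast, an alternative is to explode the tokenized text column and join it against a keyword DataFrame instead.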