I have two dataframes: one contains the columns skill_id and skill_set, and the other contains the columns jobtitle and job_description, as shown below.
skills dataframe:
+---------+--------------------+
| skill_id|           skill_set|
+---------+--------------------+
|100000001|python, numpy, pa...|
|100000002|java, j2ee, hiber...|
|100000003|c#, asp.net, .net...|
|100000004|agile, product ba...|
|100000005|deep-learning, py...|
|100000006|database, oracle,...|
|100000007|java, c, .net, da...|
|100000008|html, html5, java...|
|100000009|mongodb, expressj...|
|100000010|jira, confluence,...|
|100000011|automatic testing...|
|100000012|mvp, mvvm, sdk, a...|
|100000013|objective c, swif...|
|100000014|codeigniter, php,...|
+---------+--------------------+
descriptions dataframe:
+--------------------+--------------------+
|            jobtitle|     job_description|
+--------------------+--------------------+
|Python developer ...|this is tarannum ...|
|java developer in...|experience with j...|
|.net developer in...|design and develo...|
|scrum master in g...|leading one or mo...|
|data scientist fo...|must be proficien...|
|data base adminis...|strong 3+ year ex...|
|full stack develo...|12+ years of fron...|
|ui/ux developer i...|html5, css, javas...|
|mean stack develo...|hands on experien...|
|devops engineer i...|drive the archite...|
|testing engineer ...|seeking highly mo...|
|android developer...|functional knowle...|
|ios developer in ...|working knowledge...|
|ios developer in ...|We are looking fo...|
|python developer ...|Vast knowledge in...|
|Python Developer ...|We are looking fo...|
|Senior Java Devel...|We are looking fo...|
|php developer at ...|CodeIgniter (Must...|
+--------------------+--------------------+
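Now I want to take a single row from the skills dataframe, for example 100000001, and compare its skill_set against the job_description of every row in the descriptions dataframe. The result should show all rows of the descriptions dataframe that share at least one word with the skill_set of row 100000001. I have been searching for how to apply an intersection between one row of a dataframe and all rows of another dataframe using PySpark.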
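To make the expected result concrete, here is a minimal plain-Python sketch of the matching I have in mind (it is not PySpark, only an illustration of the logic). The sample rows are made up, because the real values are truncated in the tables above, and the helper names tokenize and matching_rows are hypothetical. Splitting on commas and whitespace is also just an assumption about how a "word" should be defined.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on commas and whitespace."""
    return {tok for tok in re.split(r"[,\s]+", text.lower()) if tok}


def matching_rows(skill_set, descriptions):
    """Return (jobtitle, common_words) for each description sharing a word with skill_set."""
    skills = tokenize(skill_set)
    result = []
    for jobtitle, job_description in descriptions:
        common = skills & tokenize(job_description)
        if common:  # keep only rows with a non-empty intersection
            result.append((jobtitle, sorted(common)))
    return result


# Hypothetical sample data; the real values are truncated in the tables above.
skill_set = "python, numpy, pandas"
descriptions = [
    ("python developer", "vast knowledge in python and pandas"),
    ("java developer", "experience with java and j2ee"),
]

matches = matching_rows(skill_set, descriptions)
print(matches)
```

With this tokenization a multi-word skill such as "objective c" is split into separate words, which may or may not be what is wanted. My guess is that the PySpark version would combine split, array_intersect and a cross join between the selected skills row and the descriptions dataframe, but that is the part I have not worked out.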
I hope this is understandable. If you know of a similar example, please share a link.
Thanks