Answer 0 (score: 0)
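You can first explode the tagged_persons column so that each tagged person gets its own row. Then join the result with headliners, matching the tagged person against the headliner's Artist and the post's year against Year, and filter out the rows where the headliner Artist is null. What remains is the data you want.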
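from pyspark.sql.functions import explode
# One row per tagged person; keep Artist and year so they are available for the join
exploded = all_posts.select("Artist", "year", explode("tagged_persons").alias("tagged_person"))
cond = [exploded.tagged_person == headliners.Artist, exploded.year == headliners.Year]
join_df = exploded.join(headliners, cond, "left")
# Posts with no matching headliner have a null headliners.Artist after the left join; drop them
filter_df = join_df.filter(headliners.Artist.isNotNull())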
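To see what each step produces, here is a minimal sketch of the same explode, join and filter logic in plain Python, so it runs without Spark. The sample rows are hypothetical, and the helpers explode_posts and left_join are made-up names that only model the row-level semantics of the PySpark calls above; they are not part of Spark's API.

```python
# Hypothetical sample rows mirroring the columns used in the answer.
all_posts = [
    {"Artist": "A", "year": 2018, "tagged_persons": ["X", "Y"]},
    {"Artist": "B", "year": 2019, "tagged_persons": ["X"]},
]
headliners = [
    {"Artist": "X", "Year": 2018},
    {"Artist": "Y", "Year": 2019},
]


def explode_posts(posts):
    """Hypothetical helper: one output row per element of tagged_persons (like explode)."""
    return [
        {"Artist": p["Artist"], "year": p["year"], "tagged_person": t}
        for p in posts
        for t in p["tagged_persons"]
    ]


def left_join(rows, heads):
    """Hypothetical helper: left join on tagged_person == Artist and year == Year.
    Rows without a match get headliner_artist = None, standing in for a SQL null."""
    out = []
    for r in rows:
        matches = [
            h for h in heads
            if h["Artist"] == r["tagged_person"] and h["Year"] == r["year"]
        ]
        if matches:
            out.extend({**r, "headliner_artist": h["Artist"], "Year": h["Year"]} for h in matches)
        else:
            out.append({**r, "headliner_artist": None, "Year": None})
    return out


exploded = explode_posts(all_posts)
joined = left_join(exploded, headliners)
# Keep only rows that found a headliner (the isNotNull filter).
result = [r for r in joined if r["headliner_artist"] is not None]

for r in result:
    print(r["Artist"], r["year"], r["tagged_person"])
```

With this sample data only A's post tagging X survives: X headlined in 2018, while Y's headliner row is for 2019 and X did not headline in 2019. Since a row can only be matched when tagged_person equals a non-null headliner Artist, the left join followed by the not-null filter keeps exactly the rows an inner join would, so exploded.join(headliners, cond, "inner") (inner is the default) is a shorter equivalent in PySpark.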