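I'm new to PySpark. Below is my table. I want to plot a histogram of this df, where the x-axis shows the "word" column and the y-axis shows the "count" column. Do you have any ideas?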
word count
Akdeniz’in 14
en 13287
büyük 3168
deniz 1276
festivali: 6
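Answer 0 (score: 1)
First, a histogram is not the right chart type for visualizing word counts. A histogram shows the distribution of a single variable, whereas a bar chart compares values across categories (for more information, read this article). With the following code, you can create a bar chart for your example: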
from matplotlib import pyplot
from pyspark.sql import SparkSession
# Reuse the active session (already available as `spark` in the pyspark shell)
spark = SparkSession.builder.getOrCreate()
l = [('Akdeniz’in', 14),
     ('en', 13287),
     ('büyük', 3168),
     ('deniz', 1276),
     ('festivali:', 6)]
df = spark.createDataFrame(l, ['word', 'count'])
# Collect the rows into a local list (not recommended for a huge DataFrame)
bla = df.collect()
# Create a numeric x position for every label
indexes = list(range(len(bla)))
# Split words and counts into separate lists
values = [r['count'] for r in bla]
labels = [r['word'] for r in bla]
# Plot the bars; pyplot.bar centers each bar on its x position by default
bar_width = 0.35
pyplot.bar(indexes, values, width=bar_width)
# Place each word label at the center of its bar
pyplot.xticks(indexes, labels)
pyplot.show()
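To make the histogram versus bar chart distinction more concrete, here is a minimal sketch in plain Python (no Spark or matplotlib needed) that builds the data each chart type would draw from the same sample rows. The power-of-ten bins and the helper name digit_bin are assumptions chosen for illustration only; a real histogram could use any binning.

```python
from collections import Counter

# Same sample rows as in the question
rows = [("Akdeniz’in", 14), ("en", 13287), ("büyük", 3168),
        ("deniz", 1276), ("festivali:", 6)]

# Bar chart data: one bar per word, bar height = that word's count
bar_labels = [word for word, _ in rows]
bar_heights = [count for _, count in rows]

# Histogram data: group the counts into bins and tally how many words
# fall into each bin. Hypothetical binning for illustration: powers of
# ten, keyed by the number of digits in the count.
def digit_bin(count):
    return len(str(count))

histogram = Counter(digit_bin(count) for _, count in rows)

for digits in sorted(histogram):
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    print(f"{low}-{high}: {histogram[digits]} word(s)")
```

The bar chart data keeps the words on the x-axis, which is what the question asks for. The histogram data no longer mentions any word: it only reports how many words have a count in each range (here, two words fall in 1000-9999 and one in each of the other non-empty ranges).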