How to plot with PySpark?

Posted: 2018-10-22 23:05:17

Tags: python dataframe pyspark

I need to plot two separate columns: the first holds the data and the second holds the time:

All_packets= df.select("ip_adr_src","asn_val","timestamp")
EB_packets=All_packets.filter("asn_val is not NULL")
EB_packets.show()
plotdf=EB_packets.select("asn_val","timestamp")

I want to plot asn_val over time, grouped by ip_adr_src: if there are 6 distinct ip_adr_src values, I want 6 curves.

+--------------------+---------------------------------+-------------+
|     ip_adr_src     |asn_val                          |    timestamp|
+--------------------+---------------------------------+-------------+
|14:15:92:cc:00:01...|                              707|1539071748441|
|14:15:92:cc:00:02...|                             1212|1539071752314|
|14:15:92:cc:00:00...|                             1616|1539071755578|
|14:15:92:cc:00:04...|                             1818|1539071757167|
|14:15:92:cc:00:03...|                             2020|1539071759297|
|14:15:92:cc:00:00...|                             2121|1539071760408|
|14:15:92:cc:00:09...|                             2323|1539071764035|
|14:15:92:cc:00:07...|                             2424|1539071765775|
|14:15:92:cc:00:00...|                             2525|1539071768560|
|14:15:92:cc:00:06...|                             5858|1539071845370|
|14:15:92:cc:00:00...|                             6060|1539071850129|
|14:15:92:cc:00:05...|                             6262|1539071855046|
|14:15:92:cc:00:00...|                             6969|1539071872523|
|14:15:92:cc:00:07...|                             6969|1539071872528|
|14:15:92:cc:00:08...|                             7171|1539071877609|
+--------------------+---------------------------------+-------------+

However, every attempt I have made fails with this error:

Dataframe doesn't have an object 'plot'

I would be grateful for any help.

1 answer:

Answer 0 (score: 0)

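I'm not sure I understand which columns you want to plot, but I suspect you mainly need help with how to plot. This is how I would plot the asn_val column against the timestamp column: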

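import matplotlib.pyplot as plt

# Collect both columns in one pass so x and y stay row-aligned
# (two separate collect() calls are not guaranteed to return rows in the same order)
rows = df.select('asn_val', 'timestamp').orderBy('timestamp').collect()
x_ts = [row.timestamp for row in rows]
y_asn_val = [row.asn_val for row in rows]

plt.plot(x_ts, y_asn_val)

plt.ylabel('asn_val')
plt.xlabel('timestamp')
plt.title('ASN values over time')
plt.legend(['asn_val'], loc='upper left')

plt.show()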
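The timestamp values in the question look like Unix epoch times in milliseconds (1539071748441 would be 2018-10-09 07:55:48.441 UTC); that reading is an assumption. Plotted as raw integers, the x axis is hard to read. A minimal sketch that converts them to UTC datetime objects with the standard library before plotting (matplotlib accepts datetime values on the x axis); `to_datetimes` and `plot_asn_over_time` are hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are integer milliseconds since the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetimes(ms_values):
    """Convert epoch-millisecond integers to timezone-aware UTC datetimes.

    Adding a timedelta avoids the float rounding of ms / 1000.
    """
    return [EPOCH + timedelta(milliseconds=ms) for ms in ms_values]


def plot_asn_over_time(rows):
    """Plot asn_val against time from collected (asn_val, timestamp) rows.

    `rows` may be the list from DataFrame.collect() (a Spark Row is a tuple)
    or any iterable of 2-tuples in that column order. Requires matplotlib.
    """
    import matplotlib.pyplot as plt

    rows = sorted(rows, key=lambda r: r[1])  # order points by timestamp
    x = to_datetimes([r[1] for r in rows])
    y = [r[0] for r in rows]
    plt.plot(x, y)
    plt.gcf().autofmt_xdate()  # tilt date labels so they don't overlap
    plt.xlabel('time (UTC)')
    plt.ylabel('asn_val')
    plt.show()
```

With the question's DataFrame this would be called as plot_asn_over_time(EB_packets.select("asn_val", "timestamp").collect()). Another route around the original error: a Spark DataFrame has no plot method, but toPandas() returns a pandas DataFrame, which does; like collect(), it pulls all rows to the driver, so it only suits data that fits in memory.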

If you need to plot other columns, call plt.plot(x, y) once per column, and list each column's name in plt.legend(your_cols, loc='upper left').
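Applied to the question's goal (one curve per ip_adr_src), the same idea means one plt.plot call per address, with the address as the line's label. A minimal sketch, assuming the rows are first collected to the driver with collect() (so the data must fit in memory); `group_series` and `plot_per_source` are hypothetical helper names:

```python
from collections import defaultdict


def group_series(rows):
    """Split (ip_adr_src, asn_val, timestamp) rows into one series per source.

    `rows` may be the list from Spark's DataFrame.collect() (a Row unpacks
    like a tuple) or any iterable of 3-tuples in that column order.
    Returns {ip_adr_src: (timestamps, asn_vals)}, each sorted by timestamp.
    """
    points = defaultdict(list)
    for ip, asn, ts in rows:
        points[ip].append((ts, asn))
    series = {}
    for ip, pts in points.items():
        pts.sort(key=lambda p: p[0])  # order each curve by time only
        series[ip] = ([ts for ts, _ in pts], [asn for _, asn in pts])
    return series


def plot_per_source(series):
    """Draw one line per ip_adr_src, labelled with the address (needs matplotlib)."""
    import matplotlib.pyplot as plt

    for ip, (xs, ys) in series.items():
        # marker='o' so addresses with a single row still show up as a point
        plt.plot(xs, ys, marker='o', label=ip)
    plt.xlabel('timestamp')
    plt.ylabel('asn_val')
    plt.legend(loc='upper left')
    plt.show()
```

With the question's DataFrame: plot_per_source(group_series(EB_packets.select("ip_adr_src", "asn_val", "timestamp").collect())). show() truncates strings longer than 20 characters by default, but collect() returns the full ip_adr_src values, so the legend shows complete addresses.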