final_closedf = sqlContext.sql('select df_table.tickerData.4_close,df_table.CIK,df_table.Company from df_table')
#final_closedf.show()
final_closedf.select('4_close').rdd.max()[0]
# final_closedf.groupby().max('4_close').collect()[0].asDict()['max(4_close)']
final_closedf.show()
Get the highest and lowest values for each unique company name in a PySpark DataFrame
+--------+------+-------------------+
| 4_close|   CIK|            Company|
+--------+------+-------------------+
| 98.5900|104169|Wal Mart Stores Inc|
| 99.4500|104169|Wal Mart Stores Inc|
| 99.5400|104169|Wal Mart Stores Inc|
|100.1300|104169|Wal Mart Stores Inc|
|101.6100|104169|         amazon inc|
|100.3900|104169|Wal Mart Stores Inc|
| 99.6700|104169|      Berkshire Inc|
|100.0200|104169|         amazon inc|
+--------+------+-------------------+
I want to get a DataFrame that has, for each unique company, the highest and lowest "4_close" values (a sample df is given above).