Get the highest and lowest values per unique value from a Spark DataFrame

Time: 2019-01-31 06:05:25

Tags: python-3.x

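final_closedf = sqlContext.sql('select df_table.tickerData.4_close,df_table.CIK,df_table.Company from df_table')
#final_closedf.show()
# Overall maximum of 4_close across all rows (not per company)
final_closedf.select('4_close').rdd.max()[0]
# The aggregate column is named after the source column, so the key is 'max(4_close)'
# final_closedf.groupby().max('4_close').collect()[0].asDict()['max(4_close)']
final_closedf.show()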

Get the highest and lowest values for each unique company name in a PySpark DataFrame

+--------+------+-------------------+
| 4_close|   CIK|            Company|
+--------+------+-------------------+
| 98.5900|104169|Wal Mart Stores Inc|
| 99.4500|104169|Wal Mart Stores Inc|
| 99.5400|104169|Wal Mart Stores Inc|
|100.1300|104169|Wal Mart Stores Inc|
|101.6100|104169|amazon inc         |
|100.3900|104169|Wal Mart Stores Inc|
| 99.6700|104169|Berkshire Inc      |
|100.0200|104169|amazon inc         |
+--------+------+-------------------+

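I want to get a DataFrame that lists each unique company together with its highest and lowest "4_close" values (a sample df is given above).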

0 answers:

No answers