What is the equivalent of pandas.DataFrame.tail in DataBricks?

Time: 2019-01-14 15:29:09

Tags: python pandas databricks

What is the equivalent of pandas.DataFrame.tail in DataBricks? I searched the documentation but could not find any related function.

1 answer:

Answer 0: (score: 1)

DataBricks apparently uses pyspark.sql DataFrames rather than pandas ones.

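# Add an index column if the DataFrame does not already have one.
# monotonically_increasing_id() yields unique, increasing, non-consecutive IDs;
# it assumes fewer than 1 billion partitions and 8 billion records per partition.
from pyspark.sql.functions import monotonically_increasing_id
df = df.withColumn("index", monotonically_increasing_id())

# Register the DataFrame as a temporary view so SQL can refer to it as "df"
df.createOrReplaceTempView("df")

# Query the last 5 rows by index (sqlContext is predefined in Databricks notebooks)
tail = sqlContext.sql("""SELECT * FROM df ORDER BY `index` DESC LIMIT 5""")
tail.show()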

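Note that this is expensive and does not play to Spark's strengths: the ORDER BY has to consider every row across all partitions just to return five of them.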
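
To make the caveat about monotonically_increasing_id more concrete, here is a minimal sketch in plain Python (no Spark required) of the ID layout described in the Spark documentation: the partition ID goes in the upper 31 bits and the record number within the partition in the lower 33 bits. The helper name simulated_id and the two-partition example are assumptions made for illustration, not part of any Spark API.

```python
# Pure-Python model of the documented ID layout; this does not call Spark.
RECORD_BITS = 33      # lower 33 bits: record number within a partition
PARTITION_BITS = 31   # upper 31 bits: partition ID


def simulated_id(partition_id: int, record_number: int) -> int:
    """Hypothetical helper that mimics the documented bit layout."""
    if not 0 <= partition_id < 2 ** PARTITION_BITS:
        raise ValueError("partition_id does not fit in 31 bits")
    if not 0 <= record_number < 2 ** RECORD_BITS:
        raise ValueError("record_number does not fit in 33 bits")
    return (partition_id << RECORD_BITS) | record_number


# Assumed example: two partitions holding three records each.
ids = [simulated_id(p, r) for p in range(2) for r in range(3)]
print(ids)  # [0, 1, 2, 8589934592, 8589934593, 8589934594]

# Sorting descending and taking the first two mirrors ORDER BY index DESC LIMIT 2.
tail_ids = sorted(ids, reverse=True)[:2]
print(tail_ids)  # [8589934594, 8589934593]
```

The IDs increase but are not consecutive, so the column is usable for ordering but not as a positional row number. The order also follows the partition layout, which matches the "last rows" you expect only if the DataFrame already has a meaningful order.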

See also:

https://medium.com/@chris_bour/6-differences-between-pandas-and-spark-dataframes-1380cec394d2

pyspark,spark: how to select last row and also how to access pyspark dataframe by index