Is there a way in python to limit dbutils.fs.ls to top n

Date: 2018-12-03 13:00:45

Tags: pyspark databricks apache-commons-dbutils azure-databricks

Is there a way to give a limit to the ls function? I know I could read the full listing and then limit the number of entries, but that doesn't solve my problem. The problem is that when our service is down over the weekend, there can be over 2000K files in blob storage. We have chosen to process them in batches of 50000 so that we see some progress. But the ls call easily takes up to 30 minutes and, early on, returns far more files than we actually process. So putting a limit on ls would help us. We don't care about order; we just don't want the batch size to be too big.

So any suggestion is welcome.
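One direction worth noting: `dbutils.fs.ls` builds the whole listing eagerly, so the cap has to come from a lazy iterator underneath. The sketch below illustrates the "stop after n entries" idea in plain Python, using `os.scandir` (which is lazy) as a stand-in for a lazy storage listing and `itertools.islice` to cut it off; on Databricks one would need an equivalently lazy listing API rather than `dbutils.fs.ls` itself, so treat this as an illustration of the batching pattern, not a Databricks solution.

```python
import itertools
import os
import tempfile


def list_top_n(path, n):
    """Return at most n entry names from a directory without
    materializing the full listing (os.scandir yields lazily,
    islice stops the iteration after n items)."""
    with os.scandir(path) as it:
        return [entry.name for entry in itertools.islice(it, n)]


# Demo: create 10 files, then list only the first 3. The order of the
# returned entries is unspecified, which matches "we don't care about
# order, just don't want the batch size to be too big".
tmp = tempfile.mkdtemp()
for i in range(10):
    open(os.path.join(tmp, f"f{i}.txt"), "w").close()

batch = list_top_n(tmp, 3)
print(len(batch))  # 3
```

Because the iterator is consumed lazily, only the first n directory entries are ever fetched, which is exactly the behavior the question is asking `ls` to have.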

0 answers:

No answers