Answer 0: (score: 1)
Since there is no built-in way to read the modification date and time of files in the data lake directly, we can retrieve these details with Python code.
Here is the code:
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob (v2) SDK
from datetime import datetime

block_blob_service = BlockBlobService(account_name='account-name', account_key='account-key')
container_name = 'container-firstname'
second_container_name = 'container-Second'
# block_blob_service.create_container(container_name)

generator = block_blob_service.list_blobs(container_name, prefix="Recovery/")
report_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

for blob in generator:
    # Fetch the blob's properties once per blob and reuse them,
    # instead of issuing a separate request for each field.
    props = block_blob_service.get_blob_properties(container_name, blob.name).properties
    file_size = props.content_length
    last_modified = props.last_modified
    line = '|'.join([container_name, second_container_name, blob.name,
                     str(file_size), str(last_modified), str(report_time)])
    print(line)
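Note that `BlockBlobService` comes from the legacy v2 `azure-storage-blob` package. In the current v12 SDK, the listing returned by `ContainerClient.list_blobs` already carries `size` and `last_modified`, so no per-blob property call is needed. A minimal sketch, assuming placeholder account URL, key, container, and prefix (the SDK import is kept local so the formatting helper works without the package installed):

```python
def format_report_line(container, blob_name, size, last_modified, report_time):
    # Build the same pipe-delimited report line as the snippet above.
    return '|'.join([container, blob_name, str(size),
                     str(last_modified), str(report_time)])

def list_blob_report(account_url, account_key, container, prefix):
    # v12 SDK: list_blobs() yields BlobProperties objects that already
    # include size and last_modified -- one request per page, not per blob.
    from datetime import datetime
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(account_url=account_url, credential=account_key)
    container_client = service.get_container_client(container)
    report_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    return [format_report_line(container, b.name, b.size, b.last_modified, report_time)
            for b in container_client.list_blobs(name_starts_with=prefix)]
```

Usage would look like `list_blob_report('https://<account-name>.blob.core.windows.net', '<account-key>', 'container-firstname', 'Recovery/')`.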
For more details, please refer to the SO thread that addresses a similar problem.
Answer 1: (score: 1)
For this question, please refer to the following code:
# Access the Hadoop FileSystem API through the Spark JVM gateway
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem

conf = sc._jsc.hadoopConfiguration()
conf.set(
    "fs.azure.account.key.<account-name>.dfs.core.windows.net",
    "<account-access-key>")

path = Path('abfss://<container-name>@<account-name>.dfs.core.windows.net/<file-path>/')
fs = path.getFileSystem(conf)
for file_status in fs.listStatus(path):
    print(file_status)
    print(file_status.getModificationTime())
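Hadoop's `FileStatus.getModificationTime()` returns milliseconds since the Unix epoch, which is not very readable on its own. A small helper to convert it (assuming you want a UTC wall-clock string; adjust the timezone if needed):

```python
from datetime import datetime, timezone

def millis_to_timestamp(ms):
    # FileStatus.getModificationTime() reports epoch milliseconds;
    # convert to a human-readable UTC timestamp string.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
```

In the loop above you would call it as `millis_to_timestamp(file_status.getModificationTime())`.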