How to upload a text file from a Databricks notebook to FTP

Asked: 2019-10-25 16:56:05

Tags: pyspark databricks

I tried to find a solution but came up with nothing. I'm new to this, so if you know a solution, please help me. Thanks!

2 Answers:

Answer 0 (score: 1)

Ok, I found a solution.

#copy a file from ADLS to an FTP server over TLS (FTPS)
from ftplib import FTP_TLS
from azure.datalake.store import core, lib

keyVaultName = "yourkeyvault"
#the key vault must be configured as a Databricks secret scope holding the ADLS credentials

#set up authentication for ADLS
tenant_id = dbutils.secrets.get(scope = keyVaultName, key = "tenantId")
username = dbutils.secrets.get(scope = keyVaultName, key = "appRegID")
password = dbutils.secrets.get(scope = keyVaultName, key = "appRegSecret")
store_name = 'ADLSStoridge'
token = lib.auth(tenant_id = tenant_id, client_id = username, client_secret = password)
adl = core.AzureDLFileSystem(token, store_name=store_name)

#create a secure connection to the FTP server
ftp = FTP_TLS('ftp.xyz.com')
#add credentials
ftp.login(user='', passwd='')
#switch the data connection to TLS
ftp.prot_p()
#set the target directory on the FTP server
ftp.cwd('folder path on FTP')

#open the file in ADLS and stream it to the FTP server
f = adl.open('adls path of your file')
ftp.storbinary('STOR myfile.csv', f)
f.close()
ftp.quit()
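
If the file you want to push to the FTP server is already on DBFS (or on the driver's local disk) rather than in ADLS, the same ftplib approach works without the ADLS client. A minimal sketch, assuming a hypothetical file at /dbfs/tmp/myfile.txt and placeholder credentials:

#upload a file that already lives on DBFS to the FTP server over TLS
from ftplib import FTP_TLS

ftp = FTP_TLS('ftp.xyz.com')
ftp.login(user='', passwd='')
ftp.prot_p()
ftp.cwd('folder path on FTP')

#DBFS paths are visible to local Python on the driver under /dbfs/...
with open('/dbfs/tmp/myfile.txt', 'rb') as f:
    ftp.storbinary('STOR myfile.txt', f)

ftp.quit()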

Answer 1 (score: 0)

In Databricks, you can access files stored in ADLS using any of the following methods. There are three ways to access Azure Data Lake Storage Gen2:

  1. Mount an Azure Data Lake Storage Gen2 file system to DBFS using a service principal and OAuth 2.0.
  2. Use a service principal directly.
  3. Use the Azure Data Lake Storage Gen2 storage account access key directly (a short sketch of this option follows the list).
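
For option 3, a minimal sketch; the storage account name, container, secret scope, and key names are placeholders rather than values from the question:

#option 3: access ADLS Gen2 directly with the storage account access key
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope = "<scope-name>", key = "<storage-account-key>"))

#read a file straight from the abfss path, no mount needed
df = spark.read.csv(
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/flightdata.csv",
    header = "true")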

Steps to mount the file system and access its files as if they were local:

To mount an Azure Data Lake Storage Gen2 container, or a folder inside it, use the following command:

Syntax:

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<appId>",
           "fs.azure.account.oauth2.client.secret": "<password>",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1",
  mount_point = "/mnt/flightdata",
  extra_configs = configs)
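
To verify the mount or remove it later, dbutils provides helpers; the mount point below matches the example above:

#list current mount points
display(dbutils.fs.mounts())

#unmount when it is no longer needed
dbutils.fs.unmount("/mnt/flightdata")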

After mounting ADLS, you can access the file system as if the files were local, for example:

df = spark.read.csv("/mnt/flightdata/flightdata.csv", header="true")
display(df)
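
Writing goes through the mount point the same way; as an illustration (the output folder name is made up), a DataFrame can be written back as CSV:

#write the DataFrame back through the mount; the output path is illustrative
df.write.mode("overwrite").option("header", "true").csv("/mnt/flightdata/output")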

Reference: Databricks - Azure Data Lake Storage Gen2

Hope this helps.