In an Azure Databricks notebook, I am trying to run a transformation over some CSVs in blob storage with the following code:
```python
import os
import glob
import pandas as pd

os.chdir(r'wasbs://dalefactorystorage.blob.core.windows.net/dale')
allFiles = glob.glob("*.csv")  # match your csvs
for file in allFiles:
    df = pd.read_csv(file)
    df = df.iloc[4:, ]  # read from row 4 onwards.
    df.to_csv(file)
    print(f"{file} has removed rows 0-3")
```
Unfortunately, I get the following error:

```
FileNotFoundError: [Errno 2] No such file or directory: 'wasbs://dalefactorystorage.blob.core.windows.net/dale'
```
Am I missing something? (I am completely new to this.)

Cheers,
Dale
Answer 0 (score: 0)
If you want to use the pandas package in Azure Databricks to read a CSV file from an Azure blob, process it, and write it back to the blob, I suggest you mount the Azure blob storage container as a Databricks file system and then work through the mount. For more details, please refer to here.
For example:
```python
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point = "/mnt/<mount-name>",
    extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<account access key>"})
```
```python
import os
import glob
import pandas as pd

os.chdir(r'/dbfs/mnt/<mount-name>/<>')
allFiles = glob.glob("*.csv")  # match your csvs
for file in allFiles:
    print(f"The old content of file {file}:")
    df = pd.read_csv(file, header=None)
    print(df)
    df = df.iloc[4:, ]  # drop the first four rows
    df.to_csv(file, index=False, header=False)
    print(f"The new content of file {file}:")
    df = pd.read_csv(file, header=None)
    print(df)
    break
```
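The trimming logic itself does not depend on Databricks, so it can be checked locally before touching the mount. The sketch below is my own illustration (it uses a temporary directory standing in for `/dbfs/mnt/<mount-name>`, and the file name `sample.csv` is made up): it writes a small sample CSV, drops the first four rows with `iloc[4:]`, and rewrites the file in place, matching the loop above.

```python
import glob
import os
import tempfile

import pandas as pd

# A temporary directory stands in for the DBFS mount point
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "sample.csv")

# Eight rows, one column; the first four should end up removed
pd.DataFrame({"value": range(8)}).to_csv(sample, index=False, header=False)

for file in glob.glob(os.path.join(workdir, "*.csv")):
    df = pd.read_csv(file, header=None)
    df = df.iloc[4:, ]  # keep row 4 onwards
    df.to_csv(file, index=False, header=False)

print(pd.read_csv(sample, header=None)[0].tolist())  # → [4, 5, 6, 7]
```

Once this behaves as expected locally, pointing `os.chdir` at the mounted path on the cluster is the only change needed.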