Copying data from Cloudera HDFS to Azure Blob Storage

Time: 2018-03-08 20:26:00

Tags: azure hadoop cloudera-cdh

On CDH 5.10.2, we need to copy data from HDFS to Azure, but we run into a problem when putting files.

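  • We configured the Azure account and tested access with Azure Storage Explorer.
  • We configured core-site.xml with the credentials (account + key) and restarted.
  • We tested the distcp command but got the following error: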

    hadoop distcp /user/myuser/file1.txt wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1/ -log /usr/myuser/

  

    18/03/08 20:20:59 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/myuser/file1.txt, wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1, -log], targetPath=/usr/myuser, targetPathExists=false, filtersFile='null'}
    18/03/08 20:20:59 INFO client.RMProxy: Connecting to ResourceManager at xxxx.xxxx.test/1.1.1.1:8032
    18/03/08 20:20:59 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
    18/03/08 20:20:59 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
    18/03/08 20:20:59 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
    18/03/08 20:21:03 ERROR tools.DistCp: Exception encountered
    org.apache.hadoop.fs.azure.AzureException: com.microsoft.windowsazure.storage.StorageException: The value for one of the HTTP headers is not in the correct format.
            at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1907)
            at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1587)
            at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
            at org.apache.hadoop.fs.Globber.doGlob(Globber.java:272)
            at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
            at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1703)
            at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
            at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
            at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:377)
            at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:90)
            at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
            at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
            at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
    Caused by: com.microsoft.windowsazure.storage.StorageException: The value for one of the HTTP headers is not in the correct format.
            at com.microsoft.windowsazure.storage.StorageException.translateFromHttpStatus(StorageException.java:212)
            at com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:173)
            at com.microsoft.windowsazure.storage.core.StorageRequest.materializeException(StorageRequest.java:306)
            at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:229)
            at com.microsoft.windowsazure.storage.blob.CloudBlobContainer.downloadAttributes(CloudBlobContainer.java:516)
            at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobContainerWrapperImpl.downloadAttributes(StorageInterfaceImpl.java:233)
            at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.checkContainer(AzureNativeFileSystemStore.java:1091)
            at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1823)
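
One detail in the DistCpOptions line of the log: logPath is null, the wasb:// URL and the literal -log both appear in sourcePaths, and targetPath is /usr/myuser. That suggests the trailing -log was read as a path rather than as an option, so the Azure URL was treated as a source. DistCp's usage puts options before the paths (hadoop distcp [OPTIONS] <source>... <target>). Below is a minimal sketch of the reordered invocation, assuming /usr/myuser/ was meant as the log directory. The helper name distcp_cmd is hypothetical and only prints the command instead of running it. Whether the reordering also resolves the StorageException about the HTTP header is not established here.

```shell
# Hypothetical helper (not part of Hadoop): prints a distcp command line
# with the -log option placed before the source and target paths.
# It only prints the command; it does not invoke hadoop.
distcp_cmd() {
  local log_dir="$1" src="$2" dst="$3"
  printf 'hadoop distcp -log %s %s %s\n' "$log_dir" "$src" "$dst"
}

# Paths taken from the question; the log directory is an assumption.
distcp_cmd /usr/myuser/ \
  /user/myuser/file1.txt \
  wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1/
```

Running the printed command on the cluster should at least make the Input Options line show a non-null logPath and a single HDFS source path, which makes any remaining Azure-side error easier to isolate.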

0 answers:

No answers yet.