How to … in Hadoop HDFS

Time: 2016-06-03 12:57:20

Tags: hdfs hadoop2 hadoop-streaming hdinsight

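I use Azure Blob Storage as my Hadoop HDFS file directory. When I transfer an input file from my local machine to Azure Blob Storage from the command prompt, its content type is changed to "application/octet-stream", which makes the file non-splittable.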

How can I change the encoding of an HDFS file?

1 answer:

Answer 0: (score: 0)

You can use the Azure CLI to upload a blob to Azure Storage. If you have a .txt file, the upload script stores it with the text/plain content type.
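
As a rough illustration of how a content type can follow from the file extension, here is a minimal Python sketch. It is not the answer's original script. The helper names `guess_content_type` and `build_upload_command`, the container name, and the blob name are hypothetical. The `--content-type` flag belongs to the newer `az storage blob upload` command, which is assumed here and may not match the CLI version the answer was written for. The command is only built, never executed.

```python
import mimetypes


def guess_content_type(path: str) -> str:
    """Return the MIME type implied by the file extension.

    Falls back to the generic binary type when the extension is unknown,
    which mirrors the "application/octet-stream" result from the question.
    """
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or "application/octet-stream"


def build_upload_command(path: str, container: str, blob_name: str) -> list:
    """Build (but do not run) an argument list for `az storage blob upload`.

    Assumption: the newer `az` CLI is installed and already authenticated.
    """
    return [
        "az", "storage", "blob", "upload",
        "--container-name", container,
        "--name", blob_name,
        "--file", path,
        "--content-type", guess_content_type(path),
    ]


if __name__ == "__main__":
    print(guess_content_type("input.txt"))  # text/plain
    print(guess_content_type("input"))      # application/octet-stream
    # "mycontainer" and the blob name are placeholder values.
    print(" ".join(build_upload_command("input.txt", "mycontainer", "input.txt")))
```

A file without an extension falls back to the generic binary type, so giving the input file a `.txt` extension before uploading is one way to get `text/plain`.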

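The content type cannot be changed during the upload. If you are working with a different file type, you need to convert the file before uploading it, or convert the blob after the upload.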

Resources:

Azure Docs - CLI Install

Azure Docs - How to upload