I have a .DAT file with more than 140 columns. I want to create a table in Hive and then import the data from that .DAT file into the table. How do I read the schema of that .DAT file? The file sits in HDFS on my Cloudera VM.
Is it possible to import the data file into HDFS without providing a table schema?
Answer 0 (score: 0)
Does the .dat file have column headers? If not, you can run a simple script that counts the columns in the file (this assumes a comma-delimited file, but you can change the awk -F delimiter for other formats):
# count columns in the HDFS copy of the file (take the max field count across lines)
numcols=$( hadoop fs -cat my.DAT | awk -F"," '{ print NF }' | sort | uniq | sort -n -r | head -1 )
# same count against a local copy instead
#numcols=$( awk -F"," '{ print NF }' my.DAT | sort | uniq | sort -n -r | head -1 )

# generate the CREATE TABLE DDL, one STRING column per field
echo "create external table mydat(col1 STRING" > myddl.sql
for (( i = 2; i <= $numcols; i++ ))
do
  echo ",col${i} STRING" >> myddl.sql
done
echo ") ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'xyz';" >> myddl.sql