I have a raw table that is partitioned on YEAR, MONTH, and DATE. For example:
col_1  col_2  col_3  YEAR  MONTH  DATE
a      b      c      2017  03     25
I want to create a new table that is a subset of this table but still keeps the original table's partitioning.
Something as simple as:
CREATE TABLE new_table AS
SELECT *
FROM original_table
WHERE (conditions);
However, because the original table is so large, I have to run this query one partition at a time.
My current solution is a shell script that iterates over all the partitions and runs a separate query for each one.
But this seems very roundabout and inefficient. Is there a way to do this directly in Hive?
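For reference, a per-partition loop of the kind the question describes might look roughly like the sketch below. It is not the asker's script. The partition column names (year, month, day), the column list, the filter in CONDITION, and the partition list are all hypothetical placeholders, and the sketch assumes new_table already exists with matching partition columns. By default it only prints the statements; setting DRY_RUN=0 would pass each one to the Hive CLI with hive -e.

```shell
#!/usr/bin/env bash
# Sketch only: names, partitions, and the filter below are hypothetical.
set -euo pipefail

SOURCE_TABLE="original_table"
TARGET_TABLE="new_table"          # assumed to exist already, partitioned by year/month/day
CONDITION="col_1 = 'a'"           # hypothetical filter standing in for (conditions)

# Print the statement that copies one partition into the target table.
build_query() {
  local year="$1" month="$2" day="$3"
  printf "INSERT OVERWRITE TABLE %s PARTITION (year='%s', month='%s', day='%s') " \
    "$TARGET_TABLE" "$year" "$month" "$day"
  printf "SELECT col_1, col_2, col_3 FROM %s " "$SOURCE_TABLE"
  printf "WHERE year='%s' AND month='%s' AND day='%s' AND (%s);\n" \
    "$year" "$month" "$day" "$CONDITION"
}

# Partitions to process, one "year month day" triple per line (hypothetical).
partitions="2017 03 25
2017 03 26"

while read -r year month day; do
  query="$(build_query "$year" "$month" "$day")"
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "$query"                 # dry run: show the statement only
  else
    hive -e "$query"              # one Hive invocation per partition
  fi
done <<< "$partitions"
```

Each iteration starts a separate Hive job, which is the overhead the question is trying to avoid.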
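Answer 0 (score: 3)
I had to do something like this recently for a few days, but it requires you to copy the original table's schema, or at least use CREATE TABLE LIKE.
Most importantly, though, your INSERT statement needs to specify the partitions: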
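-- Create the target table with the same partition columns as the source
-- (or use CREATE TABLE new_table LIKE original_table).
CREATE TABLE new_table (
  fields...
)
PARTITIONED BY (year STRING, month STRING, day STRING);
-- Depending on the Hive version and configuration, dynamic-partition inserts may need:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE new_table PARTITION (year, month, day)
SELECT fields..., year, month, day  -- partition columns must come last
FROM original_table
WHERE year BETWEEN '2016' AND '2017';  -- add more conditions as necessary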
You can also use CTAS, but doing that with partitioned tables is not straightforward.