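I'm trying to create a script that takes filelist1 (a list of tar files) and filelist2 (a list of directories). I need to loop through/read these lists and mv the first file in filelist1 into the first dir in filelist2, and so on down the lists. Once a file is there, I will extract it and perform other activities on the files in that folder. I'm trying to automate this because I will have 130+ tar files every day, each containing 75 to 200 files that must be processed. Here is the script I'm working on (WIP):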
#!/bin/bash
#############################################################################
#############################################################################
#
# Incremental load script v1
# Created 02/09/2015 NHR
#
#############################################################################
#############################################################################
#
# Clean up before running
#
# "/u02/hdfs_staging/ios/incremental/TOPACTR_DeltaFiles"
#
if [ -f filelist1 ] ; then
rm filelist1
fi
if [ -f filelist2 ] ; then
rm filelist2
fi
#
# Create filelist1 containing the tar file names parsed into dir names (files loaded from kdwxxxx)
#
for i in *.tar
do
echo "${i%.tar}" >> filelist1   # strip only the trailing .tar to get the dir name
done
#
# Create a work dir for each date to extract the tar files into
#
while IFS= read -r file
do
[ ! -d "$file" ] && mkdir "$file"
done < "/u02/hdfs_staging/ios/incremental/TOPACTR_DeltaFiles/filelist1"
#
# Create filelist2 containing the names of the tar files to copy
# into the dirs for extraction
#
shopt -s nullglob # Bash extension, so that empty glob matches will work
for file in ./*.tar ; do # Use this, NOT "for file in *"
echo "$file" >> filelist2
done
#
# Copy and decompress the tar files in these new dirs
# HERE IS WHERE I NEED TO LOOP THROUGH THE FILELIST1 AND FILELIST2
# AND PERFORM ADDITIONAL COMMANDS
#
#
# Execute hive load to external table script to load incremental files to ios_incremental.
# The ios_incremental database tables for these files are in place.
#
#hive -e CREATE EXTERNAL TABLE $filelist
#
# Run hive SQL script to add changed files to ios_staging tables.
# This will be called from a hql script file and will require variables
# for each table involved. This view combines record sets from both the
# Base (base_table) and Change (incremental_table) tables and is reduced
# only to the most recent records for each unique id. It is defined as
# follows:
#
#hive -e
# CREATE VIEW reconcile_view AS
# SELECT t1.* FROM
# (SELECT * FROM base_table
# UNION ALL
# SELECT * FROM incremental_table) t1
# JOIN
# (SELECT id, max(modified_date) max_modified FROM
# (SELECT * FROM base_table
# UNION ALL
# SELECT * FROM incremental_table) t2
# GROUP BY id) s
# ON t1.id = s.id AND t1.modified_date = s.max_modified;
#
#
# Copy updated ios_staging data to update ios_prod db
#
#
# Clean and Archive files to get ready for next incremental load
#
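As a minimal sketch (not part of the original question), the list-building steps in the script above could be done in a single pass over one glob, which guarantees that line N of filelist1 and line N of filelist2 refer to the same archive. It keeps the script's own mapping (filelist1 holds directory names, filelist2 holds `./name.tar` paths), folds the mkdir loop in, and assumes bash and that it is run from the directory holding the .tar files; `build_lists` is a hypothetical name.

```bash
# An empty glob expands to nothing instead of the literal string *.tar
shopt -s nullglob

# Hypothetical helper: write both lists from the same glob, in the same order
build_lists() {
    : > filelist1   # create or truncate both lists (replaces the rm checks)
    : > filelist2
    local tarfile
    for tarfile in *.tar; do
        printf '%s\n' "${tarfile%.tar}" >> filelist1   # work dir name
        printf '%s\n' "./$tarfile" >> filelist2        # tar file to move
        mkdir -p -- "${tarfile%.tar}"                  # create the work dir if missing
    done
}
```

Using `${tarfile%.tar}` rather than `rev | cut -d"." -f2 | rev` matters when a name contains more than one dot: for `data.2015.tar` the `rev`/`cut` pipeline yields `2015`, while the parameter expansion yields `data.2015`.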
Answer 0 (score: 0)
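I think what you're looking for is iterating over two lists at the same time.
Here is one way to do it, which assumes the filenames contain no newlines or colons (it's easy to change the colon to some other symbol):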
paste -d: filelist1 filelist2 | while IFS=: read -r file1 file2; do
some_command "$file1" "$file2"
# ...
done
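A self-contained run of the `paste` approach, with made-up list contents and `printf` standing in for `some_command` (both are illustrative assumptions):

```bash
# Stand-in for some_command: print the pair it would receive
pair_with_paste() {
    paste -d: "$1" "$2" | while IFS=: read -r file1 file2; do
        printf '%s -> %s\n' "$file1" "$file2"
    done
}

# Made-up sample lists in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' 20150209.tar 20150210.tar > "$tmp/filelist1"
printf '%s\n' dir_0209 dir_0210 > "$tmp/filelist2"
pair_with_paste "$tmp/filelist1" "$tmp/filelist2"
# prints:
# 20150209.tar -> dir_0209
# 20150210.tar -> dir_0210
```

Because the `while` loop sits on the right-hand side of a pipe, bash runs it in a subshell by default, so variables set inside it are gone after `done`. If one list is shorter, `paste` emits an empty field for the missing line, so checking `[ -n "$file2" ]` inside the loop is a cheap guard.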
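A more defensive solution is to put the lists into arrays instead of files, and then iterate with a for loop. (I've left out creating the arrays; there are plenty of examples on SO):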
for ((i=0;i<${#filearray1[@]};++i)); do
file1="${filearray1[i]}"
file2="${filearray2[i]}"
some_command "$file1" "$file2"
# ...
done
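The array creation the answer leaves out could look like this, assuming bash 4 or later for `mapfile` (also spelled `readarray`); `load_and_pair` is a hypothetical helper name, and `printf` again stands in for `some_command`. Loading both lists up front also makes it easy to refuse to run when their lengths differ:

```bash
# Hypothetical helper: $1 = tar list, $2 = directory list, one entry per line
load_and_pair() {
    local -a filearray1 filearray2
    mapfile -t filearray1 < "$1"   # one element per line, trailing newline stripped
    mapfile -t filearray2 < "$2"
    # Bail out before touching anything if the lists don't line up
    if (( ${#filearray1[@]} != ${#filearray2[@]} )); then
        echo "list lengths differ: ${#filearray1[@]} vs ${#filearray2[@]}" >&2
        return 1
    fi
    local i
    for ((i=0; i<${#filearray1[@]}; ++i)); do
        printf '%s -> %s\n' "${filearray1[i]}" "${filearray2[i]}"
    done
}
```

Usage would be something like `load_and_pair filelist1 filelist2 || exit 1`. Unlike the `paste` version, colons in names are harmless here; names still cannot contain newlines, since the lists are line-based.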
Answer 1 (score: 0)
Maybe something like this (glaringly lacking in error checking):
exec 3< filelist1 4< filelist2   # fd 3: tar files, fd 4: destination dirs
while IFS= read -r -u3 tarfile
do
IFS= read -r -u4 destination
mv "${tarfile}" "${destination}"/.
( cd "${destination}" || exit
# ... other stuff
) # subshell is to avoid having to cd back where you came from
done
exec 3<&- 4<&-
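A sketch of the same two-descriptor loop with some of the missing error checks added: it stops when filelist2 runs out before filelist1, and skips a pair whose `mv` fails. `move_pairs` is a hypothetical name, and `ls` is only a placeholder for the "other stuff" done inside each directory. As a design choice, it attaches fds 3 and 4 to the loop itself (`done 3< ... 4< ...`) instead of using `exec`, so they are closed automatically when the loop ends.

```bash
# Hypothetical helper: $1 = list of tar files, $2 = list of destination dirs
move_pairs() {
    local tarfile destination
    while IFS= read -r -u3 tarfile; do
        # Read the matching destination; stop if the dir list is shorter
        if ! IFS= read -r -u4 destination; then
            echo "no destination for $tarfile" >&2
            break
        fi
        mkdir -p -- "$destination"
        # Skip this pair if the move fails
        mv -- "$tarfile" "$destination"/ || continue
        # Subshell, so the cd does not leak into the loop
        ( cd -- "$destination" || exit 1
          ls   # placeholder for extraction and further processing
        )
    done 3< "$1" 4< "$2"
}
```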