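Iterating over two external file lists and executing commands in a bash script

Time: 2015-02-10 17:24:06

Tags: linux bash

I am working on a script that takes filelist1 (a list of tar files) and filelist2 (a list of directories). I need to read through both lists and mv the first file in filelist1 into the first dir in filelist2, and likewise for each following pair. There I will extract the archive and perform other work on the files in that folder. I am trying to automate this because I will have 130+ tar files per day, each containing 75 to 200 files that must be processed. Below is the script I am working on (WIP):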

 #############################################################################
 #############################################################################
 #
 #  Incremental load script v1
 #  Created 02/09/2015 NHR
 #
 #############################################################################
 #############################################################################

 #
 # Clean up before running
 #
 # "/u02/hdfs_staging/ios/incremental/TOPACTR_DeltaFiles"
 #


 if [ -f filelist1 ]  ; then
    rm filelist1
 fi

 if [ -f filelist2 ] ; then
    rm filelist2
 fi

 #
 # Create filelist containing name of files parsed for dir's loaded from kdwxxxx
 #
 for i in *tar
     do
         echo "$i" | rev | cut -d"." -f2 | rev >> filelist1
     done

 #
 # Create work dir's for extracting tar files into for each date
 #
 while IFS= read -r file
     do
         [ ! -d "$file"  ] && mkdir "$file"
     done < "/u02/hdfs_staging/ios/incremental/TOPACTR_DeltaFiles/filelist1"

 #
 # Create filelist2 containing name of files parsed to copy
 # tar files to dir's for extraction
 #
 shopt -s nullglob                 # Bash extension, so that empty glob matches will work
   for file in ./*.tar ; do        # Use this, NOT "for file in *"
      echo  "$file" >> filelist2
   done

 #
 # Copy and Decompress tar files in these new dir's
 # HERE IS WHERE I NEED TO LOOP THROUGH THE FILELIST1 AND FILELIST2
 # AND PERFORM ADDITIONAL COMMANDS
 #



 #
 # Execute hive load to external table script to load incremental files to ios_incremental.
 # The ios_incremental database tables for these files is in place.
 #


 #hive -e CREATE EXTERNAL TABLE $filelist


 #
 # Run hive SQL script to add changed files to ios_staging tables.
 # This will be called from a hql script file and will require variables
 # for each table involved. This view combines record sets from both the
 # Base (base_table) and Change (incremental_table) tables and is reduced
 # only to the most recent records for each unique .id.  It is defined as
 # follows:
 #

 #hive -e
 # CREATE VIEW reconcile_view AS
 #    SELECT t1.* FROM
 #    (SELECT * FROM base_table
 #          UNION ALL
 #          SELECT * FROM incremental_table) t1
 #    JOIN
 #       (SELECT id, max(modified_date) max_modified FROM
 #           (SELECT * FROM base_table
 #           UNION ALL
 #           SELECT * FROM incremental_table) t2
 #       GROUP BY id) s
 #    ON t1.id = s.id AND t1.modified_date = s.max_modified;
 #


 #
 # Copy updated ios_staging data to update ios_prod db
 #



 #
 # Clean and Archive files to get ready for next incremental load
 #

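2 Answers:

Answer 0 (score: 0)

I think what you are looking for is a way to iterate over two lists at the same time.

Here is one way to do it. It assumes the file names contain no newlines or colons (the colon is easy to change to some other character):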

paste -d: filelist1 filelist2 | while IFS=: read -r file1 file2; do
  some_command "$file1" "$file2"
  # ...
done
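
One caveat with the pipeline form: in bash, the while loop on the right-hand side of a pipe normally runs in a subshell, so variables assigned inside the loop body are gone once the loop ends. Below is a minimal sketch of the same pairing that feeds the loop through process substitution instead. The scratch directory, the sample names (a.tar, dir_a, and so on), and the pairs array are made up for illustration and are not part of the question.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the demo touches nothing real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample lists standing in for filelist1 and filelist2.
printf '%s\n' a.tar b.tar > filelist1
printf '%s\n' dir_a dir_b > filelist2

pairs=()
# Process substitution keeps the loop in the current shell,
# so the pairs array is still populated after "done".
while IFS=: read -r file1 file2; do
  pairs+=("$file1 -> $file2")
done < <(paste -d: filelist1 filelist2)

printf '%s\n' "${pairs[@]}"
```

If nothing set inside the loop is needed afterwards, the plain pipeline works just as well.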

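A more defensive solution is to put the lists into arrays instead of files and then iterate with a for loop. (I have left out the creation of the arrays; there are plenty of examples on SO):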

for ((i=0;i<${#filearray1[@]};++i)); do
  file1="${filearray1[i]}"
  file2="${filearray2[i]}"
  some_command "$file1" "$file2"
  # ...
done
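
The array creation that was left out could look roughly like this, assuming bash 4 or later (mapfile is not available in older versions) and assuming the lists still start out as one-name-per-line files. The sample names and the length check are illustrative additions, not something the answer specifies.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory with hypothetical sample lists.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' one.tar two.tar > filelist1
printf '%s\n' one two > filelist2

# mapfile (also spelled readarray) reads one line per array element;
# -t strips the trailing newline from each element.
mapfile -t filearray1 < filelist1
mapfile -t filearray2 < filelist2

# Refuse to pair up lists of different lengths.
if (( ${#filearray1[@]} != ${#filearray2[@]} )); then
  echo "filelist1 and filelist2 have different lengths" >&2
  exit 1
fi

for ((i = 0; i < ${#filearray1[@]}; ++i)); do
  printf '%s -> %s\n' "${filearray1[i]}" "${filearray2[i]}"
done
```

When the names come from a glob in the first place, assigning the glob straight to an array, for example tarfiles=(./*.tar) with nullglob set, skips the intermediate list file entirely and copes with any character in a file name.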

Answer 1 (score: 0)

Maybe something like this (with an obvious lack of error checking):

exec 3< filelist1 4< filelist2

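while IFS= read -r -u3 tarfile
do
  # IFS= and -r keep leading whitespace and backslashes in the names intact
  IFS= read -r -u4 destination
  mv "${tarfile}" "${destination}"/.
  ( cd "${destination}" || exit 1   # do not run the rest in the wrong directory
    # ... other stuff
  ) # subshell is to avoid having to cd back where you came from
done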

exec 3<&- 4<&-
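
For illustration, here is a hedged sketch of the same two-descriptor loop with some basic checks added and tar extraction filled in as the "other stuff". The sample archive, the directory name, and the choice to skip bad pairs rather than abort are all assumptions made for the demo; the real processing steps are not described in the question.

```shell
#!/usr/bin/env bash
# Build a tiny sample in a throwaway directory (all names are made up).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "payload" > data.txt
tar -cf sample.tar data.txt
rm data.txt
mkdir sample
printf '%s\n' sample.tar > filelist1
printf '%s\n' sample > filelist2

exec 3< filelist1 4< filelist2

# The loop ends as soon as either list runs out of lines.
while IFS= read -r -u3 tarfile && IFS= read -r -u4 destination; do
  if [ ! -f "$tarfile" ] || [ ! -d "$destination" ]; then
    echo "skipping $tarfile -> $destination" >&2
    continue
  fi
  mv -- "$tarfile" "$destination"/ || continue
  (
    # Subshell: the cd does not leak into the rest of the script.
    cd "$destination" || exit 1
    tar -xf "${tarfile##*/}"   # strip any leading path from the name
  )
done

exec 3<&- 4<&-
```

After the run, sample/ holds both the moved archive and the extracted data.txt.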