Question

I have a file on Linux.

The file contains number ranges and looks like this:

100,500
501,1000
1001,2000

I also have another file with words and numbers:

a,105
b,110
c,550
d,670
e,900
f,80
h,1500

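I need to filter the second file by the ranges in the first file and generate one output file per range. For this example I need 3 files: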

<<100,500>>
a,105
b,110

<<501,1000>>
c,550
d,670
e,900

<<1001,2000>>
h,1500

I want to do this with a bash script.

I can read the first file like this:

while read line
do
   init=`echo $line | awk 'BEGIN {FS=","}{print $1}'`
   end=`echo $line | awk 'BEGIN {FS=","}{print $2}'`    
done <rangos.txt

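That gives me the ranges, but I don't know how to split the second file according to the ranges from the first file.

Can anyone help me?

Thanks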

Answer 1

Here is an example parser in bash:

#!/bin/bash

declare file1=file1
declare file2=file2
while read line; do 
  if [ -z "${line}" ]; then continue; fi # empty lines 
  declare -i left=${line%%,*}
  declare -i right=${line##*,}

  echo "<<$left,$right>>"

  OIFS=$IFS
  IFS=$' \t\n'   # split on spaces, tabs and newlines so one-record-per-line input works
  for word in $(<"$file2"); do
    declare letter=${word%%,*}
    declare -i value=${word##*,}

    if [[ $left -le $value && $value -le $right ]]; then
      echo "$letter,$value"
    fi
  done
  IFS=$OIFS
done < "${file1}"

Tested with bash 4 on Debian Wheezy, it prints:

$ ./parser.sh 
<<100,500>>
a,105
b,110
<<501,1000>>
c,550
d,670
e,900
<<1001,2000>>
h,1500

However, given your comment about perl or other languages, you should do this in whichever language you or your team is most familiar with.
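The script above prints everything to standard output, while the question asks for one file per range. Below is a minimal sketch of that variant, not a tested drop-in replacement. The function name `split_by_range` and the `range_LOW_HIGH.txt` naming scheme are assumptions made for illustration, and it assumes one record per line in both files.

```shell
#!/bin/bash
# Sketch: write each range's matches to its own file instead of stdout.
# split_by_range and the range_LOW_HIGH.txt names are illustrative assumptions.

split_by_range() {
  local ranges=$1 data=$2 outdir=$3
  local left right letter value out

  # IFS=, makes read split each "low,high" line into two variables
  while IFS=, read -r left right; do
    [[ -z $left || -z $right ]] && continue     # skip empty lines
    out="$outdir/range_${left}_${right}.txt"
    echo "<<$left,$right>>" > "$out"             # header; truncates any old file

    while IFS=, read -r letter value; do
      [[ -z $value ]] && continue                # skip empty lines
      if (( left <= value && value <= right )); then
        echo "$letter,$value" >> "$out"
      fi
    done < "$data"
  done < "$ranges"
}

# Demo with the sample data from the question
tmp=$(mktemp -d)
printf '%s\n' 100,500 501,1000 1001,2000 > "$tmp/file1"
printf '%s\n' a,105 b,110 c,550 d,670 e,900 f,80 h,1500 > "$tmp/file2"
split_by_range "$tmp/file1" "$tmp/file2" "$tmp"
cat "$tmp/range_501_1000.txt"
```

Letting `read` do the comma splitting avoids changing `IFS` globally, so there is nothing to restore afterwards.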

Answer 2

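I'm assuming that neither file is sorted and that the second file has one word and one number per line.

In that case, you can do something like this: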

> out_file.txt
while read line; do
   init=${line%,*}   # text before the comma: lower bound
   end=${line#*,}    # text after the comma: upper bound
   echo "<<$init,$end>>" >> out_file.txt
   while read wnum; do
       theNum=${wnum#*,}
       if [ $theNum -le $end ] && [ $theNum -ge $init ]; then
         echo "$wnum" >> out_file.txt
       fi
   done < word_and_num.txt
done <rangos.txt
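The nested loop above rereads word_and_num.txt once for every range. If that becomes slow, one option is to load the ranges into arrays first and pass over the data file a single time. This is only a sketch under the same assumptions as the answer (comma-separated, one record per line). The names `group_by_range` and `bucket` are made up for illustration.

```shell
#!/bin/bash
# Sketch: read the data file only once by loading the ranges into arrays first.
# group_by_range and bucket are illustrative names, not part of the answer.

group_by_range() {
  local ranges=$1 data=$2
  local -a lows=() highs=() bucket=()
  local low high word num i

  # Load every "low,high" pair into two parallel arrays
  while IFS=, read -r low high; do
    [[ -z $low || -z $high ]] && continue
    lows+=("$low"); highs+=("$high"); bucket+=("")
  done < "$ranges"

  # Single pass over the data: append each record to every range it falls in
  while IFS=, read -r word num; do
    [[ -z $num ]] && continue
    for i in "${!lows[@]}"; do
      if (( lows[i] <= num && num <= highs[i] )); then
        bucket[i]+="$word,$num"$'\n'
      fi
    done
  done < "$data"

  # Print each header followed by the records collected for it
  for i in "${!lows[@]}"; do
    printf '<<%s,%s>>\n%s' "${lows[i]}" "${highs[i]}" "${bucket[i]}"
  done
}

# Demo with the sample data from the question
tmp=$(mktemp -d)
printf '%s\n' 100,500 501,1000 1001,2000 > "$tmp/rangos.txt"
printf '%s\n' a,105 b,110 c,550 d,670 e,900 f,80 h,1500 > "$tmp/word_and_num.txt"
group_by_range "$tmp/rangos.txt" "$tmp/word_and_num.txt" > "$tmp/out_file.txt"
cat "$tmp/out_file.txt"
```

The trade-off is that all matching records are held in memory until the end, which is fine for small inputs but may not suit very large data files.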

Answer 3

This is easier with awk:

BEGIN { FS = "," }
NR==FNR {
    map[$0];     # load data in hash
    next
}
{
    ++count;
    file = "file" count ".txt";   # create your filename
    print "<<" $0 ">>" > file;    # add the header to filename
    for (data in map) {
        split (data, fld, /,/);
        if ( $1 <= fld[2] && fld[2] <= $2 ) {  # add entries if in range
            print (data) > file
        }
    }
    close(file)     # close your file
}

Save the script above in script.awk and run it like this:

awk -f script.awk datafile rangefile

This creates three files:

$ head file*

==> file1.txt <==
<<100,500>>
a,105
b,110

==> file2.txt <==
<<501,1000>>
c,550
d,670
e,900

==> file3.txt <==
<<1001,2000>>
h,1500
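One caveat: awk does not specify the order in which `for (data in map)` visits its keys, so the lines inside each output file are not guaranteed to appear in input order. Here is a hedged sketch of an order-preserving variant, wrapped in a shell function so it can be called from a bash script. `split_ordered` and its `outdir` argument are assumptions for illustration. The file1.txt, file2.txt naming follows the answer.

```shell
#!/bin/bash
# Sketch: same idea as the awk answer, but records are kept in a numerically
# indexed array so they are written in input order.
# split_ordered and the outdir argument are illustrative assumptions.

split_ordered() {
  local datafile=$1 rangefile=$2 outdir=$3
  awk -F, -v dir="$outdir" '
    NR == FNR { rec[++n] = $0; val[n] = $2; next }   # data file: keep line and number
    NF < 2    { next }                               # skip blank range lines
    {
      file = dir "/file" (++count) ".txt"            # one output file per range
      print "<<" $0 ">>" > file                      # header line
      for (i = 1; i <= n; i++)                       # walk records in input order
        if ($1 + 0 <= val[i] + 0 && val[i] + 0 <= $2 + 0)
          print rec[i] > file
      close(file)
    }
  ' "$datafile" "$rangefile"
}

# Demo with the sample data from the question
tmp=$(mktemp -d)
printf '%s\n' a,105 b,110 c,550 d,670 e,900 f,80 h,1500 > "$tmp/datafile"
printf '%s\n' 100,500 501,1000 1001,2000 > "$tmp/rangefile"
split_ordered "$tmp/datafile" "$tmp/rangefile" "$tmp"
cat "$tmp/file2.txt"
```

Adding `+ 0` forces numeric comparison even if a field would otherwise be treated as a string.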

Split a file by the ranges in another file using bash
