Question

I have a list named file.out that contains entries like the following:

file                                                            a    b    c    d   e
DS_swe/msg.rti-20160510_5_1.0_rnt.txt-20190415_8_2.0_rnt.txt  0.5  1.0  1.5  1.3 2.0
DS_swe/msg.rti-20105510_5_1.0_rnt.txt-20200415_8_2.0_rnt.txt  0.6  2.0  2.5  1.2 4.0
DS_swe/msg.rti-20190510_5_1.0_rnt.txt-20250415_8_2.0_rnt.txt  0.2  8.0  3.5  1.1 6.0
DS_swe/msg.rti-20102510_5_1.0_rnt.txt-20240415_8_2.0_rnt.txt  0.1  2.5  1.2  1.0 8.0
DS_swe/msg.rti-20145510_5_1.0_rnt.txt-20140415_8_2.0_rnt.txt  0.8  2.2  1.4  1.9 5.0

I also have a directory named data that contains files like these:

data/
├── 20160510_5_1.0_rnt.txt
├── 20105510_5_1.0_rnt.txt
├── 20190510_5_1.0_rnt.txt
├── 20102510_5_1.0_rnt.txt
└── 20145510_5_1.0_rnt.txt

These file names match part of the first column of the list above, namely the part marked with ? here:

DS_swe/msg.rti-????????_?_?_???.???-20190415_8_2.0_rnt.txt 0.5  1.0  1.5  1.3 2.0.

In addition, every .txt file in the directory contains 4 lines like the following. For example, 20160510_5_1.0_rnt.txt contains:

20.0  23.0  25.0  45.0  78.0  sy
14.0  12.0  24.0  45.0  78.0  tx
14.0  25.0  25.0  47.0  78.0  mx
12.0  25.0  32.0  47.0  56.0  cx

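What I want to do is this: if a .txt file in the directory matches the string marked with ? in the list above, I want to extract the 3rd and 4th columns from that .txt file, and also the 5th and 6th column values of the corresponding entry in the list (file.out). I then want to append those 5th and 6th column values, repeated on every line, to the columns taken from the .txt file, and finally save the result under the same .txt file name in a different directory named results.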

For example, the expected output for the file 20160510_5_1.0_rnt.txt is:

25.0  45.0  1.3  2.0
24.0  45.0  1.3  2.0
25.0  47.0  1.3  2.0
32.0  47.0  1.3  2.0

To solve this I tried the following code, but I am stuck at the main part and need expert help.

#!/bin/sh
for file in /home/lijun/data/*.txt
    grep "*.txt" file.out > file
    cat file | if

Answer 1

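Updated to include input/output file directories, based on the for loop in the OP's example.

A (somewhat) verbose solution...

Use awk to extract the file name and fields 5 and 6 from file.out: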

$ awk '{ split($1,fn,"-"); print fn[2],$5,$6 }' file.out
20160510_5_1.0_rnt.txt 1.3 2.0
20105510_5_1.0_rnt.txt 1.2 4.0
20190510_5_1.0_rnt.txt 1.1 6.0
20102510_5_1.0_rnt.txt 1.0 8.0
20145510_5_1.0_rnt.txt 1.9 5.0

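Where:

the default whitespace input field separator is used
split($1,fn,"-") - splits the first field into the array fn, using "-" as the delimiter
print fn[2],$5,$6 - prints the file name and fields 5 and 6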
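
The listing in the question starts with a header row (file a b c d e), which the command above would pass through as a line with an empty file name. A minimal sketch that skips it, assuming file.out really does begin with that header; extract_pairs is a hypothetical helper name, not part of the original answer:

```shell
# Hypothetical helper: print "filename field5 field6" for each data row
# of the list file given as $1.
extract_pairs() {
    # NR > 1 skips the first line (assumed to be the header row).
    # split() breaks $1 on "-" so that fn[2] holds the data file name.
    awk 'NR > 1 { split($1, fn, "-"); print fn[2], $5, $6 }' "$1"
}
```

Calling extract_pairs file.out should then list only the data rows. If the real file.out has no header, drop the NR > 1 condition, otherwise the first entry is lost.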

We will now loop over this list with a second awk solution, extracting fields 3 and 4 from each file and appending fields 5 and 6 (from file.out):

# OP will need to update the following variables to ensure they reference the correct directory where the input/output files are located:

$ in_dir="/home/lijun/data"
$ out_dir="/home/lijun/results"

$ while read -r fname field5 field6
do
    # I only have one file in my system so I'll print a warning about files I can't find
  
    [ ! -f "${in_dir}/${fname}" ]                                         && \
    echo "WARNING: Unable to locate file '${in_dir}/${fname}'. Skipping." && \
    continue

    echo "Processing file '${in_dir}/${fname}' ..."

    # pass fields 5 & 6 into `awk` using `-v`; print out desired fields

    awk -v f5="${field5}" -v f6="${field6}" '{ print $3,$4,f5,f6 }' "${in_dir}/${fname}" > "${out_dir}/${fname}"

done < <(awk '{ split($1,fn,"-"); print fn[2],$5,$6 }' file.out)

Running the above on my system generates:

Processing file '/home/lijun/data/20160510_5_1.0_rnt.txt' ...
WARNING: Unable to locate file '/home/lijun/data/20105510_5_1.0_rnt.txt'. Skipping.
WARNING: Unable to locate file '/home/lijun/data/20190510_5_1.0_rnt.txt'. Skipping.
WARNING: Unable to locate file '/home/lijun/data/20102510_5_1.0_rnt.txt'. Skipping.
WARNING: Unable to locate file '/home/lijun/data/20145510_5_1.0_rnt.txt'. Skipping.

$ cat "${out_dir}/20160510_5_1.0_rnt.txt"
25.0 45.0 1.3 2.0
24.0 45.0 1.3 2.0
25.0 47.0 1.3 2.0
32.0 47.0 1.3 2.0
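
The loop above depends on bash process substitution (< <(...)) and on the results directory already existing, while the question's script starts with #!/bin/sh. A minimal sketch of the same idea for plain sh, under these assumptions: append_fields and its three arguments are hypothetical names, and the list file begins with a header row that should be skipped.

```shell
# Hypothetical wrapper: append_fields LIST IN_DIR OUT_DIR
append_fields() {
    list=$1 in_dir=$2 out_dir=$3
    mkdir -p "$out_dir" || return 1          # create the output directory if missing

    # A pipeline replaces < <(...) so this also runs under plain sh.
    # NR > 1 assumes the first line of the list is a header row.
    awk 'NR > 1 { split($1, fn, "-"); print fn[2], $5, $6 }' "$list" |
    while read -r fname f5 f6; do
        [ -f "$in_dir/$fname" ] || continue  # skip list entries with no data file
        # Print columns 3 and 4 of the data file, then the two list values.
        awk -v f5="$f5" -v f6="$f6" '{ print $3, $4, f5, f6 }' \
            "$in_dir/$fname" > "$out_dir/$fname"
    done
}
```

With the paths from the answer, this would be called as append_fields file.out /home/lijun/data /home/lijun/results. Because the loop runs in a pipeline, variables set inside it are not visible afterwards in most shells, which does not matter here since the results are written to files.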

Answer 2

You can use a first awk to parse file.out and pick out every line whose first column matches the following regex pattern:

/DS_swe\/msg.rti-(.+)-[0-9]{8}_[0-9]_[0-9].[0-9]_rnt.txt/

Here, (.+) captures the file name and stores it in \1.

So the awk line to run will be:

awk '{
       # Replace the first column with only the related file name in data
       # and store it in f. Note: gensub() is a GNU awk (gawk) extension.
       f=gensub(/DS_swe\/msg.rti-(.+)-[0-9]{8}_[0-9]_[0-9].[0-9]_rnt.txt/,
                "\\1", "1", $1)
       # If the value doesn't match the pattern, f will contain the column value
       # So don't print anything.
       if  (f != $1) print f" "$5" "$6
     }' < file.out

You will get lines like the following:

20160510_5_1.0_rnt.txt 1.3 2.0

Then use read to get the value of each column:

read f c5 c6 # stores the filename in $f, the 5th column in $c5, the 6th in $c6

Finally, run another awk using these values:

# Parse data/"$f" file and for each line
# print the 3rd and 4th columns with "$c5 $c6" text
awk '{ print $3" "$4" '"$c5 $c6"'" }' <data/$f

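You can then process the output of the first awk with the second awk call.

Final working example (&& is a logical AND; the command after it runs only if read did not hit end of file):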

awk '{
       f=gensub(/DS_swe\/msg.rti-(.+)-[0-9]{8}_[0-9]_[0-9].[0-9]_rnt.txt/, "\\1",
                "1", $1)
       if  (f != $1) print f" "$5" "$6
      }' < file.out | {
                        read f c5 c6 &&
                        awk '{ print $3" "$4" '"$c5 $c6"'" }' <data/$f ;
                      }

Result:

25.0 45.0 1.3 2.0
24.0 45.0 1.3 2.0
25.0 47.0 1.3 2.0
32.0 47.0 1.3 2.0
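
The pipeline above calls read once, so it handles only the first matching line of file.out. A minimal sketch that loops over every match, with several assumptions of my own: process_all is a hypothetical name, the gawk-only gensub() is replaced by two POSIX sub() calls, the dots in the pattern are escaped, and the shell values are passed with -v instead of being spliced into the awk program.

```shell
# Hypothetical helper: process_all LIST DATA_DIR
# Prints columns 3 and 4 of each matching data file, followed by
# columns 5 and 6 of the corresponding list entry.
process_all() {
    list=$1 data_dir=$2
    awk '{
        f = $1
        # Strip the fixed prefix, then the trailing "-<second file name>".
        # Lines where either substitution fails (e.g. a header) are skipped.
        if (sub(/^DS_swe\/msg\.rti-/, "", f) &&
            sub(/-[0-9]+_[0-9]_[0-9]\.[0-9]_rnt\.txt$/, "", f))
            print f, $5, $6
    }' "$list" |
    while read -r f c5 c6; do
        [ -f "$data_dir/$f" ] || continue    # no such data file: skip
        awk -v c5="$c5" -v c6="$c6" '{ print $3, $4, c5, c6 }' "$data_dir/$f"
    done
}
```

Run as process_all file.out data, this writes everything to standard output; redirecting the inner awk to a per-file path would give one result file per input, as the question asks.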

Extracting string-matched files from a list