Question

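I would like to concatenate multiple files in the order of the number in their names, dropping the first line of each.

e.g. chr1_smallfiles, then chr2_smallfiles, then chr3_smallfiles, and so on (each without its header).

Note that chr10_smallfiles needs to come after chr9_smallfiles; that is, this needs to be numeric sort order.

When run separately, the two commands awk and ls -v1 each work fine, but when I put them together it doesn't work. Please help, thanks!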

awk 'FNR>1' | ls -v1 chr*_smallfiles > bigfile

Answer 1

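The problem is with the way you're trying to pass the list of files to awk. At the moment, you're piping the output of awk into ls, which makes no sense: awk receives no file arguments, and ls does not read standard input.

Bear in mind that, as mentioned in the comments, ls is a tool for interactive use, and in general its output shouldn't be parsed.

If sorting weren't an issue, you could just use: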

awk 'FNR > 1' chr*_smallfiles > bigfile

The shell expands the glob chr*_smallfiles into a list of files, which are passed to awk as arguments. For each filename argument, every line except the first is printed.
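To see why ordering becomes an issue, here is a small sketch using made-up sample files (the scratch directory and the file contents are assumptions for illustration only). The glob expands in lexical order, so chr10 lands before chr2:

```shell
# Scratch directory with hypothetical sample files (one header + one data row each)
dir=$(mktemp -d)
cd "$dir" || exit 1
for n in 2 9 10; do
    printf 'header\nrow from chr%s\n' "$n" > "chr${n}_smallfiles"
done

# The shell expands the glob in lexical order: chr10, chr2, chr9
awk 'FNR > 1' chr*_smallfiles > bigfile
cat bigfile
```

The headers are gone, but the chr10 row is printed first, which is not the order asked for in the question.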

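Since you want the files in sorted order, things aren't quite so simple. If you're sure that all of the files exist, replace chr*_smallfiles in the command above with the Bash brace expansion chr{1..99}_smallfiles (where 99 is the highest file number).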
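As a quick illustration of why brace expansion only suits the case where every file exists, the sketch below (Bash only; the range 8..11 is an arbitrary choice for the demo) prints the generated names. They come out in numeric order, and they are generated whether or not matching files are present:

```shell
#!/usr/bin/env bash
# Brace expansion is a Bash feature; it produces names in numeric order
# without checking the filesystem, so no sample files are needed here.
names=$(printf '%s\n' chr{8..11}_smallfiles)
printf '%s\n' "$names"
```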

Using some Bash- and GNU-specific sort features, you can also achieve the sorting like this:

printf '%s\0' chr*_smallfiles | sort -z -n -k1.4 | xargs -0 awk 'FNR > 1' > bigfile

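printf '%s\0' prints each filename followed by a null byte
sort -z sorts records separated by null bytes
-n -k1.4 does a numeric sort, starting from the 4th character (the numeric part of the filename)
xargs -0 passes the sorted, null-separated output as arguments to awk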
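The pipeline described above can be tried end to end on a few throwaway files. This is a minimal sketch assuming GNU sort and xargs; the scratch directory and file contents are made up for the demo:

```shell
# Scratch directory with hypothetical sample files (one header + one data row each)
dir=$(mktemp -d)
cd "$dir" || exit 1
for n in 1 2 10; do
    printf 'header\nrow from chr%s\n' "$n" > "chr${n}_smallfiles"
done

# Null-delimited names, numeric sort on the text after "chr", then a single awk call
printf '%s\0' chr*_smallfiles | sort -z -n -k1.4 | xargs -0 awk 'FNR > 1' > bigfile
cat bigfile
```

The rows should appear in the order chr1, chr2, chr10, with every header line dropped.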

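Otherwise, if you want to go through the files in numerical order and you're not sure whether all of them exist, you can use a shell loop (although it will be noticeably slower than a single awk invocation):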

for file in chr{1..99}_smallfiles; do # 99 is the maximum file number
    [ -f "$file" ] || continue # skip missing files
    awk 'FNR > 1' "$file"
done > bigfile

Answer 2

You can also use tail to concatenate all of the files without their headers:

tail -q -n+2 chr*_smallfiles > bigfile

If you want to concatenate the files in natural sort order, as described in the question, you can pipe the output of ls -v1 to xargs:

ls -v1 chr*_smallfiles | xargs -d $'\n' tail -q -n+2 > bigfile

(Thanks to Charles Duffy) xargs -d $'\n' sets the delimiter to a newline \n, in case a filename contains spaces or quotes.
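Since parsing the output of ls is generally discouraged, a null-delimited variant of the same idea is sketched below. It swaps ls -v for GNU sort's -V (version sort) option, which is an assumption on top of this answer, and it uses made-up sample files:

```shell
# Scratch directory with hypothetical sample files (one header + one data row each)
dir=$(mktemp -d)
cd "$dir" || exit 1
for n in 1 2 10; do
    printf 'header\nrow from chr%s\n' "$n" > "chr${n}_smallfiles"
done

# sort -V orders embedded numbers naturally; -z keeps the names null-delimited
printf '%s\0' chr*_smallfiles | sort -z -V | xargs -0 tail -q -n +2 > bigfile
cat bigfile
```

Because the names never pass through a newline-delimited stream, this also copes with filenames that contain newlines.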

Answer 3

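Use a bash 4 associative array to extract only the numeric substring of each filename; sort those numbers on their own; then look up and concatenate the full names in the resulting order: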

#!/usr/bin/env bash

case $BASH_VERSION in ''|[123].*) echo "Requires bash 4.0 or newer" >&2; exit 1;; esac

# when this is done, you'll have something like:
#   files=( [1]=chr1_smallfiles.txt
#           [10]=chr10_smallfiles.txt
#           [9]=chr9_smallfiles.txt )
declare -A files=( )
for f in chr*_smallfiles.txt; do
  files[${f//[![:digit:]]/}]=$f
done

# now, emit those indexes (1, 10, 9) to "sort -n -z" to sort them as numbers
# then read those numbers, look up the filenames associated, and pass to awk.
while read -r -d '' key; do
  awk 'FNR > 1' <"${files[$key]}"
done < <(printf '%s\0' "${!files[@]}" | sort -n -z) >bigfile

Answer 4

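You can use the for loop below, which worked for me:

for file in chr*_smallfiles
do
    tail -n +2 "$file" >> bigfile
done

How does it work? The for loop uses the wildcard pattern chr*_smallfiles to match the files in the current directory and assigns each filename in turn to the variable file. tail -n +2 "$file" outputs every line of that file except the first, and the output is appended to the file bigfile. In the end, all of the files are merged (except for the first line of each) into the single file bigfile.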
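Two caveats with this loop are worth keeping in mind: the glob is expanded in lexical rather than numeric order, and >> appends, so running the loop a second time adds every row to an existing bigfile again. A minimal sketch of the second point, using made-up sample files, redirects the loop as a whole so that bigfile is truncated once per run:

```shell
# Scratch directory with hypothetical sample files (one header + one data row each)
dir=$(mktemp -d)
cd "$dir" || exit 1
for n in 1 2 3; do
    printf 'header\nrow from chr%s\n' "$n" > "chr${n}_smallfiles"
done

# Run the merge twice; redirecting the whole inner loop truncates bigfile
# each time, so the second pass does not duplicate rows.
for pass in first second; do
    for file in chr*_smallfiles; do
        tail -n +2 "$file"
    done > bigfile
done
cat bigfile
```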

Answer 5

For the sake of completeness, how about a sed solution?

for file in chr*_smallfiles
do
    sed -n '2,$p' "$file" >> bigfile
done

Hope it helps!

Concatenate files w/o header in awk, based on numeric sort of a name substring
