Merging two text files block by block

Time: 2016-04-18 11:15:56

Tags: bash shell awk sed

I have two text files, each containing blocks of text separated by blank lines. The blocks vary in size.

# ::id 10
# ::snt Yes !
 ...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Lion .
 ...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt Yes yes !
 ...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Tiger .
 ...multiple lines of unstructured data from file 1...

and another similar one:

# ::id 10
# ::snt No !
 ...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Monkey .
 ...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt No no !
 ...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Donkey .
 ...multiple lines of unstructured data from file 2...

I want to merge the blocks from both files, sorted by # ::id. In addition, for each id the block from file1 must come before the block from file2. So the final output should look like this:

# ::id 10
# ::snt Yes !
 ...multiple lines of unstructured data from file 1...

# ::id 10
# ::snt No !
 ...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Lion .
 ...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Monkey .
 ...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt Yes yes !
 ...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt No no !
 ...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Tiger .
 ...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Donkey .
 ...multiple lines of unstructured data from file 2...

How can I do this? Any tool will do: bash, sed.

4 answers:

Answer 0: (score: 1)

Run awk -f merge.awk file1 file2, where merge.awk contains:

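# asort() is a gawk extension, so this needs gawk.
BEGIN { RS = ""; ORS = "\n\n" }   # read blank-line-separated blocks, print a blank line after each
{ ARR[NR] = $0 }                  # collect every block from both files
END {
    # asort() compares whole blocks as strings, so blocks that share an id
    # are ordered by their text, not by the file they came from.
    n = asort(ARR)
    for (i = 1; i <= n; i++)
        print ARR[i]
}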

Answer 1: (score: 1)

$ awk -v RS= -v ORS='\n\n' 'NR==FNR{a[NR]=$0;next} {print a[FNR] ORS $0}' file1 file2
# ::id 10
# ::snt Yes !

# ::id 10
# ::snt No !

# ::id 11
# ::snt said Lion .

# ::id 11
# ::snt said Monkey .

# ::id 12
# ::snt Yes yes !

# ::id 12
# ::snt No no !

# ::id 13
# ::snt said Tiger .

# ::id 13
# ::snt said Donkey .

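The above reads the files one paragraph at a time, where a paragraph is a block of text delimited by runs of blank lines (this is what setting RS to the empty string does). While reading the first file, it just stores each paragraph in the array a[1..number of paragraphs]. Once file1 has been read into a[] in full, it reads the second file and, for each paragraph, prints the corresponding paragraph from file1 (a[paragraph number]) first, followed by the current paragraph from file2.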
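To see what paragraph mode does on its own, here is a minimal sketch that prints, for each block, its number within its file, its line count, and its first line. The function name list_blocks is made up for illustration and is not part of the answer.

```shell
# list_blocks is a hypothetical helper, for illustration only.
# RS= puts awk in paragraph mode: runs of blank lines separate records.
# -F'\n' makes every line of a block one field, so NF is the line count.
list_blocks() {
    awk -v RS= -F'\n' '{ printf "%d\t%d\t%s\n", FNR, NF, $1 }' "$@"
}
```

Calling list_blocks file1 file2 prints one line per block. FNR restarts at 1 for the second file, which is why a[FNR] in the command above pairs block n of file1 with block n of file2.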

Answer 2: (score: 0)

You can use sed and sort to achieve this:

 sed '/# ::id/N;s/\n/ /;/^$/d' file1 file2 | sort -s -n -k3,3 | sed 's/\(# ::snt.*\)/\n\1\n/'

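The first sed joins each line containing # ::id with the line that follows it, and deletes the blank lines.

The result is then sorted numerically by the id in # ::id xx (the third field).

Finally, the last sed splits each line in two again at # ::snt and restores the blank line after each block.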
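The pipeline above appears to rely on every block being exactly two lines, so the extra data lines in the question would get in the way. Here is a rough sketch of the same decorate, sort, undecorate idea for blocks of any length. It assumes that the id is the third whitespace-separated field of each block, that no block contains the control character \001, and that sort accepts -s (as the command above already does). The name merge_by_id is made up.

```shell
# merge_by_id is a hypothetical helper, for illustration only.
merge_by_id() {
    awk -v RS= '
        FNR == 1 { fileno++ }        # first block of a new input file
        {
            id = $3                  # "# ::id N": N is the third field
            gsub(/\n/, "\001")       # fold the block onto a single line
            # decorate: id, file number, folded block
            printf "%s\t%d\t%s\001\n", id, fileno, $0
        }
    ' "$@" |
    sort -s -t "$(printf '\t')" -k1,1n -k2,2n |   # by id, then by file
    cut -f3- |                                    # drop the two sort keys
    tr '\001' '\n'                                # unfold; the trailing \001 yields the blank line
}
```

Calling merge_by_id file1 file2 prints every block sorted by id, with the file1 block before the file2 block for each id, whether or not the two files list their ids in the same order.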

Answer 3: (score: 0)

This matches records by ID number, in case they are not aligned in the two files:

$ awk -F'\n' -v RS= 'NR==FNR{a[$1]=$0; next}
                            {printf "%s\n\n%s\n\n",a[$1],$0}' file1 file2

# ::id 10
# ::snt Yes !
 ...multiple lines of unstructured data from file 1...

# ::id 10
# ::snt No !
 ...multiple lines of unstructured data from file 2...

# ::id 11
# ::snt said Lion .
 ...multiple lines of unstructured data from file 1...

# ::id 11
# ::snt said Monkey .
 ...multiple lines of unstructured data from file 2...

# ::id 12
# ::snt Yes yes !
 ...multiple lines of unstructured data from file 1...

# ::id 12
# ::snt No no !
 ...multiple lines of unstructured data from file 2...

# ::id 13
# ::snt said Tiger .
 ...multiple lines of unstructured data from file 1...

# ::id 13
# ::snt said Donkey .
 ...multiple lines of unstructured data from file 2...

This can be enhanced to catch records that are missing from file2.
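One possible shape for that enhancement, as a sketch only. The policy here is an assumption: a file2 block with no partner is printed on its own, file1 blocks that were never matched are appended at the end (in unspecified order), and both cases are reported on stderr. The name merge_matched is made up.

```shell
# merge_matched is a hypothetical helper, for illustration only.
merge_matched() {
    awk -F'\n' -v RS= '
        NR == FNR { a[$1] = $0; next }   # file1: index blocks by their "# ::id N" line
        {
            if ($1 in a) {
                printf "%s\n\n", a[$1]   # matching file1 block goes first
                delete a[$1]             # mark it as used
            } else {
                print "no file1 block for: " $1 > "/dev/stderr"
            }
            printf "%s\n\n", $0          # current file2 block
        }
        END {
            # whatever is left in a[] had no partner in file2
            for (k in a) {
                print "no file2 block for: " k > "/dev/stderr"
                printf "%s\n\n", a[k]
            }
        }
    ' "$1" "$2"
}
```

Calling merge_matched file1 file2 2>missing.log keeps the merged output on stdout and collects the unmatched ids in missing.log.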