Question

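I'm not sure what the technical term for this kind of join between two tables is (I've spent ages on Google). In SQL you would write it as select * from table1, table2;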

I have table 1

Var1
1
2
3

and table 2

Var2
6
7
8

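I want to merge/join them so that I get every combination (i.e., each Var2 repeated for each Var1). Other than writing a loop, is there a simple way to do this, as there is in SQL?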

Answer 1

No, the best you can hope for is:

$ awk 'NR==FNR{a[NR]=$0;next} FNR==1{print $0, a[1]; next} {for (i=2;i in a;i++) print $0, a[i]}' file2 file1
Var1 Var2
1 6
1 7
1 8
2 6
2 7
2 8
3 6
3 7
3 8

The input files are:

$ cat file1
Var1
1
2
3
$ cat file2
Var2
6
7
8

Answer 2

A solution using awk would be something like

awk 'FNR==NR{line[$0]++; next} {for (i in line) print $1, i} ' file2 file1

Test

$ cat file1
1
2
3

$ cat file2
6
7
8
$ awk 'FNR==NR{line[$0]++; next} {for (i in line) print $1, i} ' file2 file1
1 6
1 7
1 8
2 6
2 7
2 8
3 6
3 7
3 8
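
One caveat with `for (i in line)`: awk does not define the order in which array indices are visited, and using the lines themselves as array keys collapses duplicate lines in file2. Below is a minimal sketch of an order-preserving variant, assuming headerless inputs as in the test above. The scratch directory `demo` and the variable `out` are illustrative names, not part of the original answer.

```shell
# Recreate the headerless test files in a scratch directory (hypothetical setup).
demo=$(mktemp -d)
printf '%s\n' 1 2 3 > "$demo/file1"
printf '%s\n' 6 7 8 > "$demo/file2"

# First pass (file2): store each line under an increasing numeric index.
# Second pass (file1): walk the indices in order, so the output order is deterministic
# and duplicate lines in file2 are kept.
out=$(awk 'NR==FNR { a[++n] = $0; next }
           { for (i = 1; i <= n; i++) print $0, a[i] }' "$demo/file2" "$demo/file1")
printf '%s\n' "$out"
```

The only change from the answer's command is the numeric index `a[++n]` and the counted loop; the two-file `NR==FNR` idiom stays the same.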

Answer 3

If the first file is small enough to fit in memory, you can do

awk -v OFS='\t' '# Pesky header line
    FNR==1 { if (NR==1) h=$0; else print h, $0; next }
    NR==FNR { a[++i] = $0; next }
    { for (j=1; j<=i; ++j) print a[j], $0 }' table1 table2

For very large (or, for that matter, very small) files, you could try

sed 1d table1 |
while read -r line; do
    sed "1d;s/^/$line\t/" table2
done >outputfile

(This simply discards the header line, because I'm lazy. Having no header line would simplify things.)
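
The `sed` loop pastes `$line` into the `sed` script, so a line containing `/`, `&` or a backslash would break or alter the substitution, and `\t` in the replacement is not understood by every `sed`. Here is a rough sketch of the same streaming idea using plain shell reads instead, assuming one header line per file as above. The scratch directory and the file contents are made up for the demonstration.

```shell
# Hypothetical test data: one header line followed by the values.
demo=$(mktemp -d)
printf '%s\n' Var1 1 2 3 > "$demo/table1"
printf '%s\n' Var2 6 7 8 > "$demo/table2"

# Outer loop: every data line of table1 (header dropped by tail).
# Inner loop: re-read table2 for each outer line, so neither file is held in memory.
tail -n +2 "$demo/table1" |
while IFS= read -r left; do
    tail -n +2 "$demo/table2" |
    while IFS= read -r right; do
        printf '%s\t%s\n' "$left" "$right"
    done
done > "$demo/outputfile"
cat "$demo/outputfile"
```

Like the `sed` version, this drops both header lines and re-reads table2 once per line of table1, so it trades speed for a small memory footprint.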

Answer 4

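Here is a fun way of doing it using GNU parallel and Bash. I assume there are no headers, since that simplifies the problem (see below for one way to handle headers).

Without headers

First we generate the test input: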

cat << End-of-table1 > file1
1
2
3
End-of-table1

cat << End-of-table2 > file2
6
7
8
End-of-table2

Now run parallel with the files as input arguments:

parallel echo {1} {2} :::: file1 :::: file2

This produces one output line for each combination of a line from file1 and a line from file2.

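With headers

To handle headers and tab-separated columns, the solution below produces the correct result for the given test data.

First generate the test data with headers: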

cat << End-of-table1 > file1
Var1
1
2
3
End-of-table1

cat << End-of-table2 > file2
Var2
6
7
8
End-of-table2

Use head and paste to extract the headers, then use parallel again to combine the columns without the headers:

paste <(head -n1 file1) <(head -n1 file2); parallel printf '"%s\t%s\n"' {1} {2} \
                                             :::: <(tail -n+2 file1)            \
                                             :::: <(tail -n+2 file2)

Output:

Var1    Var2
1   6
1   7
1   8
2   6
2   7
2   8
3   6
3   7
3   8

Relevant sections of the GNU parallel manual

parallel [options] [command [arguments]] ( ::: arguments | :::: argfile(s) ) ...

...

::: arguments
         Use arguments from the command line as input source instead of stdin
         (standard input). Unlike other options for GNU parallel ::: is
         placed after the command and before the arguments.

...

         If multiple ::: are given, each group will be treated as an input
         source, and all combinations of input sources will be generated.
         E.g. ::: 1 2 ::: a b c will result in the combinations (1,a) (1,b)
         (1,c) (2,a) (2,b) (2,c). This is useful for replacing nested for-
         loops.

...

:::: argfiles
         Another way to write -a argfile1 -a argfile2 ...

         ::: and :::: can be mixed.

             See -a, ::: and --xapply.
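
To make the "replacing nested for-loops" remark concrete, here is a small sketch of the plain Bash loop that yields the same combinations as the manual's `::: 1 2 ::: a b c` example. It does not call `parallel` at all, and `combos` is just an illustrative variable name.

```shell
# Nested loops over two argument lists; the inner list is repeated
# once for every item of the outer list.
combos=$(
    for x in 1 2; do
        for y in a b c; do
            echo "($x,$y)"
        done
    done
)
printf '%s\n' "$combos"
```

The loop runs the combinations one after another, whereas `parallel` may run them concurrently.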

Merge files so that the lines of table 1 are repeated for each line of table 2
