Question

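How can I join the lines of two files as a Cartesian product, using a shell one-liner and common GNU tools? What is the most concise, elegant, and "linuxy" way to do it?

For example, if I have two files: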

$ cat file1
a
b
$ cat file2
c
d
e

The result should be

a, c
a, d
a, e
b, c
b, d
b, e

Answer 1

Here is a shell script that does it

while read a; do while read b; do echo "$a, $b"; done < file2; done < file1

That will be slow, though. I can't think of any precompiled logic that does this. The next step up in speed is to do the above in awk / perl.

awk 'NR==FNR { a[$0]; next } { for (i in a) print i",", $0 }' file1 file2

Hmm, how about this hacky solution that uses precompiled logic?

paste -d, <(sed -n "$(yes 'p;' | head -n $(wc -l < file2))" file1) \
          <(cat $(yes 'file2' | head -n $(wc -l < file1)))
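
To make the paste trick easier to follow, here is a rough sketch that builds the two columns as separate files first. The names `left` and `right` and the scratch directory are assumptions for illustration, and temporary files stand in for the process substitutions.

```shell
# Scratch copies of the question's sample files (assumption: a temp dir is fine)
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' a b > file1
printf '%s\n' c d e > file2

n1=$(( $(wc -l < file1) ))   # number of lines in file1
n2=$(( $(wc -l < file2) ))   # number of lines in file2

# Left column: a sed script made of n2 "p;" commands prints each file1 line n2 times
sed -n "$(yes 'p;' | head -n "$n2")" file1 > left

# Right column: file2 concatenated n1 times
i=0
while [ "$i" -lt "$n1" ]; do cat file2; i=$((i + 1)); done > right

# paste zips the two equally long columns line by line
paste -d, left right
```

Both columns end up with n1 * n2 lines, which is why paste can pair them one to one.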

Answer 2

The mechanical way to do it in shell is:

while read line1
do
    while read line2
    do echo "$line1, $line2"
    done < file2
done < file1
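
As written, `read` trims leading and trailing whitespace and interprets backslashes. A minimal sketch of the same double loop that keeps each line verbatim might look like this; `cartesian_loop` is a hypothetical name and the sample files are made up to show the difference.

```shell
# Scratch files with awkward content (assumption: illustrative data only)
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' '  a' 'b\n' > file1
printf '%s\n' 'c d' > file2

cartesian_loop() {
    # IFS= keeps surrounding whitespace, -r keeps backslashes literal
    while IFS= read -r line1; do
        while IFS= read -r line2; do
            printf '%s, %s\n' "$line1" "$line2"
        done < "$2"
    done < "$1"
}

cartesian_loop file1 file2
```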

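The join command can sometimes be used for operations like this; however, I'm not sure whether it can produce a Cartesian product as a degenerate case.

One step up from the double loop would be: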

while read line1
do
    sed "s/^/$line1, /" file2
done < file1
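
The sed version splices `$line1` into the replacement text, so a line containing `/`, `&`, or a backslash would break it. One possible workaround, sketched here with awk swapped in for sed, passes the prefix through the environment; `prefix_each` and the variable name `prefix` are assumptions.

```shell
# Scratch files containing characters that are special to sed (illustrative data)
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'a/b' 'x&y' > file1
printf '%s\n' c d > file2

prefix_each() {
    while IFS= read -r line1; do
        # ENVIRON hands the prefix to awk as plain text, with no escape processing
        prefix=$line1 awk '{ print ENVIRON["prefix"] ", " $0 }' "$2"
    done < "$1"
}

prefix_each file1 file2
```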

Answer 3

I won't pretend this is pretty, but...

join -t, -j 9999 -o 2.1,1.1 /tmp/file1 /tmp/file2

(Thanks to Iwan Aucamp below)

- join (GNU coreutils) 8.4

Answer 4

It won't be comma-separated, but it uses only join:

$ join -j 2 file1 file2
 a c
 a d
 a e
 b c
 b d
 b e
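
If the comma-separated form from the question is needed, something along these lines may work with GNU join. It is only a sketch and assumes the data contains no commas, since the comma doubles as the field separator.

```shell
# Scratch copies of the question's sample files
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' a b > file1
printf '%s\n' c d e > file2

# Field 2 is missing on every line, so all join keys are empty and therefore equal.
# -t, makes each whole line field 1; -o picks the output columns;
# sed then widens the first "," to ", ".
join -t, -j 2 -o 1.1,2.1 file1 file2 | sed 's/,/, /'
```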

Answer 5

Edit:

DVK's attempt inspired me to do this with eval:

script='1{x;d};${H;x;s/\n/\,/g;p;q};H'
eval "echo {$(sed -n $script file1)}\,\ {$(sed -n $script file2)}$'\n'"|sed 's/^ //'

Or a simpler sed script:

script=':a;N;${s/\n/,/g;b};ba'

which you can use without the -n switch.

This gives:

a, c
a, d
a, e
b, c
b, d
b, e

Original answer:

In Bash, you can do this. It doesn't read from files, but it's a neat trick:

$ echo {a,b}\,\ {c,d,e}$'\n'
a, c
 a, d
 a, e
 b, c
 b, d
 b, e

Simpler:

$ echo {a,b}{c,d,e}
ac ad ae bc bd be

Answer 6

A generic recursive Bash function could look like this:

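foreachline() {

    _foreachline() {

        if [ $# -lt 2 ]; then
            # No files left: print the accumulated prefix without its trailing ", "
            printf '%s\n' "${1%, }"
            return
        fi

        local prefix=$1
        local file=$2
        shift 2

        # IFS= and -r keep each line verbatim
        while IFS= read -r line; do
            _foreachline "$prefix$line, " "$@"
        done <"$file"
    }

    _foreachline "" "$@"
}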

foreachline file1 file2 file3
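
For comparison, the same N-file product can be built iteratively by crossing the accumulated result with one more file at a time. This is a sketch of a different technique, not the recursion above; `cartesian_n` and the contents of `file3` are assumptions for illustration.

```shell
# Scratch files; file3 is invented here for the three-file case
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' a b > file1
printf '%s\n' c d e > file2
printf '%s\n' x y > file3

cartesian_n() {
    acc=$(mktemp) || return 1
    cat "$1" > "$acc"          # start with the first file as the accumulated rows
    shift
    for f in "$@"; do
        # Load the next file into b[], then cross every accumulated row with it
        awk 'NR == FNR { b[++n] = $0; next }
             { for (i = 1; i <= n; i++) print $0 ", " b[i] }' "$f" "$acc" > "$acc.next"
        mv "$acc.next" "$acc"
    done
    cat "$acc"
    rm -f "$acc"
}

cartesian_n file1 file2 file3
```

Each pass spawns one awk process per file instead of one shell function call per output line, which tends to matter once the files grow.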

Regards.

Answer 7

Edit: Oops... sorry, I thought this was tagged python...

If you have Python 2.6:

from itertools import product
print('\n'.join((', '.join(elt) for elt in (product(*((line.strip() for line in fh) for fh in (open('file1','r'), open('file2','r'))))))))

a, c
a, d
a, e
b, c
b, d
b, e

If you have a Python version older than 2.6:

def product(*args, **kwds):
    '''
    Source: http://docs.python.org/library/itertools.html#itertools.product
    '''
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = map(tuple, args) * kwds.get('repeat', 1)
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
print('\n'.join((', '.join(elt) for elt in (product(*((line.strip() for line in fh) for fh in (open('file1','r'), open('file2','r'))))))))

Answer 8

Solution 1:

perl -e '{use File::Slurp; @f1 = read_file("file1"); @f2 = read_file("file2"); map { chomp; $v1 = $_; map { print "$v1,$_"; } @f2 } @f1;}'

Answer 9

awk 'FNR==NR{ a[++d]=$1; next}
{
  for ( i=1;i<=d;i++){
    print $1","a[i]
  }
}' file2 file1

# ./shell.sh
a,c
a,d
a,e
b,c
b,d
b,e

Answer 10

Well, this is a derivation of Dennis Williamson's solution above, since he noted that his doesn't read from files:

$ echo {`cat a | tr "\012" ","`}\,\ {`cat b | tr "\012" ","`}$'\n'
a, c
 a, d
 a, e
 b, c
 b, d
 b, e

Answer 11

A solution using join, awk, and process substitution:

join <(xargs -I_ echo 1 _ < setA) <(xargs -I_ echo 1 _ < setB) |
  awk '{ printf("%s, %s\n", $2, $3) }'

Answer 12

GNU Parallel:

parallel echo "{1}, {2}" :::: file1 :::: file2

Answer 13

Of course Perl has a module for that purpose:

#!/usr/bin/perl

use File::Slurp;
use Math::Cartesian::Product;

use v5.10;
$, = ", ";

@file1 = read_file("file1", chomp => 1);
@file2 = read_file("file2", chomp => 1);

cartesian { say @_ } \@file1, \@file2;

Output:

a, c
a, d
a, e
b, c
b, d
b, e

Answer 14

In fish it is a one-liner:

printf '%s\n' (cat file1)", "(cat file2)

Cartesian product of two files (as sets of lines) in GNU/Linux
