Question

How can I get the lines of one file that are contained in another file?

For example, I have:

// first
foo
bar

// second
foo;1;3;p1
bar;1;3;p2
foobar;1;3;p2

The files are large: the first contains ~500 000 records and the second ~150 000 to 200 000.

I need to get this result:

// note: the "p1" / "p2" field is not included in the output
foo;1;3
bar;1;3

Answer 1

This looks like a job for the join command, probably with some sorting. But with millions of records, it is time to seriously consider a real DBMS.

join -t\; -o 0,2.2,2.3 <(sort -t\; -k 1,1 first) <(sort -t\; -k 1,1 second)

(The <(command) syntax requires bash or zsh; otherwise, you would need to sort into temporary files or keep the input files sorted.)
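
For shells without process substitution, the same join can be sketched with temporary files. This is a minimal sketch, not a tuned solution: the sample data and the file names under `$workdir` are made up to mirror the question, `mktemp -d` is assumed to be available (it is common but not required by POSIX), and `LC_ALL=C` is added so that sort and join agree on the ordering. Note that the output comes out in sorted key order, not in the order of either input file.

```shell
#!/bin/sh
# Hypothetical sample data mirroring the question's example.
workdir=$(mktemp -d)
printf 'foo\nbar\n' > "$workdir/first"
printf 'foo;1;3;p1\nbar;1;3;p2\nfoobar;1;3;p2\n' > "$workdir/second"

# Sort both inputs on the join field under one locale,
# so that sort and join use the same collation order.
LC_ALL=C sort -t ';' -k 1,1 "$workdir/first"  > "$workdir/first.sorted"
LC_ALL=C sort -t ';' -k 1,1 "$workdir/second" > "$workdir/second.sorted"

# Join on field 1; print the join key plus fields 2 and 3 of the second file.
result=$(LC_ALL=C join -t ';' -o 0,2.2,2.3 \
    "$workdir/first.sorted" "$workdir/second.sorted")
printf '%s\n' "$result"

# Clean up the temporary files.
rm -rf "$workdir"
```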

Answer 2

grep -f:

-f FILE, --file=FILE
          Obtain patterns  from  FILE,  one  per  line.   The  empty  file
          contains  zero  patterns, and therefore matches nothing.  (-f is
          specified by POSIX.)

cut -d \; -f1-3:

-d, --delimiter=DELIM
          use DELIM instead of TAB for field delimiter

-f, --fields=LIST
          select only these fields;  also print any line that contains  no
          delimiter character, unless the -s option is specified

Putting it together: grep -f pattern_file data_file | cut -d\; -f1-3.
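
One caveat with this pipeline: grep -f treats each line of the pattern file as a regular expression that may match anywhere in a line, so the pattern foo also selects foobar;1;3;p2, which the question's expected result leaves out. A possible alternative, sketched here with awk rather than grep (a swapped-in technique, not part of the answer above), compares the first field exactly against the keys of the first file. The sample data and the paths under `$workdir` are hypothetical, `mktemp -d` is assumed to be available, and the sketch assumes the first file is non-empty and its keys fit in memory.

```shell
#!/bin/sh
# Hypothetical sample data mirroring the question's example.
workdir=$(mktemp -d)
printf 'foo\nbar\n' > "$workdir/first"
printf 'foo;1;3;p1\nbar;1;3;p2\nfoobar;1;3;p2\n' > "$workdir/second"

# While reading the first file (NR == FNR), store each whole line as a key.
# While reading the second file, print fields 1-3 of every line whose
# first field is exactly one of the stored keys.
result=$(awk -F ';' '
    NR == FNR  { keys[$0]; next }
    $1 in keys { print $1 ";" $2 ";" $3 }
' "$workdir/first" "$workdir/second")
printf '%s\n' "$result"

# Clean up the temporary files.
rm -rf "$workdir"
```

Unlike the join approach, this needs no sorting and keeps the lines in the order of the second file.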
