Question: Getting the intersection of multiple files

Let me explain in more detail:

I have a directory named tags, with one file per tag, like this:

tags/
    t1
    t2
    t3

Each tag file has the following structure:

<inode> <filename> <filepath>

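Of course, each tag file lists the many files that carry that tag (a file can appear only once within a given tag file). A file may also appear in several tag files.

What I would like to do is call a command like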

tags <t1> <t2>

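and have it list, in a readable way, the files that are tagged with both t1 and t2.

My current plan is to create a temporary file: dump the whole of t1 into it, then go through each line of t2 and run awk against that file, and keep going like that for each further tag.

But I am wondering whether anyone has a different approach. I am not very familiar with awk, grep, and so on.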

Answer 1

Could you use

sort t1 t2 | uniq -d

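This merges the two files, sorts them, and then shows only the lines that occur more than once, that is, the lines that appear in both files.

This assumes that no file contains duplicate lines within itself, and that the inode is the same in every entry for a particular file.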
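As a rough illustration of the pipeline above, here is a minimal sketch using made-up tag files in a scratch directory (the inodes, names, and paths are hypothetical). It also shows one possible way to extend the idea to three files: `uniq -d` reports lines seen two or more times, so for three files the sketch counts occurrences with `uniq -c` and keeps only lines seen in all of them. The same assumptions apply: no duplicates within a file and byte-identical entries across files.

```shell
# Hypothetical sample data following "<inode> <filename> <filepath>".
dir=$(mktemp -d)
printf '%s\n' '101 a.txt /home/u/a.txt' '102 b.txt /home/u/b.txt' > "$dir/t1"
printf '%s\n' '103 c.txt /home/u/c.txt' '102 b.txt /home/u/b.txt' > "$dir/t2"
printf '%s\n' '102 b.txt /home/u/b.txt' '103 c.txt /home/u/c.txt' > "$dir/t3"

# Two files: a line printed by uniq -d occurred in both t1 and t2.
common=$(sort "$dir/t1" "$dir/t2" | uniq -d)
printf 'in t1 and t2: %s\n' "$common"

# Three files: keep only lines whose count equals the number of files,
# then strip the count column that uniq -c prepends.
all_three=$(sort "$dir/t1" "$dir/t2" "$dir/t3" | uniq -c |
    awk -v n=3 '$1 == n { sub(/^ *[0-9]+ /, ""); print }')
printf 'in all three: %s\n' "$all_three"

rm -rf "$dir"
```

If the same file were recorded with a different path in another tag file, the lines would no longer be identical and this approach would miss it.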

Answer 2

You could try the comm utility:

comm -12 <t1> <t2>

With an appropriate combination of the following options, comm can be useful for different set operations on file contents.

   -1     suppress column 1 (lines unique to FILE1)

   -2     suppress column 2 (lines unique to FILE2)

   -3     suppress column 3 (lines that appear in both files)

This assumes that <t1> and <t2> are sorted. If they are not, sort them first with sort.
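A minimal sketch of the sort-then-comm step, assuming unsorted tag files with made-up contents. Setting `LC_ALL=C` is an assumption added here so that sort and comm agree on the collation order; sorted copies are written to temporary files to keep the example portable across shells.

```shell
# Use one collation order for both sort and comm (assumption for this sketch).
LC_ALL=C
export LC_ALL

# Hypothetical, deliberately unsorted sample data.
dir=$(mktemp -d)
printf '%s\n' '102 b.txt /home/u/b.txt' '101 a.txt /home/u/a.txt' > "$dir/t1"
printf '%s\n' '103 c.txt /home/u/c.txt' '102 b.txt /home/u/b.txt' > "$dir/t2"

# comm expects sorted input, so sort each file first.
sort "$dir/t1" > "$dir/t1.sorted"
sort "$dir/t2" > "$dir/t2.sorted"

# -12 hides the "only in file 1" and "only in file 2" columns: intersection.
both=$(comm -12 "$dir/t1.sorted" "$dir/t2.sorted")
# -23 hides "only in file 2" and "in both": lines unique to t1.
only_t1=$(comm -23 "$dir/t1.sorted" "$dir/t2.sorted")

printf 'both:    %s\nonly t1: %s\n' "$both" "$only_t1"

rm -rf "$dir"
```

comm compares exactly two files at a time, so intersecting more tag files this way would mean feeding the result of one comparison into the next.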

Answer 3

Here is a single-command solution that works with any number of unsorted files. For large files, it can be much faster than using sort and pipes (see the benchmarking note below). By changing $0 to $1 and so on, you can also find the intersection on specific columns. However, it assumes that lines are not repeated within a file, and it also assumes a version of awk that has the FNR variable.

Solution:

awk ' { a[$0]++ } 
      FNR == 1 { b++ }
      END { for (i in a) { if (a[i] == b) { print i } } } ' \
    t1 t2 t3

Explanation:

{ a[$0]++ }                   # on every line in every file, take the whole line ( $0 ), 
                              # use it as a key in the array a, and increase the value 
                              # of a[$0] by 1.
                              # this counts the number of observations of line $0 across 
                              # all input files.

FNR == 1 { b++ }              # when awk reads the first line of a new file, FNR resets 
                              # to 1. every time FNR == 1, we increment a counter 
                              # variable b. 
                              # this counts the number of input files.

END { ... }                   # after reading the last line of the last file...

for (i in a) { ... }          # ... loop over the keys of array a ...

if (a[i] == b) { ... }        # ... and if the value at that key is equal to the number 
                              # of input files...

print i                       # ... we print the key - i.e. the line.

Benchmarking:

Note: the runtime improvement seems to become more significant as the lines in the files get longer.
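The column-based variant mentioned in this answer could look roughly like the sketch below, which keys on the inode in column 1 instead of the whole line. The sample data is made up, and two details are additions not present in the answer: a `seen` guard so that a repeated inode within one file is counted only once, and a `first` array that remembers the first full line for each inode so the output still shows the name and path.

```shell
# Hypothetical sample data; inode 102 has a different path in t2.
dir=$(mktemp -d)
printf '%s\n' '101 a.txt /home/u/a.txt' '102 b.txt /home/u/b.txt' > "$dir/t1"
printf '%s\n' '102 b.txt /srv/b.txt' '103 c.txt /home/u/c.txt' > "$dir/t2"
printf '%s\n' '102 b.txt /home/u/b.txt' '101 a.txt /home/u/a.txt' > "$dir/t3"

by_inode=$(awk '
    FNR == 1 { nfiles++ }                  # a new input file has started
    !seen[FILENAME, $1]++ {                # first time this inode appears in this file
        count[$1]++                        # number of files containing the inode
        if (!($1 in first)) first[$1] = $0 # remember one full line per inode
    }
    END {
        for (k in count)
            if (count[k] == nfiles) print first[k]
    }
' "$dir/t1" "$dir/t2" "$dir/t3")
printf '%s\n' "$by_inode"

rm -rf "$dir"
```

Because the key is only the inode, the entry for 102 still matches even though its path differs in t2. Like the original command, this counts files via FNR == 1, so an empty input file would not be counted, and the order of the output lines is not guaranteed.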
