Question

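I have a lot of files (~70000) with numbers in their names, for example 991000_Metatissue.qsub.file and 828000_Metatissue.qsub.file. I also have another file (files_failed.txt) containing a bunch of numbers that I want to use with grep. The list looks like this: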

I tried: ls -1 *.qsub.file | grep -F -f files_failed.txt - and even this:

  ls -1 *.qsub.file > files_to_submit.txt
  grep -F -f files_failed.txt files_to_submit.txt

But I always get all of the qsub.files ...

Answer 1

grep -f is poorly written (see GNU bug 16305), so I suggest using awk instead:

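find . -name '*_*.qsub.file' |awk -F_ '
  # First input (files_failed.txt): remember each number, prefixed with
  # "./" so that it matches the paths printed by find.
  NR == FNR { failed["./" $0] = 1; next }
  # Second input (stdin): print paths whose part before the first "_" is known.
  $1 in failed
' files_failed.txt /dev/stdin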

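This uses find to locate the files in question and pipes them into awk. Before handling that list, awk reads files_failed.txt: while the overall line number (NR, the number of records read so far) equals the line number within the current file (FNR), awk is still in the first file it was given, so it stores each value in an associative array (also known as a dictionary or hash) called failed. After that, if the first column of a line (the file's number, since the fields are separated by _) is in that array, the file is one that failed. AWK's default action for a clause is to print the line, so you get a list of those failed files.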

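Note the lack of regular expressions! On a large directory this is far faster than grep -F -f …, which in turn is far faster than grep -f …, even assuming the bug above is fixed.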
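To see the NR == FNR idiom in isolation, here is a minimal sketch on made-up data. The numbers, the temporary directory and the file names are assumptions for illustration only, and the names are fed in with printf instead of find, so they carry no leading ./ and the array keys are the bare numbers.

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' 991000 828000 > "$tmp/files_failed.txt"

# First input (files_failed.txt): NR == FNR holds, so remember each number.
# Second input (stdin, given as "-"): print names whose text before the
# first "_" was remembered.
matched=$(
  printf '%s\n' 991000_Metatissue.qsub.file 99100_Metatissue.qsub.file 828000_Metatissue.qsub.file |
  awk -F_ 'NR == FNR { failed[$1] = 1; next } $1 in failed' "$tmp/files_failed.txt" -
)
printf '%s\n' "$matched"
rm -rf "$tmp"
```

Because the lookup is an exact match on the whole first field, 99100_Metatissue.qsub.file is not printed even though 99100 is a substring of 991000.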

Answer 2

70000 files is too many for ls; you should use find.

I prefer to invert the logic: list what you need rather than listing everything and then filtering.

Something like this, if the numbers need to come from another file:

while IFS= read -r line; do find . -iname "${line}_Metatissue.qsub.file"; done < files_failed.txt

Answer 3

You can use the following script:

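 # printf is a shell builtin, so 70000 names do not run into the
 # argument-length limit that an external ls may hit
 printf '%s\n' *.qsub.file > filelist.txt
 while IFS= read -r pattern
 do
     # Anchor the number at the start of the name and require the "_" after it
     filefound=$(grep "^${pattern}_" filelist.txt)
     if [ -n "$filefound" ]; then
         echo "File Found : $filefound"
     fi
 done < files_failed.txt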

Second option:

  while IFS= read -r pattern
  do
      # Include the "_" so that e.g. 9910 does not also match 991000_...
      find . -name "${pattern}_*.qsub.file" >> filefound.txt
  done < files_failed.txt

All of your matching files will be stored in the file filefound.txt.

Answer 4

You should use find, and you need to modify the "patterns". Here is one approach that should work:

# List all files ending in "qsub.file"
find . -name '*.qsub.file' |

# Add ./ and _ to each number to make the match exact
grep -F -f <(sed -e 's:^:./:' -e 's/$/_/' files_failed.txt)
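As a rough illustration of why the added ./ and _ make the match exact, here is a small sketch on made-up data. The sample number, the temporary directory and the three paths are assumptions, and a temporary pattern file is used in place of the <( ) process substitution so that it also runs under plain sh.

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' 991000 > "$tmp/files_failed.txt"

# Same sed expressions as above: prepend "./" and append "_" to every number.
sed -e 's:^:./:' -e 's/$/_/' "$tmp/files_failed.txt" > "$tmp/patterns.txt"
patterns=$(cat "$tmp/patterns.txt")

# Feed three find-style paths to grep; only the exact number should survive.
matched=$(
  printf '%s\n' ./991000_Metatissue.qsub.file ./1991000_Metatissue.qsub.file ./99100_Metatissue.qsub.file |
  grep -F -f "$tmp/patterns.txt"
)
printf 'pattern: %s\nmatch:   %s\n' "$patterns" "$matched"
rm -rf "$tmp"
```

With the bare number 991000 as the fixed string, ./1991000_Metatissue.qsub.file would also match; the pattern ./991000_ rules it out because the number must directly follow ./ and be directly followed by _.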

grep files against a list containing only numbers
