Question

I have a folder containing thousands of files with names like: feed_1.txt, feed_2.txt, feed_3.txt

How can I select only feed_40000.txt and the files numbered above it?

Answer 1

You can use find's -regex option:

find . -type f -regextype posix-awk -regex ".*/feed_([4-9]|[123][0-9])[0-9]{4,}\.txt"
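A quick way to check the pattern is to run it against a few sample names. In the regex, `[4-9][0-9]{4,}` covers numbers of five or more digits starting with 4-9, and `[123][0-9][0-9]{4,}` covers numbers of six or more digits starting with 1-3. This sketch assumes GNU find (`-regextype` and `-printf` are GNU extensions); `demo_answer1` is a hypothetical helper and the file numbers are made up:

```shell
# Sketch: run Answer 1's find -regex against a scratch directory.
# Assumes GNU find; demo_answer1 is a hypothetical helper name.
demo_answer1() {
  dir=$(mktemp -d) || return 1
  for n in 1 39999 40000 99999 100000 350000; do
    : > "$dir/feed_$n.txt"                # create empty sample files
  done
  # -regex must match the whole path, hence the leading .*/
  find "$dir" -type f -regextype posix-awk \
       -regex '.*/feed_([4-9]|[123][0-9])[0-9]{4,}\.txt' \
       -printf '%f\n' | LC_ALL=C sort    # bare file names in a stable order
  rm -r "$dir"
}
```

Calling `demo_answer1` should print feed_100000.txt, feed_350000.txt, feed_40000.txt and feed_99999.txt, one per line; feed_1.txt and feed_39999.txt are left out.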

Answer 2

You can use this awk-based check to get the file names whose number is >= 40000:

printf "%s\n" feed_[0-9]* | awk -F '[_.]+' '$2 >= 40000'
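The `-F '[_.]+'` separator splits a name such as feed_40000.txt into `feed`, `40000` and `txt`, so `$2` is the number. A field that looks numeric is compared numerically against the constant 40000, so 123456 passes even though it would come before 40000 as a string. A small sketch of this, with made-up names and hypothetical function names:

```shell
# How awk splits the name with -F '[_.]+'; prints feed|40000|txt
split_demo() {
  printf '%s\n' feed_40000.txt | awk -F '[_.]+' '{ print $1 "|" $2 "|" $3 }'
}

# The filter from the answer, on made-up sample input
filter_demo() {
  printf '%s\n' feed_7.txt feed_39999.txt feed_40000.txt feed_123456.txt |
    awk -F '[_.]+' '$2 >= 40000'
}

# Two string constants are compared as strings, so this prints 0:
# character by character, "1" sorts before "4"
string_compare_demo() {
  awk 'BEGIN { print ("123456" >= "40000") }'
}
```

`filter_demo` should print feed_40000.txt and feed_123456.txt, while `string_compare_demo` prints 0, which is the pitfall the numeric field comparison avoids.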

To loop over these file names:

while IFS= read -r file; do
   printf "processing %s\n" "$file"
done < <(printf "%s\n" feed_[0-9]* | awk -F '[_.]+' '$2 >= 40000')
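The loop above relies on process substitution (`< <(...)`), which is a bash feature. As an alternative under different assumptions, this sketch stays within POSIX sh and skips awk: it strips the `feed_` prefix and `.txt` suffix with parameter expansion and compares the rest with `[ ... -ge 40000 ]`. It assumes the files are in the current directory and follow the `feed_<digits>.txt` pattern; `process_feeds` is a hypothetical name:

```shell
# Sketch: POSIX sh loop over feed_*.txt, keeping numbers >= 40000.
# process_feeds is a hypothetical function name.
process_feeds() {
  for file in feed_*.txt; do
    [ -e "$file" ] || continue          # the glob matched nothing
    n=${file#feed_}                     # strip the "feed_" prefix
    n=${n%.txt}                         # strip the ".txt" suffix
    case $n in
      ''|*[!0-9]*) continue ;;          # skip names whose middle part is not all digits
    esac
    if [ "$n" -ge 40000 ]; then
      printf 'processing %s\n' "$file"
    fi
  done
}
```

Run it from inside the folder that holds the files; names such as feed_abc.txt are skipped instead of causing a comparison error.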

Answer 3

You can do:

find . -type f -name "feed_*" | awk -F"_" '$2+0>=40000' # => list of file names...
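Here `$2` is a string such as `40000.txt`, and `$2+0` converts its leading digits to the number 40000. Because the whole path is split on `_`, an underscore in a directory name shifts the fields. The sketch below contrasts the answer's filter with a variant that only looks at the last path component; that variant is an assumption, not part of the answer, and the paths and function names are made up:

```shell
# The answer's filter: splits the whole path on "_"
naive() {
  awk -F"_" '$2+0>=40000'
}

# Variant (assumption): split on "/" and test only the file name
basename_based() {
  awk -F/ '{ name = $NF
             if (sub(/^feed_/, "", name) && name + 0 >= 40000) print }'
}

# Made-up paths, two of them under directories containing "_"
sample() {
  printf '%s\n' ./feed_40000.txt ./run_99999/feed_1.txt ./old_x/feed_50000.txt
}
```

With these paths, `sample | naive` prints ./feed_40000.txt and ./run_99999/feed_1.txt (a false positive) and misses ./old_x/feed_50000.txt, while `sample | basename_based` prints ./feed_40000.txt and ./old_x/feed_50000.txt.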

Answer 4

For a regular-expression solution:

/feed_([4-9][0-9]{4}|[1-9][0-9]{5,})\.txt/g

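This matches strings in one of two forms:

feed_ab.txt, where a is a digit from 4-9 and b is four digits (the case 40000 <= number <= 99999), or

feed_cd.txt, where c is a digit from 1-9 and d is five or more digits (the case number >= 100000).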
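The `/.../g` form is regex-literal syntax (as in JavaScript or Perl). To use the same pattern from the shell, the delimiters and the `g` flag can be dropped and the expression passed to `grep -E`. The `^` and `$` anchors are an addition in this sketch so that the whole name has to match; without them, a name like myfeed_50000.txt would also match. `filter_names` is a hypothetical name:

```shell
# Keep only lines of the form feed_N.txt with N >= 40000 (no leading zeros).
# The ^...$ anchors are an addition to the original pattern.
filter_names() {
  grep -E '^feed_([4-9][0-9]{4}|[1-9][0-9]{5,})\.txt$'
}
```

For example, `printf '%s\n' feed_39999.txt feed_40000.txt feed_100000.txt myfeed_50000.txt | filter_names` should print feed_40000.txt and feed_100000.txt.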

Answer 5

OK, here is my approach too (as a learning exercise and experiment with -exec and awk). This is the command:

find . -type f -exec awk --re-interval 'FILENAME ~ /feed_([4-9][0-9]{4,}|[1-3][0-9]{5,})\.txt$/ && !a[FILENAME]++{print FILENAME} END{if(FILENAME ~ /feed_([4-9][0-9]{4,}|[1-3][0-9]{5,})\.txt$/ && !a[FILENAME]++){print FILENAME}}' {} \;

Here are the points about it.

I- I used --re-interval so that awk supports interval expressions such as {4,} (four or more consecutive digits 0-9); in newer versions of gawk (4.0 and later) intervals are enabled by default, so this option can be removed.

II- One more thing I learned:

a- When the command ends with \;, zero-size files are reported too, BUT

b- when it ends with \+, zero-size files are NOT displayed, BECAUSE

c- \+ first collects the files and then runs awk on the whole batch at once, so the END block only sees the last file, and the other zero-size files never trigger the main rule, since they contain no lines to read.

Edit: adding a multi-line form of the command as well.

find . -type f -exec awk --re-interval \
 'FILENAME ~ /feed_([4-9][0-9]{4,}|[1-3][0-9]{5,})\.txt$/ && !a[FILENAME]++{print FILENAME}
 END{if(FILENAME ~ /feed_([4-9][0-9]{4,}|[1-3][0-9]{5,})\.txt$/ && !a[FILENAME]++){print FILENAME}}' {} \;
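The `\;` versus `\+` difference comes from the main rule only firing for lines that are read, and an empty file has none. As an alternative sketch (not the approach above), awk can test the names in a BEGIN block by looping over ARGV; a program consisting only of a BEGIN block never reads its file arguments, so empty files are treated like any other and `{} +` is safe to use. The regex spells out `{4,}` and `{5,}` as repeated `[0-9]` so it does not depend on interval support; `list_feeds_argv` is a hypothetical helper name:

```shell
# Sketch: check file names via ARGV in BEGIN, so awk never opens the files.
# list_feeds_argv is a hypothetical helper; argument 1 is the start directory.
list_feeds_argv() {
  find "${1:-.}" -type f -name 'feed_*.txt' -exec awk '
    BEGIN {
      for (i = 1; i < ARGC; i++) {
        n = split(ARGV[i], part, "/")   # part[n] is the bare file name
        # [4-9] + 4 or more digits, or [1-3] + 5 or more digits
        if (part[n] ~ /^feed_([4-9][0-9][0-9][0-9][0-9]+|[1-3][0-9][0-9][0-9][0-9][0-9]+)\.txt$/)
          print ARGV[i]
      }
    }' {} +
}
```

For example, `list_feeds_argv .` lists the matching paths under the current directory, empty files included, in whatever order find visits them.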
