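I'm looking for a bash alias that changes the output of ls. I regularly work with large numbers of files that don't follow the same naming convention. The only thing they have in common is that the number is zero-padded to 4 digits (sorry, not sure of the correct way to say this) and sits immediately before the extension.
For example: filename_v028_0392.bgeo, test_x34.prerun.0012.simdata, filename_v001_0233.exr
I would like each sequence to be listed as a single element, so that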
filename_v003_0001.geo
filename_v003_0002.geo
filename_v003_0003.geo
filename_v003_0004.geo
filename_v003_0005.geo
filename_v003_0006.geo
filename_v003_0007.geo
filename_v003_0032.geo
filename_v003_0033.geo
filename_v003_0034.geo
filename_v003_0035.geo
filename_v003_0036.geo
testxxtest.0057.exr
testxxtest.0058.exr
testxxtest.0059.exr
testxxtest.0060.exr
testxxtest.0061.exr
testxxtest.0062.exr
testxxtest.0063.exr
would be displayed as something along the lines of
[seq]filename_v003_####.geo (1-7)
[seq]filename_v003_####.geo (32-36)
[seq]testxxtest.####.exr (57-63)
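while still listing the files that are not part of a sequence.
I don't really know where to start with this. I know a fair amount of python, but I'm not sure whether that is really the best way to go. Any help would be greatly appreciated!
Thanks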
Answer 0: (score: 2)
Here is one way to do something similar using awk. The code is not very readable:
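#!/bin/bash
ls | awk '
# Emit pending output whenever the current file does not continue the
# previous one (different pattern, or number is not previous + 1).
function smprint() {
    if ((a[1]!=exA1) || (a[2] != exA2+1)) {
        # close an open sequence with its last file, or just end the line
        if ((exA1) && (exA1==exexA1)) print "\t.. " exfile;
        else printf "%s", linesep;
        # "%s" keeps a percent sign in a file name from being read as a format
        if ($0!=exfile) printf "%s", $0;
    }
};
BEGIN { d="[0-9]"; rg="(.*)(" d d d d ")(.*)"; };
{
    split(gensub(rg, "\\1####\\3\t\\2", "g"), a, "\t");
    # produces e.g.: a[1]="file####.ext" a[2]="0001"
    smprint();
    linesep="\n";
    exexA1=exA1; # old old a[1]
    exA1=a[1];   # old a[1]
    exA2=a[2];   # old a[2]
    exfile=$0;   # old filename
};
END {
    smprint();
}'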
Compare the output of ls and of the script above in the same folder:
etuardu@subranu:~/Desktop/pippo$ ls
asd1234_0001.tar.bz2 filename_v003_0006.geo script.sh
asd1234_0002.tar.bz2 filename_v003_0007.geo testxxtest.0057.exr
asd1234_0003.tar.bz2 filename_v003_0032.geo testxxtest.0058.exr
filename_v003_0001.geo filename_v003_0033.geo testxxtest.0059.exr
filename_v003_0002.geo filename_v003_0034.geo testxxtest.0060.exr
filename_v003_0003.geo filename_v003_0035.geo testxxtest.0061.exr
filename_v003_0004.geo filename_v003_0036.geo testxxtest.0062.exr
filename_v003_0005.geo other_file testxxtest.0063.exr
etuardu@subranu:~/Desktop/pippo$ ./script.sh
asd1234_0001.tar.bz2 .. asd1234_0003.tar.bz2
filename_v003_0001.geo .. filename_v003_0007.geo
filename_v003_0032.geo .. filename_v003_0036.geo
other_file
script.sh
testxxtest.0057.exr .. testxxtest.0063.exr
etuardu@subranu:~/Desktop/pippo$
If you want to stick to the syntax given in your example, you can pipe this output to sed. With a bit of regex magic, you get:
etuardu@subranu:~/Desktop/pippo$ ./script.sh | sed -r 's/(.*)([0-9]{4})([^\t]+)\t\.\. .*([0-9]{4}).*$/[seq]\1####\3 (\2-\4)/g'
[seq]asd1234_####.tar.bz2 (0001-0003)
[seq]filename_v003_####.geo (0001-0007)
[seq]filename_v003_####.geo (0032-0036)
other_file
script.sh
[seq]testxxtest.####.exr (0057-0063)
etuardu@subranu:~/Desktop/pippo$
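For comparison, the same reformatting step could be sketched in Python 3. This is only an illustration, not part of the script above: the names `PAIR` and `reformat_line` are made up here, it assumes the two file names are separated by a tab followed by `.. ` as the awk script prints them, and it strips the zero padding from the range so the result looks like the `(1-7)` form asked for in the question.

```python
import re

# "<prefix><4 digits><suffix>\t.. <anything><4 digits><rest>"
# (hypothetical pattern, modelled on the sed expression above)
PAIR = re.compile(r'^(.*)(\d{4})([^\t]+)\t\.\. .*(\d{4})[^\t]*$')


def reformat_line(line):
    """Turn 'first\\t.. last' into '[seq]name_####.ext (a-b)'.

    Lines that do not look like a collapsed pair are returned unchanged.
    """
    m = PAIR.match(line)
    if m is None:
        return line
    prefix, first, suffix, last = m.groups()
    # int() drops the zero padding: "0007" becomes 7
    return "[seq]{}####{} ({}-{})".format(prefix, suffix, int(first), int(last))
```

Applied to each line of the script's output, this should leave `other_file` and `script.sh` untouched and rewrite only the lines that contain a `first .. last` pair.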
You can then put the whole thing into a bash script and define an alias in ~/.bashrc to call it.
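As a side note, consider that this is a pure bash-ish solution that should run on most *nix systems (with the caveat that gensub is a GNU awk extension and -r is the GNU sed spelling of extended-regex mode), but the tools used are not really the best suited for this task. You may want to write this script in a language such as python, for its readability and its higher-level string manipulation and pattern matching features.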
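To give an idea of what that could look like, here is a minimal Python 3 sketch that produces the format requested in the question. It is not a drop-in replacement for the awk script: `FRAME`, `parse` and `collapse` are names invented for this example, and it assumes the frame number is exactly four digits placed directly before a single final extension, so a name like asd1234_0001.tar.bz2 would be listed as a plain file.

```python
import re

# Assumption: exactly four digits (not preceded by another digit)
# right before the last ".ext" of the file name.
FRAME = re.compile(r'^(.*?)(?<!\d)(\d{4})(\.[^.]+)$')


def parse(name):
    """Return (head, ext, number); plain files get ext "" and number -1."""
    m = FRAME.match(name)
    if m is None:
        return (name, "", -1)
    return (m.group(1), m.group(3), int(m.group(2)))


def collapse(names):
    """Collapse runs of consecutive frame numbers into '[seq]' lines."""
    runs = []  # each run: [head, ext, first, last, first_name]
    # Sorting on the parsed tuple keeps frames of one sequence adjacent
    # even when another sequence shares the same head.
    for (head, ext, num), name in sorted((parse(n), n) for n in names):
        prev = runs[-1] if runs else None
        if num >= 0 and prev and prev[:2] == [head, ext] and prev[3] + 1 == num:
            prev[3] = num  # extend the current run
        else:
            runs.append([head, ext, num, num, name])
    return [
        name if first == last
        else "[seq]{}####{} ({}-{})".format(head, ext, first, last)
        for head, ext, first, last, name in runs
    ]
```

Something like `print("\n".join(collapse(os.listdir("."))))` (after `import os`) would then list the current directory, and a bash alias could simply call that script.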
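Answer 1: (score: 2)
I came up with a python 2.7 script that solves your problem by tackling the more general problem of collapsing multiple lines that differ only by a sequence number: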
import re


def do_compress(old_ints, ints):
    """
    whether the ints of the current entry is the continuation of the previous
    entry
    returns a list of the indexes to compress, or [] or False when the current
    line is not part of an indexed sequence
    """
    return len(old_ints) == len(ints) and \
        [i for o, n, i in zip(old_ints, ints, xrange(len(ints))) if n - o == 1]


def basic_format(file_start, file_stop):
    return "[seq]{} .. {}".format(file_start, file_stop)


def compress(files, do_compress=do_compress, seq_format=basic_format):
    p = None  # regex built from the first line of the current group
    old_ints = ()
    old_indexes = ()
    seq_and_files_list = []
    # list of file names or dictionaries that represent sequences:
    # {start, stop, start_f, stop_f}
    for f in files:
        ints = ()
        indexes = ()
        m = p is not None and p.match(f)  # False, None, or a valid match
        if m:
            ints = [int(x) for x in m.groups()]
            indexes = do_compress(old_ints, ints)
        # state variations
        if not indexes:  # end of sequence or no current sequence
            # every run of digits becomes a capturing group
            p = re.compile(
                r'(\d+)'.join(re.escape(x) for x in re.split(r'\d+', f)) + '$')
            m = p.match(f)
            old_ints = [int(x) for x in m.groups()]
            old_indexes = ()
            seq_and_files_list.append(f)
        elif indexes == old_indexes:  # the sequence continues
            seq_and_files_list[-1]['stop'] = old_ints = ints
            seq_and_files_list[-1]['stop_f'] = f
            old_indexes = indexes
        elif old_indexes == ():  # sequence started on previous filename
            start_f = seq_and_files_list.pop()
            s = {'start': old_ints, 'stop': ints,
                 'start_f': start_f, 'stop_f': f}
            seq_and_files_list.append(s)
            old_ints = ints
            old_indexes = indexes
        else:  # end of sequence, but still matches previous pattern
            old_ints = ints
            old_indexes = ()
            seq_and_files_list.append(f)
    return [isinstance(f, dict) and seq_format(f['start_f'], f['stop_f']) or f
            for f in seq_and_files_list]


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 1:
        import os
        lst = sorted(os.listdir('.'))
    elif sys.argv[1] in ("-h", "--help"):
        print """USAGE: {} [FILE ...]
compress the listing of the current directory, or the content of the files by
collapsing identical lines, except for a sequence number
""".format(sys.argv[0])
        sys.exit(0)
    else:
        # read every line of every file given on the command line
        lst = [l.rstrip('\r\n') for f in sys.argv[1:] for l in open(f)]
    for x in compress(lst):
        print x
That is, with your data:
bernard $ ./ls_sequence_compression.py given_data
[seq]filename_v003_0001.geo .. filename_v003_0007.geo
[seq]filename_v003_0032.geo .. filename_v003_0036.geo
[seq]testxxtest.0057.exr .. testxxtest.0063.exr
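It is based on the differences between the integers found in two consecutive lines whose non-digit text matches. This makes it possible to handle non-uniform input, and changes in which field is used as the basis of the sequence...
Here is an example of input: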
01 - test8.txt
01 - test9.txt
01 - test10.txt
02 - test11.txt
02 - test12.txt
03 - test13.txt
04 - test13.txt
05 - test13.txt
06
07
08
09
10
which gives:
[seq]01 - test8.txt .. 01 - test10.txt
[seq]02 - test11.txt .. 02 - test12.txt
[seq]03 - test13.txt .. 05 - test13.txt
[seq]06 .. 10
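The comparison between two consecutive lines can be looked at in isolation. The following is a simplified Python 3 sketch of that single step, not the script itself: `line_pattern` and `incremented_fields` are names chosen for this illustration, and it returns None where the script would start a new pattern.

```python
import re


def line_pattern(line):
    """Regex matching lines that share the non-digit text of `line`.

    Every run of digits becomes a capturing group.
    """
    literal_parts = re.split(r'\d+', line)
    body = r'(\d+)'.join(re.escape(part) for part in literal_parts)
    return re.compile(body + '$')


def incremented_fields(previous, current):
    """Indexes of the integer fields that grew by exactly 1.

    Returns None when the non-digit text of the two lines differs.
    """
    pattern = line_pattern(previous)
    m_cur = pattern.match(current)
    if m_cur is None:
        return None
    old = [int(x) for x in pattern.match(previous).groups()]
    new = [int(x) for x in m_cur.groups()]
    return [i for i, (o, n) in enumerate(zip(old, new)) if n - o == 1]
```

With the sample input above, "01 - test9.txt" followed by "01 - test10.txt" yields `[1]` (only the second number moved), while "01 - test10.txt" followed by "02 - test11.txt" yields `[0, 1]`. A change in that list from one pair of lines to the next is what ends one sequence and starts another.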
Any comments are welcome!
Ha... I nearly forgot: with no arguments, this script outputs the collapsed content of the current directory.