Question

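I have a big pile (600+) of search-and-replace terms that I need to run as a sed script over some files. The problem is that the search terms are NOT orthogonal... but I think I can get away with it by sorting by line length (i.e. pull out the longest matches first, then sort alphabetically within each length). So given an unsorted set of: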

aaba
aa
ab
abba
bab
aba

I would like a sorted set such as:

abba
aaba
bab
aba
ab
aa

Is there a way of doing this, for example by prepending the line length and sorting on that field?

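Bonus marks :-) !!! The search and replace is actually just a case of replacing term with _term_, and the sed code I was going to use is s/term/_term_/g. How would I write the regex so that it does not replace terms that are already inside a pair of underscores?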

Answer 1

You can do it with a Perl one-liner:

perl -e 'print sort { length $b<=>length $a || $b cmp $a } <>' input

Answer 2

You can squash it all into one regex:

$ sed -e 's/\(aaba\|aa\|abba\)/_\1_/g'
testing words aa, aaba, abba.
testing words _aa_, _aaba_, _abba_.

If I understood your question, this solves all of your problems: there is no "double replacement", and the longest word is always matched.
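With 600+ terms, typing the alternation by hand is impractical, so the pattern would probably be generated. A minimal sketch of that idea in Python follows; the function name `wrap_terms` is made up for illustration. One assumption to note: Python's `re` module takes the first alternative that matches rather than the longest, so the sketch sorts the terms longest first before joining them.

```python
import re

def wrap_terms(text, terms):
    # Longest terms first: Python's re tries alternatives left to right
    # and keeps the first one that matches, not the longest one.
    ordered = sorted(set(terms), key=lambda t: (-len(t), t))
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    # One pass over the text, so text inserted by a replacement is never rescanned.
    return pattern.sub(lambda m: "_" + m.group(0) + "_", text)

print(wrap_terms("testing words aa, aaba, abba.", ["aaba", "aa", "abba"]))
```

Because everything happens in a single substitution pass, a short term such as aa cannot match again inside an already wrapped _aaba_.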

Answer 3

$ awk '{print length($1),$1}' file |sort -rn
4 abba
4 aaba
3 bab
3 aba
2 ab
2 aa

I'll let you try to get rid of the first column yourself.
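For comparison, the same prefix-the-length idea (decorate, sort, undecorate) can be sketched in Python. This is only an illustration of the technique, not part of the awk answer; the function name `sort_by_length_desc` is hypothetical.

```python
def sort_by_length_desc(lines):
    # Decorate: pair each line with its length, like the "4 abba" rows above.
    decorated = [(len(line), line) for line in lines]
    # Sort descending on (length, text): longest first,
    # reverse alphabetical order within each length.
    decorated.sort(reverse=True)
    # Undecorate: drop the length column again.
    return [line for _, line in decorated]

print("\n".join(sort_by_length_desc(["aaba", "aa", "ab", "abba", "bab", "aba"])))
```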

Answer 4

Pipe your stream through a script like this:

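#!/usr/bin/env python3
import sys

# Group the input lines by their length.
by_length = {}
for line in sys.stdin:
    line = line.rstrip()
    by_length.setdefault(len(line), []).append(line)

# Longest lines first; reverse alphabetical order within each length.
for length in sorted(by_length, reverse=True):
    print("\n".join(sorted(by_length[length], reverse=True)))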

As for the bonus question: again, do it in Python (unless there really is a reason not to, in which case I'd be curious to hear it).

Answer 5

This will sort a file by line length, longest lines first:

while IFS= read -r LINE; do printf '%s\t%s\n' "${#LINE}" "$LINE"; done < file.txt | sort -rn | cut -f 2-

These will replace term with _term_, but will not turn _term_ into __term__:

sed -r 's/(^|[^_])term([^_]|$)/\1_term_\2/g'
sed -r -e 's/(^|[^_])term/\1_term_/g' -e 's/term([^_]|$)/_term_\1/g'

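The first one works pretty well, except that it misses _term and term_, wrongly leaving them alone. If that matters, use the second one, which wraps those cases too (see the __term_inator and _term__inator results in its output). Here is my silly test case: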

# echo here is _term_ and then a term you terminator haha _terminator and then _term_inator term_inator | sed -re 's/(^|[^_])term([^_]|$)/\1_term_\2/g'
here is _term_ and then a _term_ you _term_inator haha _terminator and then _term_inator term_inator
# echo here is _term_ and then a term you terminator haha _terminator and then _term_inator term_inator | sed -r -e 's/(^|[^_])term/\1_term_/g' -e 's/term([^_]|$)/_term_\1/g'
here is _term_ and then a _term_ you _term_inator haha __term_inator and then _term_inator _term__inator
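The same guard can be sketched in Python with lookarounds, which the `-r` patterns above do not use. This is a hedged illustration rather than a drop-in replacement; `wrap_term` is a hypothetical helper name. It behaves like the first sed command on the test sentence, leaving _term and term_ alone.

```python
import re

def wrap_term(text, term="term"):
    # (?<!_) and (?!_) are zero-width checks: the characters around the
    # term are inspected but not consumed by the match.
    pattern = re.compile(r"(?<!_)" + re.escape(term) + r"(?!_)")
    return pattern.sub(lambda m: "_" + m.group(0) + "_", text)

sample = ("here is _term_ and then a term you terminator haha "
          "_terminator and then _term_inator term_inator")
print(wrap_term(sample))
```

Since the lookarounds consume nothing, two occurrences separated by a single character (for example "term term") are both wrapped, whereas a pattern that consumes the separator as part of the first match would skip the second one.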

Answer 6

This sorts by length first, then in reverse alphabetical order:

for mask in `tr -c "\n" "." < $FILE | sort -ur`
do
    grep "^$mask$" $FILE | sort -r
done

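The tr call replaces every character in $FILE except newlines with a period, which grep treats as matching any single character, so each mask matches exactly the lines of one length.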

How to sort by line length, then reverse alphabetically
