Question

I want to remove some strings from file names. I want to remove every string in parentheses, except when the parentheses contain the string "remix", "Remix", or "REMIX". So far I have:

sed "s/\s*\(\s?[A-z0-9. ]*\)//g"

But how do I exclude the cases where the string contains "remix"?

Answer 1

You can use a capture group:

sed 's/\(\s*([^)]*remix[^)]*)\)\|\s*(\s\?[a-z0-9. ]*)/\1/gi'

When the "remix branch" doesn't match, the capture group is not defined and the matched part is replaced with an empty string.

When the "remix branch" succeeds, the matched part is replaced with the content of the capture group, that is, with itself.
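To see both branches at work, here is a small sketch that runs the command above on three made-up file names. The function name strip_parens_keep_remix and the sample names are only illustrative, and GNU sed is assumed, since \|, \s, \? and the i flag are GNU extensions:

```shell
#!/bin/sh
# Sketch only: assumes GNU sed. The function name and the sample
# file names are hypothetical.
strip_parens_keep_remix() {
    sed 's/\(\s*([^)]*remix[^)]*)\)\|\s*(\s\?[a-z0-9. ]*)/\1/gi'
}

printf '%s\n' \
    'Song (Radio Edit) (Club Remix).mp3' \
    'Track (Live).mp3' \
    'Mix (2012 REMIX).mp3' |
    strip_parens_keep_remix
# Song (Club Remix).mp3
# Track.mp3
# Mix (2012 REMIX).mp3
```

In the first name, " (Radio Edit)" is matched by the second branch and dropped, while " (Club Remix)" is matched by the first branch and put back unchanged through \1.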

Note: if it helps to avoid false positives, you can add word boundaries around "remix": \bremix\b
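As an illustration of the kind of false positive meant here, the following sketch compares the pattern with and without \b on a made-up file name where "remix" only appears inside a longer word. The two function names and the sample names are hypothetical, and GNU sed is assumed (\b is a GNU extension):

```shell
#!/bin/sh
# Sketch only: assumes GNU sed. Function names and sample file names
# are hypothetical.
keep_remix_loose() {
    sed 's/\(\s*([^)]*remix[^)]*)\)\|\s*(\s\?[a-z0-9. ]*)/\1/gi'
}
keep_remix_strict() {
    sed 's/\(\s*([^)]*\bremix\b[^)]*)\)\|\s*(\s\?[a-z0-9. ]*)/\1/gi'
}

name='Demo (Premixed Version).mp3'
printf '%s\n' "$name" | keep_remix_loose    # kept: "Premixed" contains "remix"
printf '%s\n' "$name" | keep_remix_strict   # removed: Demo.mp3

# A real "Remix" word is kept by both variants.
printf '%s\n' 'Demo (Club Remix).mp3' | keep_remix_strict
```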

Pattern details:

\(           # open capture group 1
    \s*      # zero or more whitespace characters
    (        # a literal opening parenthesis
    [^)]*    # zero or more characters that are not a closing parenthesis
    remix
    [^)]*
    )        # a literal closing parenthesis
\)           # close capture group 1
\|           # OR
# something else between parentheses

\s*  # note that it is essential that both branches can start at the
     # same position. If you remove \s* from the first branch, the
     # second branch always wins when there is a space before the
     # opening parenthesis.
(\s\?[a-z0-9. ]*)

\1 is a reference to capture group 1

The i flag makes the pattern case-insensitive

[EDIT]

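If you want to do this in a POSIX-compliant way, you must use a different approach, because several GNU features are not available, in particular the alternation \| (and also the i modifier, the \s character class, and the optional quantifier \?).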

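This other approach consists of matching any characters that are not an opening parenthesis and any parenthesized substrings with "remix" inside, followed by optional whitespace and an optional parenthesized substring.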

As you can see, everything is optional and the pattern can match the empty string, but that is not a problem.

Everything before the parenthesized part to remove is captured in group 1.

sed 's/\(\([^(]*([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)[^ \t(]*\([ \t]\{1,\}[^ \t(]\{1,\}\)*\)*\)\([ \t]*([^)]*)\)\{0,1\}/\1/g;'

Note that \t inside a bracket expression is itself read as a tab only by GNU sed; under strict POSIX rules it stands for a backslash and the letter t, so a strictly POSIX sed needs a literal tab character or [[:blank:]] there instead.

Pattern details:

\(     # open capture group 1
    \(
        [^(]*  # everything that is not an opening parenthesis
        # substring enclosed in parentheses that contains "remix"
        ( [^)]* [Rr][Ee][Mm][Ii][Xx] [^)]* )

        # Reach the next parenthesis without matching the whitespace
        # before it (otherwise that leading whitespace is not removed)
        [^ \t(]*  # everything that is not whitespace or an opening parenthesis
        # optional groups of whitespace followed by characters that are
        # neither whitespace nor an opening parenthesis
        \( [ \t]\{1,\} [^ \t(]\{1,\} \)*
    \)*
\)     # close capture group 1
\(
    [ \t]*  # leading whitespace
    ([^)]*) # parenthesized substring
\)\{0,1\}   # makes this part optional (this avoids removing a "remix" part
            # that stands alone at the end of the string)
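
As a rough check of the description above, this sketch feeds a few made-up file names to the same command. The function name and the sample names are hypothetical, and the run assumes a sed that reads \t inside brackets as a tab (GNU sed does):

```shell
#!/bin/sh
# Sketch only: function name and sample file names are hypothetical.
# Assumes a sed that treats \t inside [...] as a tab (GNU sed does).
posix_strip_parens_keep_remix() {
    sed 's/\(\([^(]*([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)[^ \t(]*\([ \t]\{1,\}[^ \t(]\{1,\}\)*\)*\)\([ \t]*([^)]*)\)\{0,1\}/\1/g;'
}

printf '%s\n' \
    'Song (Radio Edit) (Club Remix).mp3' \
    'Song (Club Remix) (Radio Edit).mp3' \
    'Track (Live).mp3' |
    posix_strip_parens_keep_remix
# Song (Club Remix).mp3
# Song (Club Remix).mp3
# Track.mp3
```

In the second name, group 1 captures "Song (Club Remix)" and the optional trailing part consumes " (Radio Edit)", so only the captured text is written back.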

Word boundaries are not available in this mode either. So the only way to emulate them is to list the four possibilities:

([Rr][Ee][Mm][Ii][Xx])                # poss1
([Rr][Ee][Mm][Ii][Xx][^a-zA-Z][^)]*)  # poss2
([^)]*[^a-zA-Z][Rr][Ee][Mm][Ii][Xx])  # poss3
([^)]*[^a-zA-Z][Rr][Ee][Mm][Ii][Xx][^a-zA-Z][^)]*) # poss4

and to replace ([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*) with:

\(poss1\)\{0,\}\(poss2\)\{0,\}\(poss3\)\{0,\}\(poss4\)\{0,\}

Answer 2

Just skip the lines that match "remix":

sed '/([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)/! s/([^)]*)//g'
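
A small sketch of what this does in practice, using made-up file names (the function name skip_remix_lines is hypothetical). Because the whole line is skipped, other parenthesized parts on a "remix" line are kept as well, and the space before a removed parenthesis is left in place:

```shell
#!/bin/sh
# Sketch only: function name and sample file names are hypothetical.
skip_remix_lines() {
    sed '/([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)/! s/([^)]*)//g'
}

printf '%s\n' \
    'Song (Radio Edit).mp3' \
    'Song (Live) (Club Remix).mp3' |
    skip_remix_lines
# Song .mp3
# Song (Live) (Club Remix).mp3
```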

Answer 3

Where "bracket" means (US usage): []

sed '/remix\|REMIX\|Remix/ !s/\[[^]]*]//g'

Where "bracket" means (rest of the world): ()

sed '/remix\|REMIX\|Remix/ !s/([^)]*)//g'

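Assumptions:
- there are no nested brackets
- other spellings of remix (ReMix, ...) are not matched, so those lines are still stripped
- remix may appear anywhere in the title (i love remix) [this could be narrowed if needed]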
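
The effect of these assumptions can be seen with a short sketch on made-up file names (the function name is hypothetical; GNU sed is assumed because of \|):

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (\| in the address is a GNU extension).
# Function name and sample file names are hypothetical.
skip_lines_mentioning_remix() {
    sed '/remix\|REMIX\|Remix/ !s/([^)]*)//g'
}

printf '%s\n' \
    'I Love Remix (Live).mp3' \
    'Song (ReMix).mp3' \
    'Song (Radio Edit).mp3' |
    skip_lines_mentioning_remix
# I Love Remix (Live).mp3
# Song .mp3
# Song .mp3
```

The first line is left alone because "Remix" appears in the title, even though the parentheses hold something else; the second is stripped because "ReMix" is not one of the three listed spellings.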
