Question

sed "s/\(^[a-z,0-9]*\)\(.*\)\( [a-z,0-9]*$\)/\1\2 \1/g" desired_file_name

Even if you only explain part of it, or at least the structure in words, like s\alphanumerical_at_start\something\alphanumerical_at_end\something_else\global, I would appreciate it.

Can someone explain what this means and why, and why all regex is so... scary?

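I know it replaces the first lowercase alphanumeric word with the last one. But can you explain what is going on here? What are all the /\ and \(.*\)\ and everything else?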

I'm lost.

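Edit: Here is what I have so far: (^[a-z0-9]*) starts with a through z and 0 through 9; and [a-z,0-9]*$ is the same, but for the last word (but is [0-9,a-z] just the first 2 characters / the first character, or the whole word?). Also: what do * or \(.*\)\ even mean?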

Answer 1

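This is a sed search and replace, which has the form s/search/replace/flags. The only flag is g, which means the search/replace is global, so every match is replaced when the pattern matches more than once on a single line, rather than just the first one.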
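
To see what the g flag changes, here is a minimal sketch on a made-up input. The helper names `replace_first` and `replace_all` are hypothetical and only wrap the two sed calls being compared.

```shell
# Hypothetical helpers for illustration; neither is part of the question.
replace_first() { sed 's/a/b/'; }    # no flag: only the first match on each line
replace_all()   { sed 's/a/b/g'; }   # g flag: every match on each line

printf '%s\n' 'banana' | replace_first   # prints: bbnana
printf '%s\n' 'banana' | replace_all     # prints: bbnbnb
```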

First, here is the regex it searches for:

\(^[a-z,0-9]*\)\(.*\)\( [a-z,0-9]*$\)

Or in a more readable format:

\(             # start capture group 1
  ^              # match at the beginning of the line
  [a-z,0-9]*     # zero or more alphanumeric or comma characters (lowercase only)
\)             # end capture group 1
\(             # start capture group 2
  .*             # zero or more of any character (except for newlines)
\)             # end capture group 2
\(             # start capture group 3
  [ ]            # literal ' ' character (I added brackets for clarity)
  [a-z,0-9]*     # zero or more alphanumeric or comma characters (lowercase only)
  $              # match at the end of the line
\)             # end capture group 3

Here is the replacement:

\1\2 \1

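This replaces the entire line (because of the ^ and $ anchors in the regex) with the contents of capture group 1, followed by the contents of capture group 2, then a space, then the contents of capture group 1 again.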
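
As a rough illustration, the command from the question can be run on a few made-up lines. The wrapper name `copy_first_over_last` is hypothetical, and the sample inputs are assumptions chosen to show a match and a non-match.

```shell
# Hypothetical wrapper around the command from the question (single-quoted here).
copy_first_over_last() {
  sed 's/\(^[a-z,0-9]*\)\(.*\)\( [a-z,0-9]*$\)/\1\2 \1/g'
}

printf '%s\n' 'foo bar baz' | copy_first_over_last   # prints: foo bar foo
printf '%s\n' 'alpha beta'  | copy_first_over_last   # prints: alpha alpha
printf '%s\n' 'foo bar Baz' | copy_first_over_last   # prints: foo bar Baz (no match, line unchanged)
```

In the last line the final word contains an uppercase letter, so group 3 cannot reach the end of the line and sed leaves the line as it is.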

Answer 2

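\(^[a-z,0-9]*\) - zero or more alphanumerics or commas at the beginning of a line (group 1)
\(.*\) - any characters (group 2)
\( [a-z,0-9]*$\) - a space followed by zero or more alphanumerics or commas [I guess the comma is just a mistake], up to the end of the line (group 3)
\1\2 \1 - replace with (group 1)(group 2) space (group 1)
g - anywhere in the input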
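
The comma really is matched literally inside the brackets. A small sketch, with hypothetical helper names and a made-up input, shows the difference it makes:

```shell
# Hypothetical helpers for illustration.
with_comma()    { sed 's/^[a-z,0-9]*/X/'; }   # the comma is part of the character set
without_comma() { sed 's/^[a-z0-9]*/X/'; }    # the comma is not part of the set

printf '%s\n' 'a,b c' | with_comma      # prints: X c
printf '%s\n' 'a,b c' | without_comma   # prints: X,b c
```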

Answer 3

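Regular expressions are a way to describe regular grammars. They do so in a very concise and efficient way, which makes them look complex.

They are also structured and decodable.

First, there is the sed call.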

sed "{operation}/{expression}/{replacement}/{modifiers}" {argument}

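Notes

sed separates the parts with forward slashes. This means you cannot use an unescaped forward slash in {expression} or {replacement}.
Unlike most other regex dialects, sed uses plain parentheses to match literal parentheses and escaped parentheses to define capture groups.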
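
Both notes can be checked with a short sketch, assuming a sed that uses its default basic regex syntax. The helper names and inputs are hypothetical. The last helper uses a different delimiter character, which the s command also accepts, although the answer does not mention it.

```shell
# Hypothetical helpers for illustration.
literal_parens()  { sed 's/(x)/[x]/'; }   # unescaped ( ) match real parentheses
escaped_slash()   { sed 's/\//-/'; }      # a literal / must be escaped as \/
other_delimiter() { sed 's|/|-|'; }       # with | as delimiter, / needs no escaping

printf '%s\n' 'f(x)' | literal_parens    # prints: f[x]
printf '%s\n' 'a/b'  | escaped_slash     # prints: a-b
printf '%s\n' 'a/b'  | other_delimiter   # prints: a-b
```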

{operation} happens to be s - substitute.

{expression} is \(^[a-z,0-9]*\)\(.*\)\( [a-z,0-9]*$\), which breaks down as

\(             # start capture group 1
  ^            #   match the start of the string
  [a-z,0-9]*   #   match characters a-z and 0-9 and a comma (!), zero or more times
\)             # end capture group 1
\(             # start capture group 2
  .*           #   match any character (.), zero or more times (*)
\)             # end capture group 2
\(             # start capture group 3
               #   match a space
  [a-z,0-9]*   #   match characters a-z and 0-9 and a comma (!)
  $            #   match the end of the string
\)             # end capture group 3

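Think about how much code (and time) it would take to write a function that does the same thing, and how little space the regex needs. That is why it is harder to read - it is extremely compressed.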

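{replacement} is \1\2 \1. \n is called a back-reference, where n is the number of a capture group. So this re-inserts the contents of group 1 and group 2, then a space, then the contents of group 1 again.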
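
Back-references are easier to see on a smaller pattern. This is only a sketch with hypothetical helper names and made-up inputs:

```shell
# Hypothetical helpers for illustration.
swap_words()  { sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'; }   # emit group 2, a space, then group 1
double_word() { sed 's/\([a-z]*\)/\1\1/'; }               # emit group 1 twice

printf '%s\n' 'hello world' | swap_words    # prints: world hello
printf '%s\n' 'ab'          | double_word   # prints: abab
```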

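The {modifiers} part is the g flag, which makes the regex apply as often as possible. In this particular case it does not make much sense, because the regex above can only match once per line anyway.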
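
A quick way to check this is to run the expression with and without g on the same input. The helper names and the sample line are hypothetical:

```shell
# Hypothetical helpers: the question's expression with and without the g flag.
with_g()    { sed 's/\(^[a-z,0-9]*\)\(.*\)\( [a-z,0-9]*$\)/\1\2 \1/g'; }
without_g() { sed 's/\(^[a-z,0-9]*\)\(.*\)\( [a-z,0-9]*$\)/\1\2 \1/'; }

printf '%s\n' 'foo bar baz' | with_g      # prints: foo bar foo
printf '%s\n' 'foo bar baz' | without_g   # prints: foo bar foo
```

Because the pattern is anchored at both ends, the first match already consumes the whole line and nothing is left for g to match again.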

Answer 4

s/\(^[a-z,0-9]*\)\(.*\)\( [a-z,0-9]*$\)/\1\2 \1/g

s -> substitute
/ -> begin of regex
\( -> begin of a first field (accessed as \1 later)
^  -> from the beginning of line in data
[a-z,0-9] -> list of characters which will be compared, lowercase a through z, comma, and 0 through 9
* -> zero or more times
\) -> end of \1 field
\( -> begin of \2
.* -> . means any character. .* means any character zero or more times
\) -> end of \2
\( [a-z,0-9]*$ -> begin of \3, followed by a space, followed by zero or more a-z, comma, 0-9, up to the end of line
\) -> end of \3 field
/ -> end of the regex to match

/ -> (the same slash) begin of the text to replace with
\1\2 \1 -> first field followed by second field followed by a space and again the first field
/ -> end of the text to replace with

g -> globally
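
The difference between a bracket list on its own and the same list followed by * can be sketched as follows. The helper names and the input are hypothetical:

```shell
# Hypothetical helpers for illustration.
one_char()   { sed 's/^[a-z0-9]/X/'; }    # the list alone matches exactly one character
whole_run()  { sed 's/^[a-z0-9]*/X/'; }   # with *, it matches the whole leading run
whole_line() { sed 's/.*/X/'; }           # .* matches everything on the line

printf '%s\n' 'abc123 rest' | one_char     # prints: Xbc123 rest
printf '%s\n' 'abc123 rest' | whole_run    # prints: X rest
printf '%s\n' 'abc123 rest' | whole_line   # prints: X
```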

What does this regular expression mean, and why

4 answers: