Question

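I have a bunch of files that I'm moving from one wiki (Markdown based) to another (Creole based). I've written a few sed scripts to convert the link format and the heading format. The new wiki, however, allows a directory structure, which I'd rather use than the pseudo-directory structure I have now. I've already renamed the files, but I need to convert all the links from _-separated to /-separated.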

Basic info:

Creole link: [[url]] [[url|name]]

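I only want to convert links that do not contain . or /.

I would really appreciate it if you explained the commands you give, so that I can learn from them.

Sample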

this is a line with a [[Link_to_something]] and [[Something_else|something else]]
this site is cool [[http://example.com/this_page]]

to

this is a line with a [[Link/to/something]] and [[Something/else|something else]]
this site is cool [[http://example.com/this_page]]

What I've tried

y/// only works on entire lines.

s//\u\2 only does case conversion.

Answer 1

I think I'd use Perl. It can be done as a one-liner, thus:

perl -pe 's{\[\[([^/.|]+)(\|[^]]+)?\]\]}{$x=$1;$y=$2;$x=~s%_%/%g;"[[$x$y]]"}gex;' <<'EOF'
this is a line with a [[Link_to_something]] and [[Something_else|something else]]
this site is cool [[http://example.com/this_page]]
EOF

The output is:

this is a line with a [[Link/to/something]] and [[Something/else|something else]]
this site is cool [[http://example.com/this_page]]

Whether that is good style is entirely debatable.

I'll explain this version of the code, which is isomorphic with the code above:

perl -e 'use strict; use warnings;
         while (my $line = <>)
         {
             $line =~ s{ \[\[ ([^/.|]+) (\|[^]]+)? \]\] }
                       { my($x, $y) = ($1, $2 // ""); $x =~ s%_%/%g; "[[$x$y]]" }gex;
             print $line;
         } '

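The while loop is basically what -p provides in the first version. I've explicitly named the input variable $line instead of using the implicit $_ as in the first version. I also had to declare $x and $y because of the use strict; use warnings;.

The substitute command takes the form s{pattern}{replace} because the regex itself contains slashes. The x modifier allows insignificant whitespace in the pattern, which makes it easier to lay out (the replacement is Perl code because of e, so whitespace is free there too). The g modifier repeats the substitution as often as the pattern matches. The e modifier says 'treat the right-hand part of the substitution as an expression'.

The match pattern looks for a pair of open square brackets, then remembers a sequence of characters other than /, . or |, optionally followed by a | and a sequence of characters other than ], ending at a pair of close square brackets. The two captures are $1 and $2.

The replacement expression saves the values of $1 and $2 in the variables $x and $y. It then applies a simpler substitution to $x, changing underscores to slashes. The result value is then the string [[$x$y]]. You can't modify $1 or $2 directly in the replacement expression, and the inner s%_%/%g clobbers $1 and $2, which is why I needed $x and $y.

There's probably another way to do it - this is Perl, so TMTOWTDI: There's More Than One Way To Do It. But this at least works.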
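For readers who don't use Perl, the same two-step idea (capture the link target and the optional |name separately, then rewrite only the target) can be sketched in Python. This is a rough equivalent rather than a translation of the one-liner: the names LINK and convert_links are made up for illustration, and the target class here also excludes ], an extra assumption so that one match cannot run across two links on the same line.

```python
import re

# Group 1: link target, made of characters other than /, ., | and ].
# Group 2: optional "|name" part, which is left untouched.
# LINK and convert_links are illustrative names, not part of any library.
LINK = re.compile(r"\[\[([^/.|\]]+)(\|[^\]]+)?\]\]")


def convert_links(line: str) -> str:
    """Replace _ with / in link targets that contain no . or /."""

    def fix(m: re.Match) -> str:
        target = m.group(1).replace("_", "/")
        name = m.group(2) or ""  # group 2 is None when there is no |name
        return f"[[{target}{name}]]"

    return LINK.sub(fix, line)


if __name__ == "__main__":
    print(convert_links("a [[Link_to_something]] and [[Something_else|something else]]"))
    print(convert_links("this site is cool [[http://example.com/this_page]]"))
```

The callback passed to LINK.sub plays the role of the e modifier: the replacement is computed by code for each match instead of being a fixed string.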

Answer 2

This might work for you:

awk -vORS='' -vRS='[[][[][^].]*[]][]]' '{gsub(/_/,"/",RT);print $0 RT}' file
this is a line with a [[Link/to/something]] and [[Something/else|something else]]
this site is cool [[http://example.com/this_page]]

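Set the output record separator to null.
Set the record separator to [[...]] (where ... does not contain a .).
Translate the _'s to /'s in the record terminator RT.
Print the record and the record terminator concatenated, i.e. $0 RT.

Note that a regex RS and the RT variable are GNU awk features.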
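The separator trick above can be mimicked in Python with re.split, which keeps the separators in its result when the pattern has a capturing group. This is only a sketch of the same idea, with LINK and convert as made-up names; like the awk command, it skips links containing a . but does not check for /, and it also translates underscores in the |name part.

```python
import re

# Same separator as RS above: "[[", then anything except "]" or ".", then "]]".
# The capturing group makes re.split return the separators as well.
LINK = re.compile(r"(\[\[[^\].]*\]\])")


def convert(text: str) -> str:
    parts = LINK.split(text)
    # Even indexes hold ordinary text (like $0), odd indexes hold the
    # matched links (like RT); only the links are translated.
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace("_", "/")
    return "".join(parts)


if __name__ == "__main__":
    print(convert("a [[Link_to_something]] and [[http://example.com/this_page]]"))
```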

Here's a sed solution:

sed 's/\[\[[^].]*]]/\a\n&\a\n/g' file |
sed '/^\[\[[^]]*\]\]\a/y/_/\//;H;$!d;g;s/\a\n//g;s/.//'
this is a line with a [[Link/to/something]] and [[Something/else|something else]]
this site is cool [[http://example.com/this_page]]

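Surround each [[...]] with \a\n. N.B. \a is chosen as a character that is unlikely to appear in the file.
Translate the _'s to /'s on lines beginning with [[...]]\a.
Remove all the \a\n's.

If you have GNU sed, this does it: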

sed '/\[\[[^].]*]]/{s||'\''$(sed "y/_/\\//" <<<"&")'\''|g;s/.*/echo '\''&'\''/e}' file
this is a line with a [[Link/to/something]] and [[Something/else|something else]]
this site is cool [[http://example.com/this_page]]

Answer 3

You can use Python to simplify the regex:

$ python3 -c '
> import re
> import sys
> for line in sys.stdin:
>     print(re.sub(r"\[\[(?!http).*?\]\]", lambda m:m.group(0).replace("_", "/"), line), end="")
> ' <input.txt

this is a line with a [[Link/to/something]] and [[Something/else|something else]]
this site is cool [[http://example.com/this_page]]

Note: the $ and > at the start of the lines are the command prompts.

You can also do this interactively in vim:

/\[\[\(http\)\@!.\{-}\]\]
:%s@@\=substitute(submatch(0), '_', '/', 'g')@g
