Question

I have a CSV file that uses a highly custom format. Here, each number represents the data in one of the 4 columns:

1 2 [3] 4

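I need to restrict sed to searching and modifying only the data that appears in the fourth column. Essentially, it must ignore all data on the line that appears before the first occurrence of a closing square bracket followed by a space (] ), and only modify the data that appears after it. For example, file1.txt might contain: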

penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.

The substitution might be sed 's/penguin/animal/g' file1.txt. After running the script, the output should look like this:

penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.
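To reproduce the problem, the two sample lines can be saved as file1.txt (the name used in the question) and run through the unrestricted substitution. A minimal sketch, assuming a POSIX shell and GNU sed:

```shell
# Save the sample input from the question as file1.txt.
cat > file1.txt <<'EOF'
penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.
EOF

# The plain global substitution is not limited to column 4:
# it also rewrites "penguin" in column 1 and "penguins" inside the brackets.
sed 's/penguin/animal/g' file1.txt
```

This prints "animal bird [lives in Antarctica] The animal lives in cold places." and "wolf dog [lives in Antarctica with animals] The wolf likes to eat animals.", which is exactly what the question wants to avoid.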

In this case, every occurrence of penguin before the first ] is ignored, and only the occurrences that appear after it on the line are changed.

Other closing brackets may appear later in the line, but only the first one should be treated as the divider.

How can I make sed ignore the first three columns of this custom CSV format when finding and replacing text?

I have GNU sed version 4.2.1.

Answer 1

You tell sed to search for the '] ' combination followed by .* (anything), and then, as part of the replacement, you put the ] character back.

The only catch is that sed normally "thinks" a ] char is part of a character-class definition, so you have to escape it. Try

echo "a b [c] d" | sed 's/\] .*$/\] XYZ/'
a b [c] XYZ

Note that because there is no opening [ character to start a char-class definition, you can also use

echo "a b [c] d" | sed 's/] .*$/] XYZ/'
a b [c] XYZ

Edit

To change only the 4th word,

echo "a b [c] d e" | sed 's/\] [^ ][^ ]*/\] XYZ/'
a b [c] XYZ e

In the command above, [^ ][^ ]* matches "any-char-that-is-not-a-space" followed by any number of "any-char-that-is-not-a-space", so the matcher stops when it reaches the next space.
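GNU sed (the question uses 4.2.1) also accepts \+ ("one or more") in basic regular expressions as an extension, so [^ ][^ ]* can be shortened. A sketch that relies on that GNU extension and is therefore not portable to strictly POSIX seds:

```shell
# [^ ]\+ = one or more non-space characters (GNU sed extension),
# equivalent to the portable [^ ][^ ]* used above.
echo "a b [c] d e" | sed 's/\] [^ ]\+/] XYZ/'
```

This prints a b [c] XYZ e, the same as the portable form.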

Final edit

echo "penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins." \
| sed 's/\] The penguin \(.*$\)/] The animal \1/'

Since you're using GNU sed, you can switch to extended regular expressions with -r, and then you don't need to escape the (...) capture parens:

echo "penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins." \
| sed -r 's/\] The penguin (.*$)/] The animal \1/'

Output

penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.

It depends on which version of sed you are using. There are considerable differences between the seds on AIX and Solaris vs. the GNU seds usually found on Linux.

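If you have other questions about using sed, it usually helps to include the output of sed --version or sed -V. If neither command responds, try what sed. Otherwise, include the OS name from uname.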

IHTH

Answer 2

Assuming you only have one closing bracket per line, I would use awk for this:

awk 'BEGIN {FS=OFS="]"} { gsub(/penguin/, "animal", $2) }1' file.txt

Result:

penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.
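If a line can contain more than one ] (the question says later closing brackets may appear), FS="]" splits column 4 into several fields and only $2 gets edited. A sketch of a variant that splits only at the first "] " using awk's index() and substr(); the function name replace_after_bracket is hypothetical:

```shell
# Hypothetical helper: edit only the text after the first "] " on each line.
replace_after_bracket() {
  awk '{
    i = index($0, "] ")            # 1-based position of the first "] ", 0 if none
    if (i > 0) {
      head = substr($0, 1, i + 1)  # columns 1-3 including the "] " separator
      tail = substr($0, i + 2)     # column 4
      gsub(/penguin/, "animal", tail)
      $0 = head tail
    }
    print                          # lines without "] " pass through unchanged
  }'
}

# Usage: replace_after_bracket < file.txt
```

For the line penguin [x] penguin [y] penguin this prints penguin [x] animal [y] animal, whereas the FS="]" version leaves the last penguin untouched because it lives in $3.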

Answer 3

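Normally I'd do what shelter described (if I were just typing a quick sed command line), but it has the drawback that once you start matching part of the input in order to preserve it (with \(...\) etc.), you have to match and replace everything, and you can no longer use simple substitutions like s/penguin/animal/. If you're willing to add some boilerplate around the substitution, you can stash the beginning of the line in the hold buffer and then get it back:

sed -e 'h' \
    -e 's/.*\] //' \
    -e 's/penguin/animal/' \
    -e 'x' \
    -e 's/\] .*/] /' \
    -e 'G' \
    -e 's/\n//'

The h saves the original line in the hold space. Then we remove the prefix and make whatever substitution (the example one here) or series of substitutions on the end of the line. Then x swaps the ending with the saved copy. We delete the original ending from the saved copy and use G to put the two back together. G adds a newline that we don't want, so we remove it.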
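One caveat with the hold-space pipeline above: the greedy .* in s/.*\] // strips everything up to the last "] ", while s/\] .*/] / keeps the prefix only up to the first one, so a line with a later bracket loses text in between. A sketch of a variant that anchors the prefix removal at the first "] " with [^]]*, skips lines that have no "] " at all, and uses the g flag from the question. It assumes the first ] on each line is followed by a space; replace_after_first_bracket is a hypothetical name:

```shell
# Hypothetical wrapper: hold-space technique, anchored at the first "] ".
replace_after_first_bracket() {
  sed -e '/\] /!b' \
      -e 'h' \
      -e 's/^[^]]*\] //' \
      -e 's/penguin/animal/g' \
      -e 'x' \
      -e 's/\] .*/] /' \
      -e 'G' \
      -e 's/\n//'
  # /\] /!b    : lines without "] " are printed unchanged
  # h          : copy the whole line to the hold space
  # s/^[^]]*\] //: drop columns 1-3 (up to the FIRST "] " only)
  # x, s, G, s : rebuild "prefix + edited column 4"
}
```

For penguin [x] penguin [y] penguin this prints penguin [x] animal [y] animal; the unanchored s/.*\] // would have produced penguin [x] animal.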

Answer 4

This might work for you (GNU sed):

sed -i 's/\]/&\n/;h;s/.*\n//;s/penguin/animal/g;H;g;s/\n.*\n//' file

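Explanation:

s/\]/&\n/ inserts a \n marker after the first ], splitting the line
h copies the line to the hold space
s/.*\n// deletes the part you don't want to change
s/penguin/animal/g changes the part you want to change
H;g appends it to the copy of the original line and brings the result back into the pattern space
s/\n.*\n// removes the original, unchanged copy of the part you wanted to change, together with both markers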
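To experiment without -i (which rewrites the file in place), the same script can be wrapped as a filter on stdin. A sketch assuming GNU sed (for \n in the replacement); split_and_replace is a hypothetical name. Because the \n marker goes after the first ] only, any later brackets simply stay inside the part that gets edited:

```shell
# Hypothetical wrapper around the one-liner above, reading stdin instead of
# editing a file in place. /\]/!b leaves lines without a ] untouched.
split_and_replace() {
  sed '/\]/!b;s//&\n/;h;s/.*\n//;s/penguin/animal/g;H;g;s/\n.*\n//'
}

# Usage: split_and_replace < file1.txt
```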

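This applies to every line (a line without any ] would come out twice, once unchanged and once modified). If the change should be conditional on the line containing a ], use:

sed -i '/\]/!b;s//&\n/;h;s/.*\n//;s/penguin/animal/g;H;g;s/\n.*\n//' file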

Another option (perhaps simpler):

sed ':a;s/\(\].*\)penguin/\1animal/;ta' file
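The :a label and the t command form a loop: ta jumps back to :a whenever the preceding s/// made a replacement. Because \(\].*\) is greedy, each pass rewrites the last penguin that still follows a ], until none is left. A sketch of the same command as a stdin filter (loop_replace is a hypothetical name):

```shell
# Hypothetical wrapper: repeat the substitution until it stops matching.
loop_replace() {
  # :a       defines the label a
  # s/../../ replaces the last "penguin" that appears after a ]
  # ta       branches back to a if that substitution succeeded
  sed ':a;s/\(\].*\)penguin/\1animal/;ta'
}
```

The loop terminates only because animal does not contain penguin; a replacement that still matches the pattern (for example penguin to penguins) would make ta jump back forever.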

How can I limit sed to only replace data appearing after the first closing square bracket?
