Question

I have text data in this form:

^Well/Well[ADV]+ADV ^John/John[N]+N ^has/have[V]+V+3sg+PRES ^a/a[ART]
^quite/quite[ADV]+ADV ^different/different[ADJ]+ADJ ^not/not[PART]
^necessarily/necessarily[ADV]+ADV ^more/more[ADV]+ADV
^elaborated/elaborate[V]+V+PPART ^theology/theology[N]+N *edu$

I would like it to be processed into this form:

Well John have a quite different not necessarily more elaborate theology

Basically, I need every string that sits between the start character / and the end character [.

This is what I tried, but I just get empty files...

#!/bin/bash

for file in probe/*.txt
do sed '///,/[/d' $file > $file.aa
mv $file.aa $file
done

Answer 1

awk to the rescue!

$ awk -F/ -v RS=^ -v ORS=' ' '{print $1}' file

Well John has a quite different not necessarily more elaborated theology

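Explanation: setting the record separator (RS) to ^ splits the input into one record per logical group, and setting the field separator (FS) to / lets you print the first field of each record. Finally, setting the output record separator (ORS) to a space (instead of the default newline) keeps the extracted fields on one line. Note that the first field is the word form in front of the /, which is why this output shows "has" and "elaborated" where the desired output has "have" and "elaborate".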
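To get the text between / and [ instead, the same record-splitting idea can be kept while splitting fields on either / or [ and printing the second field. A minimal sketch, assuming the data from the question is stored in a hypothetical file named sample.txt (the function name extract_lemmas is also just for illustration):

```shell
# Hypothetical sample file holding the data from the question.
cat > sample.txt <<'EOF'
^Well/Well[ADV]+ADV ^John/John[N]+N ^has/have[V]+V+3sg+PRES ^a/a[ART]
^quite/quite[ADV]+ADV ^different/different[ADJ]+ADJ ^not/not[PART]
^necessarily/necessarily[ADV]+ADV ^more/more[ADV]+ADV
^elaborated/elaborate[V]+V+PPART ^theology/theology[N]+N *edu$
EOF

# RS='^'     : one record per token
# -F'[/[]'   : split each record on "/" or "["
# NF > 1     : skip the empty record in front of the first "^"
extract_lemmas() {
  awk -F'[/[]' -v RS='^' -v ORS=' ' 'NF > 1 { print $2 }' "$1"
}

extract_lemmas sample.txt
```

As with the original command, the output ends with a space rather than a newline, because ORS replaces the newline after every record.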

Answer 2

Using GNU grep with Perl-compatible regular expressions (-P):

$ echo $(grep -Po '(?<=/)[^[]*' infile)
Well John have a quite different not necessarily more elaborate theology

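-o keeps only the matches, (?<=/) is a positive lookbehind ("make sure there is a /, but don't include it in the match"), and [^[]* is "a sequence of characters other than [".

grep -Po prints one match per line; by using grep's output as the arguments to echo, we turn the newlines into spaces (piping to tr '\n' ' ' would also work).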
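The pipeline alternative can be sketched as follows. This assumes GNU grep (for -P) and uses paste rather than tr so that the result ends with a newline instead of a trailing space; the file name infile_demo.txt and the function name lemmas_one_line are made up for the example:

```shell
# Hypothetical input file with the data from the question.
cat > infile_demo.txt <<'EOF'
^Well/Well[ADV]+ADV ^John/John[N]+N ^has/have[V]+V+3sg+PRES ^a/a[ART]
^quite/quite[ADV]+ADV ^different/different[ADJ]+ADJ ^not/not[PART]
^necessarily/necessarily[ADV]+ADV ^more/more[ADV]+ADV
^elaborated/elaborate[V]+V+PPART ^theology/theology[N]+N *edu$
EOF

# grep -Po prints one match per line; paste -s joins those lines,
# using a single space as the delimiter.
lemmas_one_line() {
  grep -Po '(?<=/)[^[]*' "$1" | paste -sd ' ' -
}

lemmas_one_line infile_demo.txt
```

Unlike echo $(...), the pipeline never hands the matches to the shell for word splitting and filename expansion, so a match such as * would be passed through unchanged.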

Answer 3

grep -oE '/[^[]*\[' file | sed -e 's#^/##' -e 's/\[$//' | tr -s '\n' ' '
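To apply this to every file in the directory, as the loop in the question tries to do, the pipeline can be wrapped roughly as below. This is a sketch under assumptions: the probe directory name comes from the question, the sample file is created only to make the example self-contained, and paste is swapped in for tr so each file ends with a newline.

```shell
# Create the directory from the question with one short sample file.
mkdir -p probe
printf '%s\n' \
  '^Well/Well[ADV]+ADV ^John/John[N]+N ^has/have[V]+V+3sg+PRES ^a/a[ART]' \
  '^elaborated/elaborate[V]+V+PPART ^theology/theology[N]+N *edu$' \
  > probe/sample.txt

for file in probe/*.txt; do
  tmp="$file.tmp"
  grep -oE '/[^[]*\[' "$file" |       # each "/lemma[" match on its own line
    sed -e 's#^/##' -e 's/\[$//' |    # strip the leading "/" and the trailing "["
    paste -sd ' ' - > "$tmp" &&       # join the lines with single spaces
    mv "$tmp" "$file"                 # replace the original only on success
done

cat probe/sample.txt
```

The && before mv is a deliberate choice: the loop in the question moves the output over the original unconditionally, so if the sed command fails and writes nothing, the file is replaced by an empty one, which may be how the empty files came about.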

Extracting strings from a text file using sed
