Question

I am trying to output a string that contains everything between two words of a string:

Input:

"Here is a String"

Output:

"is a"

Using:

sed -n '/Here/,/String/p'

includes the endpoints, but I don't want to include them.

Answer 1

With -P (PCRE), GNU grep also supports positive and negative lookahead and lookbehind. For your case, the command would be:

echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'

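If Here and string occur several times, you can choose whether to match from the first Here to the last string, or to match each pair separately. In regex terms these are called a greedy match (first case) and a non-greedy match (second case):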

$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*(?=string)' # Greedy match
 is a string, and Here is another 
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*?(?=string)' # Non-greedy match (Notice the '?' after '*' in .*)
 is a 
 is another

Answer 2

sed -e 's/Here\(.*\)String/\1/'

Answer 3

You can strip the strings in Bash alone:

$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
$
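The difference between the single and doubled operators matters once the markers repeat. A minimal sketch, assuming Bash and a made-up sample string (the variable names s, first, and last are illustrative only): # and % remove the shortest matching prefix or suffix, while ## and %% remove the longest.

```shell
s='Here is a String, and Here is another String.'

# '#' strips the shortest prefix ending in "Here ", so we start after the first Here.
# '%%' strips the longest suffix starting with " String", so we stop at the first String left.
first=${s#*Here }
first=${first%% String*}
echo "$first"

# '##' strips the longest prefix, so we start after the last Here.
last=${s##*Here }
last=${last%% String*}
echo "$last"
```

The first echo prints the text between the first pair of markers, the second the text between the last pair.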

If you have a GNU grep that includes PCRE, you can use zero-width assertions:

$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a

Answer 4

With GNU awk:

$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
 is a

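grep with the -P (--perl-regexp) option supports \K, which discards the characters matched so far. In our case the previously matched string is Here, so it is dropped from the final output.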

$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
 is a 
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
 is a

If you want the output to be is a, without the surrounding spaces, you can try the following:

$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a

Answer 5

If you have a long file with many multi-line occurrences, it is useful to print line numbers first:

cat -n file | sed -n '/Here/,/String/p'

Answer 6

This might work for you (GNU sed):

sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' file

This prints each run of text between the two markers (in this case Here and String) on its own line and preserves newlines within the text.

Answer 7

You can use two s commands:

$ echo "Here is a String" | sed 's/.*Here//; s/String.*//'
 is a

This also works:

$ echo "Here is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
 is a

$ echo "Here is a StringHere is a StringHere is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
 is a

Answer 8

All of the above solutions have deficiencies when the last search string is repeated elsewhere in the string. I found it best to write a bash function.

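A minimal sketch of such a function, assuming Bash parameter expansion. The name str_between and its argument order are hypothetical. It cuts at the first start marker and then at the first end marker that follows it, so an end marker repeated later in the string does not change the result.

```shell
# str_between STRING START END
# Prints the text between the first START and the first END after it.
# Returns 1 (and prints nothing) if either marker is missing.
str_between() {
  local str start end
  str=$1
  start=$2
  end=$3

  # Bail out if the start marker is absent.
  case $str in *"$start"*) ;; *) return 1 ;; esac
  # Drop the shortest prefix that ends with the start marker.
  str=${str#*"$start"}

  # Bail out if no end marker follows the start marker.
  case $str in *"$end"*) ;; *) return 1 ;; esac
  # Drop the longest suffix that begins with the end marker.
  str=${str%%"$end"*}

  printf '%s\n' "$str"
}

str_between "Here is a String, and Here is another String." "Here " " String"
```

Quoting the markers inside the expansions makes them match literally, so characters such as * or ? in a marker are not treated as glob patterns.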

Answer 9

You can use \1 (see http://www.grymoire.com/Unix/Sed.html#uh-4):

echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'

Whatever matches inside the escaped parentheses is stored as \1.
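A hedged variation on the same idea: without further options, sed prints lines that do not match unchanged. The sketch below assumes you only want output for lines that contain both markers, so it uses -n together with the p flag, and it also consumes the text and the single spaces around the markers. The function name between_hello_string is made up for illustration.

```shell
between_hello_string() {
  # -n suppresses automatic printing; the trailing p prints only lines
  # where the substitution succeeded. \(.*\) is the group recalled as \1.
  sed -n 's/.*Hello \(.*\) String.*/\1/p'
}

echo "Hello is a String" | between_hello_string
echo "a line without the markers" | between_hello_string
```

Only the first echo produces output; the second line is dropped because the pattern does not match.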

Answer 10

To understand the sed command, we have to build it step by step.

Here is your original text:

user@linux:~$ echo "Here is a String"
Here is a String
user@linux:~$

Let's try to remove Here using the s (substitute) command in sed:

user@linux:~$ echo "Here is a String" | sed 's/Here //'
is a String
user@linux:~$

At this point, I believe you can remove String as well:

user@linux:~$ echo "Here is a String" | sed 's/String//'
Here is a
user@linux:~$

But this is not your desired output.

To combine the two sed commands, use the -e option:

user@linux:~$ echo "Here is a String" | sed -e 's/Here //' -e 's/String//'
is a
user@linux:~$

Hope this helps.

Answer 11

Problem. My stored Claws Mail messages are wrapped as follows, and I am trying to extract the Subject line:

Subject: [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular
 link in major cell growth pathway: Findings point to new potential
 therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is
 Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as
 a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway
 identified [Lysosomal amino acid transporter SLC38A9 signals arginine
 sufficiency to mTORC1]]
Message-ID: <20171019190902.18741771@VictoriasJourney.com>

Per A2 in this thread (How to use sed/grep to extract text between two words?), the first expression below "works" as long as the matched text does not contain a newline:

grep -o -P '(?<=Subject: ).*(?=molecular)' corpus/01

[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key

However, despite trying numerous variants (.+?; /s; ...), I could not get these to work:

grep -o -P '(?<=Subject: ).*(?=link)' corpus/01
grep -o -P '(?<=Subject: ).*(?=therapeutic)' corpus/01
etc.

Solution 1.

Per Extract text between two strings on different lines:

sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' corpus/01

gives

[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]

Solution 2.

Per How can I replace a newline (\n) using sed?:

sed ':a;N;$!ba;s/\n/ /g' corpus/01

will replace newlines with spaces.
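How the one-liner works is easier to see when each command is passed as its own -e expression. This is a sketch of the same script wrapped in a hypothetical function, join_lines. It reads the whole input into the pattern space, so it assumes the file fits comfortably in memory.

```shell
join_lines() {
  # :a        define a label named "a"
  # N         append the next input line to the pattern space, separated by \n
  # $!ba      if this is not the last line, branch back to label "a"
  # s/\n/ /g  once all lines are collected, replace every newline with a space
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' "$@"
}

printf 'one\ntwo\nthree\n' | join_lines
```

Called with a file name, as in join_lines corpus/01, it behaves like the command above; with no arguments it reads standard input.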

Chaining that with A2 from How to use sed/grep to extract text between two words?, we get:

sed ':a;N;$!ba;s/\n/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'

gives

[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular  link in major cell growth pathway: Findings point to new potential  therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is  Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as  a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway  identified [Lysosomal amino acid transporter SLC38A9 signals arginine  sufficiency to mTORC1]]

This variant removes the double spaces:

sed ':a;N;$!ba;s/\n/ /g; s/\s\s*/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'

gives

[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
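An alternative that avoids joining the whole file first: a sketch using awk, which is not one of the approaches above. It collects the Subject header plus its continuation lines, assuming continuation lines start with a space or tab as in the sample message. The function name extract_subject is hypothetical.

```shell
extract_subject() {
  awk '
    # First Subject header: keep the text after "Subject: " (9 characters).
    !found && /^Subject: / { found = 1; collecting = 1; subj = substr($0, 10); next }
    # Continuation line: turn its leading whitespace into one space and append.
    collecting && /^[ \t]/ { sub(/^[ \t]+/, " "); subj = subj $0; next }
    # Any other line ends the header, so stop reading.
    collecting { exit }
    # Print the unfolded subject, if one was found.
    END { if (found) print subj }
  ' "$@"
}
```

It would be called as extract_subject corpus/01. Because it stops at the first line after the header, it does not depend on Message-ID being the next header.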

How to use sed/grep to extract text between two words?

11 answers: