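I am trying to output a string that contains everything between two words of a string:
Input:
"Here is a String"
Output:
"is a"
Using:
sed -n '/Here/,/String/p'
includes the endpoints, but I do not want to include them.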
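Answer 0 (score: 141)
GNU grep, with the -P (PCRE) option, also supports positive and negative look-ahead and look-behind. For your case, the command would be: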
echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'
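If there are multiple occurrences of Here and string, you can choose whether to match from the first Here to the last string, or to match each pair individually. In regex terms, these are called a greedy match (first case) and a non-greedy match (second case):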
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*(?=string)' # Greedy match
is a string, and Here is another
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*?(?=string)' # Non-greedy match (Notice the '?' after '*' in .*)
is a
is another
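Lazy quantifiers such as .*? require PCRE. Where grep -P is unavailable, the non-greedy case can be approximated with plain awk string functions. The following is a minimal sketch, not part of the original answer: the function name between_all is hypothetical, both markers are assumed to be non-empty literal strings (not regular expressions), and matching is done line by line.

```sh
# Print every shortest run of text between FROM and TO, one per output line.
# Hypothetical helper: between_all FROM TO   (reads standard input)
between_all() {
  awk -v from="$1" -v to="$2" '{
    rest = $0
    # Find the next opening marker, then the first closing marker after it.
    while ((i = index(rest, from)) > 0) {
      rest = substr(rest, i + length(from))
      j = index(rest, to)
      if (j == 0) break                     # opening marker without a closing one
      print substr(rest, 1, j - 1)
      rest = substr(rest, j + length(to))   # continue after this closing marker
    }
  }'
}

echo 'Here is a string, and Here is another string.' | between_all Here string
```

Because awk -v interprets backslash escapes in the assigned value, markers that contain backslashes would need extra care.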
Answer 1 (score: 83)
sed -e 's/Here\(.*\)String/\1/'
Answer 2 (score: 31)
You can strip the string in Bash alone:
$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
$
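The two operators used above, ## and %%, remove the longest matching prefix and suffix. Their single-character forms, # and %, remove the shortest match instead, which matters once a marker occurs more than once. A small sketch, using a sample string that is an assumption and not part of the original answer:

```sh
s='Here is a String, and Here is another String.'

# Leading patterns: '#' removes the shortest match, '##' the longest.
shortest_prefix=${s#*Here }     # cuts through the FIRST "Here "
longest_prefix=${s##*Here }     # cuts through the LAST "Here "

# Trailing patterns: '%' removes the shortest match, '%%' the longest.
shortest_suffix=${s% String*}   # cuts from the LAST " String"
longest_suffix=${s%% String*}   # cuts from the FIRST " String"

printf '%s\n' "$shortest_prefix" "$longest_prefix" "$shortest_suffix" "$longest_suffix"
```

Combining # with %% therefore selects the text between the first opening marker and the first closing marker.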
If you have a GNU grep with PCRE support, you can use zero-width assertions:
$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a
Answer 3 (score: 20)
With GNU awk:
$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
is a
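grep with the -P (--perl-regexp) option supports \K, which discards the characters matched so far. In our case the previously matched string is Here, so it is dropped from the final output.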
$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
is a
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
is a
If you want the output to be is a without the surrounding whitespace, you could try the following:
$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a
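The same trimming can be sketched without PCRE by extending the awk command from the start of this answer. This is an illustration only: the function name between_trimmed is hypothetical, and both arguments are interpreted as extended regular expressions because they become part of the field separator.

```sh
# Hypothetical helper: between_trimmed FROM TO   (reads standard input)
between_trimmed() {
  # Split each line on either marker; the text between them is field 2.
  awk -v FS="($1|$2)" '{
    s = $2
    gsub(/^[ \t]+|[ \t]+$/, "", s)   # strip leading and trailing blanks
    print s
  }'
}

echo "Here is a string" | between_trimmed Here string
```

A line that contains neither marker yields an empty output line, since it has no second field.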
Answer 4 (score: 18)
If you have a long file with many multi-line occurrences, it is useful to number the lines first:
cat -n file | sed -n '/Here/,/String/p'
Answer 5 (score: 8)
This might work for you (GNU sed):
sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' file
This prints each run of text between the two markers (in this case Here and String) on its own line and preserves newlines within the text.
Answer 6 (score: 6)
You can use two s commands:
$ echo "Here is a String" | sed 's/.*Here//; s/String.*//'
is a
This also works for:
$ echo "Here is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
is a
$ echo "Here is a StringHere is a StringHere is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
is a
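The first substitution is greedy, so it removes everything through the last Here, and the second removes everything from the first remaining String. One caveat is that lines containing neither marker pass through unchanged. A possible guard, shown as a sketch with a hypothetical function name, is to print only the lines on which both markers appear in order:

```sh
# Hypothetical helper: only_between   (reads standard input)
only_between() {
  sed -n '/Here.*String/ {
    s/.*Here//
    s/String.*//
    p
  }'
}

printf '%s\n' 'no markers' 'Here is a StringHere is b String' | only_between
```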
Answer 7 (score: 5)
All of the above solutions have a flaw when the last search string is repeated elsewhere in the string. I found it best to write a bash function.
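One way such a function could look is sketched below. It is an illustration under stated assumptions, not the answer author's code: the name between and its argument order are hypothetical, and it returns the text between the first start marker and the first end marker that follows it.

```bash
# Hypothetical helper: between STRING START END
between() {
  local str="$1" start="$2" end="$3"
  case $str in *"$start"*) ;; *) return 1 ;; esac   # no start marker
  str=${str#*"$start"}      # drop everything through the FIRST start marker
  case $str in *"$end"*) ;; *) return 1 ;; esac     # no end marker after it
  str=${str%%"$end"*}       # drop everything from the FIRST end marker onwards
  printf '%s\n' "$str"
}

between "Here is a String and another String" Here String
```

Quoting the markers inside the expansions makes the shell treat them as literal text rather than glob patterns.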
Answer 8 (score: 3)
You can use \1 (see http://www.grymoire.com/Unix/Sed.html#uh-4):
echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'
The content matched inside the parentheses is stored as \1.
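Note that a substitution of this form only replaces the part of the line that the pattern matched, so any text before Hello or after String stays in the output. A sketch of one way to drop it is to let the pattern consume the whole line; the function name capture_between is hypothetical.

```sh
# Hypothetical helper: capture_between   (reads standard input)
capture_between() {
  # The leading and trailing .* absorb everything outside the two markers.
  sed 's/.*Hello\(.*\)String.*/\1/'
}

# The unanchored form keeps "foo" and "bar" around the captured text:
echo 'foo Hello is a String bar' | sed 's/Hello\(.*\)String/\1/'
# The anchored form prints only the captured text:
echo 'foo Hello is a String bar' | capture_between
```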
Answer 9 (score: 2)
To understand the sed command, we have to build it step by step.
Here is your original text:
user@linux:~$ echo "Here is a String"
Here is a String
user@linux:~$
Let's try to remove Here using the s (substitute) command in sed:
user@linux:~$ echo "Here is a String" | sed 's/Here //'
is a String
user@linux:~$
At this point, I believe you can remove String as well:
user@linux:~$ echo "Here is a String" | sed 's/String//'
Here is a
user@linux:~$
But this is not your desired output.
To combine the two sed commands, use the -e option:
user@linux:~$ echo "Here is a String" | sed -e 's/Here //' -e 's/String//'
is a
user@linux:~$
Hope this helps.
Answer 10 (score: 0)
Problem. My stored Claws Mail messages are wrapped as follows, and I am trying to extract the Subject line:
Subject: [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular
link in major cell growth pathway: Findings point to new potential
therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is
Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as
a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway
identified [Lysosomal amino acid transporter SLC38A9 signals arginine
sufficiency to mTORC1]]
Message-ID: <20171019190902.18741771@VictoriasJourney.com>
Per A2 in this thread (How to use sed/grep to extract text between two words?), the first expression below "works" as long as the matched text does not contain a newline:
grep -o -P '(?<=Subject: ).*(?=molecular)' corpus/01
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key
However, despite trying numerous variants (.+?; /s; ...), I could not get these to work:
grep -o -P '(?<=Subject: ).*(?=link)' corpus/01
grep -o -P '(?<=Subject: ).*(?=therapeutic)' corpus/01
etc.
Solution 1.
Per Extract text between two strings on different lines:
sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' corpus/01
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
Solution 2.
Per How can I replace a newline (\n) using sed?:
sed ':a;N;$!ba;s/\n/ /g' corpus/01
replaces the newlines with spaces.
Chaining that with A2 from How to use sed/grep to extract text between two words?, we get:
sed ':a;N;$!ba;s/\n/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
This variant also removes the double spaces:
sed ':a;N;$!ba;s/\n/ /g; s/\s\s*/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
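The same extraction can also be sketched in portable awk, which avoids reading the whole file into the sed pattern space. This is an alternative illustration, not one of the solutions above: the function name extract_subject is hypothetical, and like the commands above it assumes the Subject: header is immediately followed by the Message-ID: header.

```sh
# Hypothetical helper: extract_subject [FILE...]   (reads stdin if no file is given)
extract_subject() {
  awk '
    # The Message-ID line ends the Subject: squeeze blanks and print it.
    /^Message-ID:/ && inside { gsub(/[ \t]+/, " ", subject); print subject; inside = 0 }
    # While inside the Subject, append each wrapped line.
    inside                   { subject = subject " " $0 }
    # A Subject line starts a new collection; drop the 9-character "Subject: " prefix.
    /^Subject: /             { subject = substr($0, 10); inside = 1 }
  ' "$@"
}

printf '%s\n' 'Subject: Key molecular' ' link in major' ' pathway' 'Message-ID: <id@example.com>' | extract_subject
```

The rule order matters here: the Subject rule comes last so that the header line itself is not appended a second time by the rule above it.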