Select text between two strings using regex

Posted: 2019-04-17 00:54:14

Tags: html regex linux bash shell

I want to extract a substring from a string using regex and sed/egrep. For example, I want to extract the title from this HTML tag:

<title>DuckDuckGo — Privacy, simplified.</title>

and, of course, I don't want the tags themselves.

The output should look like this:

DuckDuckGo — Privacy, simplified.

If possible, I want to do this with a single command from the Linux terminal.
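A minimal sketch of a single command, assuming `<title>` and `</title>` sit on the same line of the page (true for the sample above, but not guaranteed for every site). `sed -n` suppresses normal output and the `p` flag prints only the lines where the substitution succeeded, so one sed call both selects the line and strips the tags. The function name `extract_title` is hypothetical and only there to make the example testable.

```shell
# Hypothetical helper: print the text between <title> and </title>
# for every stdin line that contains both tags on that same line.
extract_title() {
  # \(.*\) captures the inner text; \1 replaces the whole line with it.
  # -n plus the p flag prints only lines where this substitution matched.
  sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p'
}

# Usage (needs network access):
#   wget -qO- www.duckduckgo.com | extract_title
```

Without the helper, the same pipeline is `wget -qO- www.duckduckgo.com | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p'`, which replaces both grep calls.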

I've gotten this far:

wget -qO- www.duckduckgo.com | grep '<title>' | grep '</title>'

to get the line containing the title tags, and this regex:

/(?:>).+(?:<)/g

which, according to this website, should output:

>DuckDuckGo — Privacy, simplified.<

When I run it in the terminal, I don't get that output. Is there a way to extract the inner HTML with one command (sed or egrep if possible)?
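The regex above likely fails in the terminal for two reasons. The `/.../g` wrapper is regex-literal syntax used by JavaScript-style online testers, not part of the pattern itself, and `(?:...)` is a Perl/PCRE extension that the POSIX basic and extended regular expressions used by grep, egrep and sed do not define. Even in PCRE, a non-capturing group still consumes the `>` and `<`; it only avoids capturing them, which is why the tester shows them in the match. A capturing group plus a `\1` back-reference keeps only the inner text. A sketch in POSIX ERE, assuming a sed that accepts `-E` (GNU and BSD sed both do); `extract_title_ere` is a hypothetical name:

```shell
# Hypothetical helper: ERE version of the same idea as /(?:>).+(?:<)/.
# ([^<]*) is a capturing group that cannot run past the next '<',
# so the match stops at the first </title> after <title>.
extract_title_ere() {
  sed -E -n 's/.*<title>([^<]*)<\/title>.*/\1/p'
}
```

If GNU grep is available, `grep -oP '(?<=<title>).*(?=</title>)'` is a grep-only alternative: the lookbehind and lookahead match the tags without including them in the output. `-P` is a GNU extension and may be missing on other systems.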

EDIT:

This is not specifically about extracting text from between two HTML tags, but rather about selecting the text between two given strings without outputting the strings used to select it and without using extra software. The suggested question has no answer that solves my problem.
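For arbitrary start and end strings, one sketch that needs no tool beyond the shell itself uses POSIX parameter expansion. Quoting the markers inside the patterns makes them match literally, so characters such as `*` or `[` in the markers are not treated as glob syntax. The function name `between` is hypothetical, and it only handles the first START and the first END that follows it.

```shell
# Hypothetical helper: print the text between the first occurrence of
# START and the next occurrence of END in TEXT. Exits with status 1 if
# either marker is missing. The ( ) body runs in a subshell, so the
# variables below do not leak into the caller.
between() (
  text=$1 start=$2 end=$3
  case $text in *"$start"*) ;; *) exit 1 ;; esac
  rest=${text#*"$start"}      # drop everything up to and including START
  case $rest in *"$end"*) ;; *) exit 1 ;; esac
  result=${rest%%"$end"*}     # drop END and everything after it
  printf '%s\n' "$result"
)

# Usage (needs network access); the whole page is one string, so the
# markers may even be on different lines:
#   between "$(wget -qO- www.duckduckgo.com)" '<title>' '</title>'
```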

0 answers