Question

I'm having trouble extracting text from an HTML tag with a regular expression.

I want to extract text from the following HTML:

<a href="http://google.com/" target="_self" title="TEXTDATA" class="encyclopedia">Google</a>

Desired result:

TEXTDATA

I only want to extract the text TEXTDATA (the value of the title attribute).

I've tried, but without success.

Answer 1

{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "2.0.0",
    "tasks": [
        {
            "label": "GnuCOBOL - Compile (single file)",
            "type": "shell",
            "command": "cobc",
            "args": [
                "-x",
                "-std=mf",
                "-t${fileBasenameNoExtension}.LST",
                "${file}"
            ],
            "options": {
                "env": {
                    "PATH": "\\gnucobol3\\bin",
                    "COB_CONFIG_DIR": "c:\\gnucobol3\\config",
                    "COB_COPY_DIR": "c:\\gnucobol3\\copy",
                    "COB_INCLUDE_PATH": "c:\\gnucobol3\\include",
                    "COB_LIB_PATH": "c:\\gnucobol3\\lib"
                }
            }
        }
    ]
}

Remove the title and try again.

Answer 2

Use this regex:

title=\"([^\"]*)\"

See: Regex
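A minimal JavaScript sketch of the pattern above, assuming the title value is always wrapped in double quotes (the unescaped form title="([^"]*)" is equivalent inside a JavaScript regex literal). The variable names are illustrative; String.prototype.match returns the first capture group at index 1.

```javascript
// HTML from the question
const html = '<a href="http://google.com/" target="_self" title="TEXTDATA" class="encyclopedia">Google</a>';

// Capture everything between title=" and the next double quote
const titleRegex = /title="([^"]*)"/;

const match = html.match(titleRegex);
// match[1] holds the captured group; fall back to null when there is no title attribute
const title = match ? match[1] : null;

console.log(title); // TEXTDATA
```

Because [^"]* cannot cross a double quote, the capture stops at the closing quote of the title attribute regardless of what follows it on the line.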

Answer 3

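Here the expression consumes the string up to the left boundary (title="), captures the data we want, and then, optionally, consumes the rest of the string to the end.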

<.+title="(.+?)"(.*)

const regex = /<.+title="(.+?)"(.*)/gm;
const str = `<a href="http://google.com/" target="_self" title="TEXTDATA" class="encyclopedia">Google</a>`;
const subst = `$1`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);
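Note that replace() returns the input unchanged when the pattern does not match, so a string without a title attribute comes back as-is rather than as an empty result. A sketch that reads the capture group directly with RegExp.prototype.exec instead, using the same pattern but without the g and m flags (with g, exec is stateful through lastIndex). extractTitle is a hypothetical helper name, not part of the answer above.

```javascript
// Same pattern as above, without the g/m flags so exec() keeps no state between calls
const titleRegex = /<.+title="(.+?)"(.*)/;

// Hypothetical helper: returns the title value, or null if the pattern does not match
function extractTitle(html) {
  const m = titleRegex.exec(html);
  return m ? m[1] : null;
}

const str = `<a href="http://google.com/" target="_self" title="TEXTDATA" class="encyclopedia">Google</a>`;

console.log(extractTitle(str));          // TEXTDATA
console.log(extractTitle('<b>bold</b>')); // null
```

Because the leading <.+ is greedy, if a single line contains several title attributes this pattern captures the last one, not the first.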

RegEx

If this expression isn't quite what you need, you can modify it on regex101.com.

RegEx Circuit

jex.im also helps to visualize the expression.

PHP

$re = '/<.+title="(.+?)"(.*)/m';
$str = '<a href="http://google.com/" target="_self" title="TEXTDATA" class="encyclopedia">Google</a>';
$subst = '$1';

$result = preg_replace($re, $subst, $str);

echo $result;

Regex to capture an attribute value in an HTML element
