<vboxview leftinset="10" rightinset="0" stretchiness="1">    // CONTENT INSIDE HERE </vboxview>


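If you know there are no other HTML/XML elements inside the tag, the following pattern will work well. It is written for PCRE's extended (`x`) mode, so the layout whitespace and `#` comments are ignored: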



<vboxview                  # match `<vboxview` literally
\s+                        # match at least one whitespace character
(?P<vboxviewAttributes>    # begin capture (into a group named "vboxviewAttributes")
   (\\>|[^>])*             #    any number of (either `\>` or NOT `>`)
)                          # end capture
>                          # match a `>` character
(?P<vboxviewContent>       # begin capture (into a group named "vboxviewContent")
   (\\<|[^<])*             #    any number of (either `\<` or NOT `<`)
)                          # end capture
</vboxview>                # match `</vboxview>` literally
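A minimal sketch of how the pattern above might be used from PHP. It assumes the pattern is compiled with the `x` modifier so the whitespace and `#` comments are ignored, and uses `~` as the delimiter because the pattern contains `/`. The function name `extract_vboxview_regex` is hypothetical.

```php
<?php
// Hypothetical helper: runs the extended-mode pattern from above over $html
// and returns one ['attributes' => ..., 'content' => ...] entry per match.
function extract_vboxview_regex(string $html): array
{
    // A nowdoc keeps the backslashes exactly as written; the x modifier
    // makes PCRE ignore the layout whitespace and the # comments.
    $pattern = <<<'REGEX'
~<vboxview                  # match `<vboxview` literally
\s+                         # at least one whitespace character
(?P<vboxviewAttributes>     # capture the attributes
   (\\>|[^>])*              #   any number of (either `\>` or NOT `>`)
)
>                           # match a `>` character
(?P<vboxviewContent>        # capture the content
   (\\<|[^<])*              #   any number of (either `\<` or NOT `<`)
)
</vboxview>                 # match `</vboxview>` literally
~x
REGEX;

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    // Keep only the two named groups for each match.
    $result = [];
    foreach ($matches as $m) {
        $result[] = [
            'attributes' => $m['vboxviewAttributes'],
            'content'    => $m['vboxviewContent'],
        ];
    }
    return $result;
}

$html = '<vboxview leftinset="10" rightinset="0" stretchiness="1">    // CONTENT INSIDE HERE </vboxview>';
print_r(extract_vboxview_regex($html));
```

Named groups make the result readable without counting parentheses; the unnamed inner groups still capture, but their values are simply ignored here.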

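You would need to escape `>` characters in the source as `\>` (and `<` inside the content as `\<`), or better yet, convert them to HTML/XML entities such as `&gt;` and `&lt;`.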
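A sketch of the entity-encoding route, assuming the content is produced by PHP so that `htmlspecialchars()` can be applied before it is embedded. The pattern is the same regex as above written on one line without the `x` modifier, with the inner groups made non-capturing; the sample text in `$raw` is made up for illustration.

```php
<?php
// Compact form of the pattern above (no x modifier, non-capturing inner groups).
$pattern = <<<'REGEX'
~<vboxview\s+(?P<vboxviewAttributes>(?:\\>|[^>])*)>(?P<vboxviewContent>(?:\\<|[^<])*)</vboxview>~
REGEX;

// Hypothetical content that contains both < and >.
$raw = 'if (a < b && b > c) { done(); }';

// Embedding the raw text: the `<` inside the content stops the match.
$unescaped = '<vboxview leftinset="10">' . $raw . '</vboxview>';
var_dump(preg_match($pattern, $unescaped));   // int(0)

// Embedding the entity-encoded text: `&lt;` contains no `<`, so it matches.
$escaped = '<vboxview leftinset="10">' . htmlspecialchars($raw) . '</vboxview>';
preg_match($pattern, $escaped, $m);

// Decode the captured content to get the original text back.
$content = htmlspecialchars_decode($m['vboxviewContent']);
var_dump($content === $raw);                   // bool(true)
```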

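If there are nested constructs inside, you will start running into problems with regex, and at that point you are better off switching to a method that does not involve regular expressions at all.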
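To make the nesting problem concrete, here is a hypothetical input with one `vboxview` inside another, run through the same pattern (in its compact, single-line form). Nothing fails loudly; the outer element is just silently missed.

```php
<?php
// Compact form of the pattern above (no x modifier, non-capturing inner groups).
$pattern = <<<'REGEX'
~<vboxview\s+(?P<vboxviewAttributes>(?:\\>|[^>])*)>(?P<vboxviewContent>(?:\\<|[^<])*)</vboxview>~
REGEX;

// Hypothetical input: one vboxview nested inside another.
$html = '<vboxview leftinset="10"><vboxview leftinset="20">inner</vboxview> tail</vboxview>';

preg_match_all($pattern, $html, $matches);

// The outer element's content starts with `<`, which `[^<]` cannot consume,
// so only the inner element is found.
echo count($matches[0]), "\n";                 // 1
echo $matches['vboxviewAttributes'][0], "\n";  // leftinset="20"
echo $matches['vboxviewContent'][0], "\n";     // inner
```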

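As mentioned in the comments, trying to extract content from HTML with regular expressions is usually not a good idea. If you want a more robust approach, the DOMDocument API makes it easy to extract the information: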

function get_vboxview($html) {

    $output = array();

    // Create a new DOM object
    $doc = new DOMDocument;

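    // load a string in as html; vboxview is not a tag libxml knows,
    // so collect the parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);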

    // create a new Xpath object to query the document with
    $xpath = new DOMXPath($doc);

    // an xpath query that looks for a vboxview node anywhere in the DOM
    // with an attribute named leftinset set to 10, an attribute named rightinset
    // set to 0 and an attribute named stretchiness set to 1
    $query = '//vboxview[@leftinset=10 and @rightinset=0 and @stretchiness=1]';

    // query the document
    $matches = $xpath->query($query);

    // loop through each matching node
    // and add its textContent to the output
    foreach ($matches as $m) {
        $output[] = $m->textContent;
    }

    return $output;
}
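If XPath feels like overkill, the same filtering can be done in plain PHP with `getElementsByTagName()` and `getAttribute()`. This is a sketch of that alternative rather than the approach above; `get_vboxview_no_xpath` is a hypothetical name, and the attribute values are the ones from the question.

```php
<?php
// Hypothetical XPath-free variant: walk every <vboxview> element and
// compare its attributes in PHP instead of inside a query.
function get_vboxview_no_xpath(string $html): array
{
    $doc = new DOMDocument;

    // vboxview is not an HTML tag libxml knows, so silence its parser warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $output = [];
    foreach ($doc->getElementsByTagName('vboxview') as $node) {
        // getAttribute() returns strings ('' when missing), so compare to strings.
        if ($node->getAttribute('leftinset') === '10'
            && $node->getAttribute('rightinset') === '0'
            && $node->getAttribute('stretchiness') === '1') {
            $output[] = $node->textContent;
        }
    }
    return $output;
}

$html = '<vboxview leftinset="10" rightinset="0" stretchiness="1">    // CONTENT INSIDE HERE </vboxview>'
      . '<vboxview leftinset="5">ignored</vboxview>';
print_r(get_vboxview_no_xpath($html));
```

The XPath version keeps the filter in one declarative string; this version is easier to extend with checks that XPath 1.0 cannot express conveniently.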

function get_node_text($html, $id) {
    // Create a new DOM object
    $doc = new DOMDocument;

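    // load a string in as html (collecting warnings about unknown tags)
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // return the textContent of the node with the id $id,
    // or null if no element has that id
    $node = $doc->getElementById($id);
    return $node === null ? null : $node->textContent;
}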
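`textContent` flattens any child markup into plain text. If the content inside the element may contain tags you want to keep, a sketch like the following serializes each child node with `DOMDocument::saveHTML()`, which accepts an optional node argument. The name `get_node_inner_html` and the sample `id="box"` are hypothetical.

```php
<?php
// Hypothetical helper: return the markup inside the element with the given
// id, instead of textContent, which drops any child tags.
function get_node_inner_html(string $html, string $id): ?string
{
    $doc = new DOMDocument;

    // Collect libxml's warnings about unknown tags such as vboxview.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    if ($node === null) {
        return null; // no element carries that id
    }

    // saveHTML($child) serializes just that child node.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$html = '<vboxview id="box" leftinset="10"><b>bold</b> text</vboxview>';
echo get_node_inner_html($html, 'box'), "\n";
```

Here the output keeps the `<b>` element, whereas `textContent` on the same node would give only `bold text`.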