Question

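I am trying to use Delphi to parse the rank, title and URL of Google search results.

Mainly, I need to get every A link and its TEXT from the H3 tags that have the specific class name "r".

Here is the function that grabs the results part of Google's HTML: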

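function TForm1.ExtractContainer: TStringList;
var
    Doc: IHTMLDocument3;
    i: Integer;
    Download: IHTMLElement;
    tmp: String;
begin
    Result := TStringList.Create;
    Doc := EmbeddedWB1.Document as IHTMLDocument3;
    // All Google results sit inside the div with id "center_col"
    Download := Doc.getElementById('center_col');
    if Download = nil then
        Exit; // page not loaded yet, or the container id has changed
    tmp := Download.innerHTML;
    // Insert a line break after every <h3 class="r"> so each result link starts a new line
    Result.Text := AnsiReplaceStr(tmp, '<h3 class="r">', '<h3 class="r">'#13#10);

    // Line 0 holds the markup before the first result heading, so start at 1
    for i := 1 to Result.Count - 1 do
    begin
        // ExtractTextBetween is a helper of my own (not shown here)
        tmp := ExtractTextBetween(Result[i], 'href="', '">');
        Memo1.Lines.Add(tmp);
    end;
end;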

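All of the Google search results are inside the div center_col. What I still need is a way to find every A link and its TEXT inside the H3 tags with the specific class name "r".

I hope someone can help me!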

Answer 1

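Based on the suggestions below, I have changed my answer:

To parse HTML, the most effective approach is to use a DOM-based HTML parser. A quick search turned up: http://www.yunqa.de/delphi/doku.php/products/htmlparser/index

From its main page: "HTML-Tags: HTML-Tags are easily parsed into Name, Attributes and Values. DIHtmlParser recognizes start tags, end tags and empty element tags. For example: ."

This product is not the only one of its kind, but I have seen it mentioned in a few other SO posts.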
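
Since the page in the question is already loaded in an embedded browser control, the DOM that control exposes can also be walked directly instead of splitting innerHTML by hand. The following is only a minimal sketch of that idea, not tested against a live Google page. It assumes the MSHTML and SysUtils units are in the uses clause, reuses EmbeddedWB1 and Memo1 from the question, and assumes result headings are still h3 elements whose class is exactly "r" (Google's markup changes often). The method name ListResultLinks is hypothetical.

```delphi
// Hypothetical helper (not from the original post): writes rank, title and URL
// of every <a> inside an <h3 class="r"> of the loaded page to Memo1.
// Requires MSHTML and SysUtils in the unit's uses clause.
procedure TForm1.ListResultLinks;
var
  Doc: IHTMLDocument3;
  Headings, Anchors: IHTMLElementCollection;
  Heading: IHTMLElement;
  Anchor: IHTMLAnchorElement;
  i, j, Rank: Integer;
begin
  Doc := EmbeddedWB1.Document as IHTMLDocument3;
  Headings := Doc.getElementsByTagName('h3');
  Rank := 0;
  for i := 0 to Headings.length - 1 do
  begin
    Heading := Headings.item(i, 0) as IHTMLElement;
    // Assumption: result headings carry exactly the class name "r"
    if Heading.className <> 'r' then
      Continue;
    // IHTMLElement2 lets us search only below this heading
    Anchors := (Heading as IHTMLElement2).getElementsByTagName('a');
    for j := 0 to Anchors.length - 1 do
    begin
      Anchor := Anchors.item(j, 0) as IHTMLAnchorElement;
      Inc(Rank);
      // One tab-separated line per link: rank, visible text, target URL
      Memo1.Lines.Add(Format('%d'#9'%s'#9'%s',
        [Rank, (Anchor as IHTMLElement).innerText, Anchor.href]));
    end;
  end;
end;
```

Call it once the document has finished loading (for example from the control's document-complete event). Reading className and href through the DOM avoids depending on how innerHTML happens to serialize tag case and attribute quoting.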

Hope this helps.
