Delphi HTML parsing: check whether an element has an attribute

Date: 2015-12-03 17:43:27

Tags: delphi html-parsing mshtml

I have the following procedure:

class procedure ParseData(AData: string; var ATextList: TList<string>);
var
  HTMLDoc: OleVariant;
  HTMLElement: OleVariant;
  I: Integer;
begin
  HTMLDoc := coHTMLDocument.Create as IHTMLDocument2;
  HTMLDoc.Write(AData);
  HTMLDoc.Close;

  for I := 0 to HTMLDoc.body.all.length - 1 do
  begin
    HTMLElement := HTMLDoc.body.all.item(I);

    if HTMLElement.hasAttribute('attr1') then
      ATextList.Add(HTMLElement.innerHTML);
  end;
end;

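The problem is that hasAttribute doesn't work, while functions such as setAttribute, innerHTML and tagName work fine. Is there another way to check whether an element contains a given attribute?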

2 answers:

Answer 0 (score: 3)

You can test whether getAttribute returns Null:

if not VarIsNull(HTMLElement.getAttribute('attr1')) then
  ATextList.Add(HTMLElement.innerHTML);

Edit

hasAttribute is implemented in the IHTMLElement5 interface. It requires IE8 or later and is not supported in IE7 Standards mode or IE5 (Quirks) mode.

I imported C:\Windows\System32\mshtml.tlb (using the tlibimp tool), and this code works:

if (IDispatch(HTMLElement) as IHTMLElement5).hasAttribute('attr1') then...
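The two suggestions above can be combined. The following is a minimal, untested sketch that assumes the unit tlibimp generates from mshtml.tlb is named MSHTML_TLB (adjust to whatever your import produced) and Delphi XE2 or later for the scoped unit names; the unit name HtmlAttrHelpers and the function HasAttr are hypothetical. It asks for IHTMLElement5 with Supports and, if the interface is missing or the call raises an OLE error (as it may in IE7 Standards or Quirks mode), falls back to the getAttribute/VarIsNull test:

```pascal
unit HtmlAttrHelpers; // hypothetical unit name

interface

// Hypothetical helper: True if AElement (an MSHTML element) carries AName.
function HasAttr(const AElement: IDispatch; const AName: string): Boolean;

implementation

uses
  System.SysUtils, System.Variants, System.Win.ComObj,
  MSHTML_TLB; // assumed name of the unit tlibimp generates from mshtml.tlb

function HasAttr(const AElement: IDispatch; const AName: string): Boolean;
var
  El5: IHTMLElement5;
  V: OleVariant;
begin
  Result := False;
  if AElement = nil then
    Exit;

  // Preferred path: IHTMLElement5.hasAttribute (IE8+ document modes only).
  if Supports(AElement, IHTMLElement5, El5) then
  try
    Exit(El5.hasAttribute(AName));
  except
    on EOleSysError do
      ; // Not usable in this document mode: fall through to getAttribute.
  end;

  // Fallback from answer 0: getAttribute returns Null for a missing attribute.
  V := AElement;
  Result := not VarIsNull(V.getAttribute(AName));
end;

end.
```

In the question's loop this would be called as HasAttr(IDispatch(HTMLElement), 'attr1').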

Answer 1 (score: 2)

You can check for a specific named attribute with a HasAttribute function of your own.


The reason I suggest using your own HasAttribute function is a problem the MSHTML parser has with 'value' node attributes, which I described in my answer to

Checking whether there are <input> object attribute values in the HTML Code using Delphi

Here is the function, with a test handler that uses the HTML I included:

// Requires MSHTML in the uses clause.
function HasAttribute(ANode : IHtmlDomNode; const AttrName : String) : Boolean;
var
  Attrs : IHtmlAttributeCollection;
  A : IHtmlDomAttribute;
  V : OleVariant;
  i : Integer;
begin
  // Only element nodes (nodeType = 1) carry attributes.
  Result := ANode.nodeType = 1;
  if not Result then
    Exit;
  Attrs := IDispatch(ANode.Attributes) as IHtmlAttributeCollection;
  for i := 0 to Attrs.length - 1 do begin
    V := i;  // item() takes its index as an OleVariant
    A := IDispatch(Attrs.item(V)) as IHtmlDomAttribute;
    // Names are compared case-insensitively; Result is still True here.
    if CompareText(AttrName, A.nodeName) = 0 then
      exit;
  end;
  Result := False;
end;

procedure TForm1.btnTestAttributesClick(Sender: TObject);
var
  D : IHtmlDomNode;
  AttrName : String;
  Msg : String;
begin
  D := IDispatch(WebBrowser1.OleObject.Document.GetElementByID('input1')) as  IHtmlDomNode;
  AttrName := 'attr1';
  if HasAttribute(D, AttrName) then
    Msg := 'Found'
  else
    Msg := 'Not found';
  Memo1.Lines.Add(AttrName + ' : ' + Msg);

  AttrName := 'value';
  if HasAttribute(D, AttrName) then
    Msg := 'Found'
  else
    Msg := 'Not found';
  Memo1.Lines.Add(AttrName + ' : ' + Msg);
end;

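You will find that the DumpItems routine reports that the IHtmlAttributeCollection contains a node named 'value' whether or not an attribute of that name exists in the node's HTML source; see, for example, the result for the first input node. It is as if the DOM parser synthesizes a 'value' node when none is defined in the node's HTML.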
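If the synthesized 'value' node is the main concern, the attribute node's specified property may help: Microsoft's documentation describes it as indicating whether the attribute was explicitly given a value. Below is a minimal sketch of a HasAttribute variant that additionally requires specified to be True. The unit and function names are hypothetical, unit names assume Delphi XE2 or later, and I have not checked that MSHTML actually reports specified = False for the synthesized 'value' node, so verify it against the example HTML in this answer:

```pascal
unit HtmlSpecifiedAttr; // hypothetical unit name

interface

uses
  MSHTML;

// Hypothetical variant of HasAttribute: ignores attribute nodes whose
// specified property is False (i.e. not explicitly given a value).
function HasSpecifiedAttribute(const ANode: IHTMLDOMNode;
  const AttrName: string): Boolean;

implementation

uses
  System.SysUtils;

function HasSpecifiedAttribute(const ANode: IHTMLDOMNode;
  const AttrName: string): Boolean;
var
  Attrs: IHTMLAttributeCollection;
  A: IHTMLDOMAttribute;
  V: OleVariant;
  I: Integer;
begin
  Result := False;
  if (ANode = nil) or (ANode.nodeType <> 1) then // 1 = element node
    Exit;
  Attrs := ANode.attributes as IHTMLAttributeCollection;
  for I := 0 to Attrs.length - 1 do
  begin
    V := I; // item() takes its index as an OleVariant
    A := Attrs.item(V) as IHTMLDOMAttribute;
    // Name must match case-insensitively AND be explicitly specified.
    if (CompareText(AttrName, A.nodeName) = 0) and A.specified then
      Exit(True);
  end;
end;

end.
```

It can be dropped into the button handler above in place of HasAttribute; if it still reports 'value' as found for input1, the specified flag is not enough to filter the synthesized node.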

For the following example HTML:

<html>
  <body>
    <p>This has no value attribute.
    <input name="input1" type="text"/>
    <p>This has an empty value attribute.
    <input name="input2" type="text" value=""/>
    <p>This has a value attribute.
    <input name="input3" type="text" value="already has a value"/>
  </body>
</html>

the code's DumpItems routine reports:

Node name: INPUT value:
147: type: >text<
158: value: ><
160: name: >input1<
Node name: INPUT value:
147: type: >text<
158: value: ><
160: name: >input2<
Node name: INPUT value:
147: type: >text<
158: value: >already has a value<
160: name: >input3<

Incidentally, when I first ran my test app, the reported attribute node numbers (147, 158, 160) puzzled me, but the reason is that every IHtmlDomNode has a whole bunch of attributes, mostly event handlers, starting with onchange.

To save you having to look at the other answer, its DumpItems code is: