Question

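I am looking for a way to run my web pages through a DTD validator as part of my WatiN tests, but I have not yet found a clean way to access the raw HTML. Is there a built-in way to do it?

I think I could get at the IE.InternetExplorer property, QueryInterface it for IPersistStreamInit, and serialize the document to an IStream, but that seems like a lot of work for what I assume must be a fairly common task.

Am I missing something obvious in WatiN? Or can anyone think of a better solution than the one outlined above? That solution is, after all, very IE-specific.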

Answer 1

Here is how you can access the source:

browser.ActiveElement.Parent.OuterHtml

Answer 2

string html = browser.Body.Parent.OuterHtml;

Answer 3

There does not seem to be a better way. I filed a feature request and submitted a patch on WatiN's SourceForge tracker.

Answer 4

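I thought I would drop in a few lines to help anyone struggling to get a web page's original HTML source through WatiN without patching WatiN itself, which is purely a matter of taste.

So, building on Johan Levin's patch, I put together the following. I hope you find it useful.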

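    // Requires references to WatiN.Core and Microsoft.mshtml.
    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using System.Runtime.InteropServices.ComTypes;
    using System.Text;
    using mshtml;
    using WatiN.Core;
    using WatiN.Core.Native.InternetExplorer; // IEDocument (WatiN 2.x)

    // Returns the page markup as MSHTML persists it.
    // Note: TextVariant is not defined in this snippet.
    private static TextVariant GetWebPageSource(IE browser)
    {
        IHTMLDocument2 htmlDocument = ((IEDocument)(browser.DomContainer.NativeDocument)).HtmlDocument;
        Encoding encoding = Encoding.GetEncoding(htmlDocument.charset);
        IPersistStreamInit persistStream = (IPersistStreamInit)htmlDocument;
        MinimalIStream stream = new MinimalIStream();
        persistStream.Save(stream, false);
        // Decode the saved bytes using the document's own charset.
        return new TextVariant(encoding.GetString(stream.ToArray()));
    }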

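    // Managed declaration of the COM IPersistStreamInit interface.
    // Methods must stay in vtable order.
    [Guid("7FD52380-4E07-101B-AE2D-08002B2EC713")]
    [InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IPersistStreamInit
    {
        void GetClassID(out Guid pClassID);
        // Returns S_OK or S_FALSE, so the HRESULT must be preserved.
        [PreserveSig]
        int IsDirty();
        void Load(IStream pStm);
        // fClearDirty is a Win32 BOOL, not the VARIANT_BOOL that COM interop assumes by default.
        void Save(IStream pStm, [MarshalAs(UnmanagedType.Bool)] bool fClearDirty);
        void GetSizeMax(out long pcbSize);
        void InitNew();
    }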

    // http://stackoverflow.com/questions/6601355/passing-an-fstream-or-equivalent-from-c-to-c-through-cli
    [ClassInterface(ClassInterfaceType.AutoDispatch)]
    public class MinimalIStream : MemoryStream, IStream
    {
        public MinimalIStream() { }

        public MinimalIStream(byte[] data) : base(data) { }

        #region IStream Members
        public void Write(byte[] pv, int cb, IntPtr pcbWritten)
        {
            base.Write(pv, 0, cb);
            // pcbWritten points to a 32-bit ULONG, so write 4 bytes, not 8.
            if (pcbWritten != IntPtr.Zero)
                Marshal.WriteInt32(pcbWritten, cb);
        }

        public void Stat(out STATSTG pstatstg, int grfStatFlag)
        {
            pstatstg = new STATSTG();
            pstatstg.cbSize = base.Length;
        }

        public void Read(byte[] pv, int cb, IntPtr pcbRead)
        {
            int bytesRead = base.Read(pv, 0, cb);
            // pcbRead points to a 32-bit ULONG, so write 4 bytes, not 8.
            if (pcbRead != IntPtr.Zero) Marshal.WriteInt32(pcbRead, bytesRead);
        }

        public void Seek(long dlibMove, int dwOrigin, IntPtr plibNewPosition)
        {
            long pos = base.Seek(dlibMove, (SeekOrigin)dwOrigin);
            if (plibNewPosition != IntPtr.Zero) Marshal.WriteInt64(plibNewPosition, pos);
        }

        public void Clone(out IStream ppstm)
        {
            ppstm = null;
        }

        public void Commit(int grfCommitFlags)
        {
        }

        public void CopyTo(IStream pstm, long cb, IntPtr pcbRead, IntPtr pcbWritten)
        {
        }

        public void LockRegion(long libOffset, long cb, int dwLockType)
        {
        }

        public void SetSize(long libNewSize)
        {
        }

        public void Revert()
        {
        }

        public void UnlockRegion(long libOffset, long cb, int dwLockType)
        {
        }
        #endregion
    }

Answer 5

I have found that:

browser.ActiveElement.Parent.OuterHtml

does not always get everything, because the result depends on which element is your 'ActiveElement'. So:

browser.Body.Parent.OuterHtml

seems to work better. (browser is your IE instance.)
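For context, here is a minimal sketch of how that line might sit inside a small program. It assumes WatiN 2.x is referenced and Internet Explorer is installed; the class name `PageSourceExample` and the URL are hypothetical. Note that `OuterHtml` of the `<html>` element does not include the DOCTYPE declaration, which matters if the goal is DTD validation.

```csharp
using System;
using WatiN.Core;

public static class PageSourceExample
{
    [STAThread] // WatiN's IE automation needs a single-threaded apartment
    public static void Main()
    {
        // Hypothetical URL; IE is disposed (and closed) at the end of the block.
        using (var browser = new IE("http://example.com/"))
        {
            // Body.Parent is the <html> element. OuterHtml is IE's
            // serialization of the current DOM, not the bytes the server sent.
            string html = browser.Body.Parent.OuterHtml;
            Console.WriteLine(html.Length);
        }
    }
}
```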

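That said, I think Johan Levin is right that what you get is the DOM serialized back to text. So wouldn't it be easier to fetch the document by its URL (without using WatiN) and validate that?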
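A rough sketch of that fetch-and-validate idea, assuming the page is well-formed XHTML whose DOCTYPE the .NET XML parser can resolve. Plain HTML 4 uses an SGML DTD and would need a different validator. The names `PageValidator`, `Download`, and `Validate` are made up for illustration; `WebClient`, `XmlReaderSettings`, and `XmlReader` are standard framework types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Xml;
using System.Xml.Schema;

public static class PageValidator
{
    // Fetches the markup as the server sent it, bypassing the browser's DOM.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Parses the markup with DTD validation switched on and returns every
    // validation message. Markup that is not well-formed XML throws XmlException.
    public static List<string> Validate(string markup)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,  // DTDs are prohibited by default
            ValidationType = ValidationType.DTD,
            XmlResolver = new XmlUrlResolver()    // lets the reader load an external DTD
        };
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (var reader = XmlReader.Create(new StringReader(markup), settings))
        {
            while (reader.Read()) { } // validation runs as the reader advances
        }
        return errors;
    }
}
```

In a test, something like `PageValidator.Validate(PageValidator.Download(url))` could then be asserted to return an empty list. Be aware that resolving the W3C DTDs over the network on every run can be slow, so caching them locally may be worth considering.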

Accessing the full page source in WatiN
