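Good day! I am using Delphi XE and Indy TIdHTTP. With the Get method I retrieve a remote directory listing, and I need to parse it: get the list of files with their sizes and timestamps, and distinguish files from subdirectories. Is there a good routine for this? Thanks in advance! Vojtech
Here is a sample: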
<head>
<title>127.0.0.1 - /</title>
</head>
<body>
<H1>127.0.0.1 - /</H1><hr>
<pre>
Mittwoch, 30. März 2011 12:01 <dir> <A HREF="/SubDir/">SubDir</A><br />
Mittwoch, 9. Februar 2005 17:14 113 <A HREF="/file.txt">file.txt</A><br />
</pre>
<hr>
</body>
Answer 0 (score: 7)
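Given the sample, I think the fastest way to parse it is as follows. Locate the <pre>...</pre> block, which should be easy. Then put everything between <pre> and </pre> into a TStringList.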
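A minimal sketch of that extraction step, assuming the listing rows end with <br /> or <br> as in the sample. ExtractPreLines is a hypothetical helper name, not something from Indy or the RTL, and the sample row is copied from the question:

```pascal
program PreBlockSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, StrUtils;

// Hypothetical helper: copies the rows found between <pre> and </pre>
// into Lines, one listing row per entry.
procedure ExtractPreLines(const Html: string; Lines: TStrings);
var
  LowerHtml, Block: string;
  StartPos, EndPos, I: Integer;
begin
  Lines.Clear;
  // LowerCase only changes A..Z, so offsets found here are valid for Html too.
  LowerHtml := LowerCase(Html);
  StartPos := Pos('<pre>', LowerHtml);
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length('<pre>'));
  EndPos := PosEx('</pre>', LowerHtml, StartPos);
  if EndPos = 0 then
    Exit;
  Block := Copy(Html, StartPos, EndPos - StartPos);
  // Assumption: every row ends with <br /> or <br>; turn both into line breaks.
  Block := StringReplace(Block, '<br />', sLineBreak, [rfReplaceAll, rfIgnoreCase]);
  Block := StringReplace(Block, '<br>', sLineBreak, [rfReplaceAll, rfIgnoreCase]);
  Lines.Text := Block;
  // Remove the empty entries left behind by the line-break handling.
  for I := Lines.Count - 1 downto 0 do
    if Trim(Lines[I]) = '' then
      Lines.Delete(I);
end;

var
  Rows: TStringList;
  I: Integer;
begin
  Rows := TStringList.Create;
  try
    ExtractPreLines(
      '<pre>' + sLineBreak +
      'Mittwoch, 9. Februar 2005 17:14 113 <A HREF="/file.txt">file.txt</A><br />' + sLineBreak +
      '</pre>', Rows);
    for I := 0 to Rows.Count - 1 do
      WriteLn(Rows[I]);
  finally
    Rows.Free;
  end;
end.
```

Each entry still contains the raw <A HREF=...> markup, so the name, size and timestamp have to be split out of the row afterwards.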
Each line is then a file or folder in a very simple format.
Answer 1 (score: 7)
This should give you a good start and an idea of how to use the DOM:
uses
  MSHTML,
  ActiveX,
  ComObj;
  // VarArrayCreate comes from Variants, which the default VCL form
  // template already lists in the interface uses clause.

// Loads an HTML string into an MSHTML document.
procedure DocumentFromString(Document: IHTMLDocument2; const S: WideString);
var
  v: OleVariant;
begin
  v := VarArrayCreate([0, 0], varVariant);
  v[0] := S;
  Document.Write(PSafeArray(TVarData(v).VArray));
  Document.Close;
end;

// Collapses every run of the character C into a single C.
function StripMultipleChar(const S: string; const C: Char): string;
begin
  Result := S;
  while Pos(C + C, Result) <> 0 do
    Result := StringReplace(Result, C + C, C, [rfReplaceAll]);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  Document: IHTMLDocument2;
  Elements: IHTMLElementCollection;
  Element: IHTMLElement;
  I: Integer;
  Line: string;
begin
  Document := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;
  DocumentFromString(Document, '<head>...'); // your HTML here
  Elements := Document.all.tags('A') as IHTMLElementCollection;
  for I := 0 to Elements.length - 1 do
  begin
    Element := Elements.item(I, '') as IHTMLElement;
    Memo1.Lines.Add('A HREF=' + Element.getAttribute('HREF', 2));
    Memo1.Lines.Add('A innerText=' + Element.innerText);
    // Get the text that sits immediately before the element
    Line := (Element as IHTMLElement2).getAdjacentText('beforeBegin');
    // Line => "Mittwoch, 30. März 2011 12:01 <dir>" OR:
    // Line => "Mittwoch, 9. Februar 2005 17:14 113"...
    // I don't know what the actual delimiter is:
    // it could be [space] or [tab], so we need to normalize the Line.
    // Tabs would be easier, because the timestamps also contain spaces.
    Line := Trim(Line);
    Line := StripMultipleChar(Line, #32); // collapse runs of spaces
    Line := StripMultipleChar(Line, #9); // collapse runs of tabs
    // TODO: ParseLine (from right to left)
    Memo1.Lines.Add(Line);
    Memo1.Lines.Add('-------------');
  end;
end;
Output:
A HREF=/SubDir/
A innerText=SubDir
Mittwoch, 30. März 2011 12:01 <dir>
-------------
A HREF=/file.txt
A innerText=file.txt
Mittwoch, 9. Februar 2005 17:14 113
-------------
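The ParseLine step left as a TODO could look roughly like the sketch below. It assumes the normalized line always ends in either "<dir>" or a decimal file size, as in the two output rows above. TListingEntry and its field names are hypothetical, and the timestamp is deliberately kept as text because its format depends on the server's locale:

```pascal
program ParseLineSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical record for one listing row; the names are illustrative only.
  TListingEntry = record
    IsDirectory: Boolean;
    Size: Int64;           // -1 when the row is a directory or has no size
    TimestampText: string; // left unparsed: the date format is locale-specific
  end;

// Splits a normalized row at its last space or tab (right to left):
// the token on the right is "<dir>" or the size, the rest is the timestamp.
function ParseLine(const Line: string; out Entry: TListingEntry): Boolean;
var
  S, LastToken: string;
  P: Integer;
begin
  Result := False;
  Entry.IsDirectory := False;
  Entry.Size := -1;
  Entry.TimestampText := '';
  S := Trim(Line);
  P := LastDelimiter(' '#9, S);
  if P = 0 then
    Exit; // a single token cannot hold both a timestamp and a size
  LastToken := Copy(S, P + 1, MaxInt);
  Entry.TimestampText := Trim(Copy(S, 1, P - 1));
  if SameText(LastToken, '<dir>') then
  begin
    Entry.IsDirectory := True;
    Result := True;
  end
  else if TryStrToInt64(LastToken, Entry.Size) then
    Result := True
  else
    Entry.Size := -1; // not a number: report failure with a defined size
end;

var
  Entry: TListingEntry;
begin
  if ParseLine('Mittwoch, 9. Februar 2005 17:14 113', Entry) then
    WriteLn(Entry.TimestampText, ' | dir=', Entry.IsDirectory,
      ' | size=', Entry.Size);
end.
```

Working from the right avoids having to know how many space-separated parts the timestamp has. Converting TimestampText to a TDateTime would need the server's date format, which the sample alone does not pin down.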
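Edit:
I have changed the StripMultipleChar implementation to a simpler one. I believe the previous version was better optimized for speed, but since the lines are very short, the difference in performance is negligible.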