Is it possible to count tables in the HTML type provider?

Time: 2018-10-29 01:47:03

Tags: f# type-providers f#-data

I have a Wiki page and, for particular reasons, I'm interested in counting the tables on it.

Apparently, the internals of the Tables property are represented as a sequence.

Is there a way to retrieve those counts in code?

I tried several hideous tricks:

open System
open FSharp.Data
open FSharp.Data.Runtime

type Wiki = HtmlProvider<"https://en.wikipedia.org/wiki/F_Sharp_(programming_language)">

let getTablesCount (url : string) =
    let data = Wiki.Load url
    let tables = data.Tables
    // won't compile - type constraint mismatch
    // let attempt1 = tables :> Map<string, HtmlTable> |> Map.count
    // won't compile - type is not compatible
    // let attempt2 = tables |> Seq.cast<Tuple<string, HtmlTable>> |> Seq.length
    // compiles - throws in the runtime InvalidCastException
    // let attempt3 = (box tables) :?> Map<string, HtmlTable> |> Map.count
    42

Nothing worked, and that is probably for the best. Maybe I'm missing something obvious?

I'm ready to parse the HTML myself for this (with regex or, more sensibly, something like the FSharp.Data HTML parser); I just want to be sure first.

1 answer:

Answer 0: (score: 2)

I'm not very familiar with HtmlProvider. My guess is that you could use reflection, or the non-public types (rather hacky), or HtmlAgilityPack.

Searching for "table" nodes through HtmlProvider, I get a count of 10:


open FSharp.Data

type Wiki = HtmlProvider<"https://en.wikipedia.org/wiki/F_Sharp_(programming_language)">

[<EntryPoint>]
let main argv =

    // Count every <table> element in the page's underlying HTML document.
    let getTablesCount (url : string) =
        let data = Wiki.Load url
        let tables = data.Tables
        let tableNodes = tables.Html.Descendants("table")
        tableNodes |> Seq.length |> printfn "Table count is: %d"

    getTablesCount "https://en.wikipedia.org/wiki/F_Sharp_(programming_language)"
    0
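
If the typed Tables container is not needed at all, the same count can probably be obtained from the plain FSharp.Data HTML parser, which is the fallback the question mentions. The following is only a minimal sketch under that assumption: countTables and sampleHtml are hypothetical names introduced for illustration, and the inline HTML stands in for a real page so the snippet runs without network access.

```fsharp
// For F# Interactive (.fsx); in a compiled project, reference the
// FSharp.Data package instead of using this directive.
#r "nuget: FSharp.Data"

open FSharp.Data

// Hypothetical helper: parse an HTML string and count its <table> elements.
let countTables (html : string) =
    let doc = HtmlDocument.Parse html
    doc.Descendants("table") |> Seq.length

// Hypothetical sample input with two sibling tables.
let sampleHtml =
    "<html><body>"
    + "<table><tr><td>a</td></tr></table>"
    + "<p>some text</p>"
    + "<table><tr><td>b</td></tr></table>"
    + "</body></html>"

printfn "Table count is: %d" (countTables sampleHtml)
```

For a live page, HtmlDocument.Load accepts a URL in place of HtmlDocument.Parse. Note that this counts raw table elements, so the result may differ from the number of properties the provider exposes under Tables.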
