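I want to extract the agent/broker name, licence number and expiry date from http://mbsweblist.fsco.gov.on.ca/ShowLicence.aspx?M13000248~
The number after "M" is the licence number. I have a Power Query that can pull the data for a few licences. How can I extract the data for the list = {00000000..99999999}? Is Power BI unsuitable for this purpose? Is there another way?
Thanks, I appreciate your help.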
// Function query (called as GetData in the query below)
(page as number) as table =>
let
    Source = Web.Page(Web.Contents("http://mbsweblist.fsco.gov.on.ca/ShowLicence.aspx?M"&Number.ToText(page)&"~")),
    Data1 = Source{1}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Data1,{{"Column1", type text}, {"Column2", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column1] = "Agent/Broker Name:" or [Column1] = "Expiry Date:" or [Column1] = "Licence #:"))
in
    #"Filtered Rows"

// Query that calls GetData for each licence number
let
    Source = {18001928,13000248},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Page"}}),
    #"Added Custom" = Table.AddColumn(#"Renamed Columns", "Custom", each GetData([Page])),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Column1", "Column2"}, {"Custom.Column1", "Custom.Column2"})
in
    #"Expanded Custom"
Answer 0 (score: 1)
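First, if you're attempting to scrape "over a million pages", I'd advise caution: it's quite likely that the web server will treat the repeated requests as a breach of its terms of service, or as some form of attack.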
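One way to soften the load is to pause between requests. The following is only a minimal sketch of that idea using Function.InvokeAfter; FetchPage is a hypothetical stand-in that makes no web requests, and the two-second Delay is an assumed value, not something the site documents.

```powerquery
let
    // Hypothetical stand-in for a real page-fetching function; it makes no web requests
    FetchPage = (licenceNumber as number) as text => "fetched " & Number.ToText(licenceNumber),
    // Assumed pause between calls; choose a value the site owner would consider reasonable
    Delay = #duration(0, 0, 0, 2),
    Licences = {13000246..13000248},
    // Function.InvokeAfter waits for Delay before invoking the wrapped function
    Results = List.Transform(Licences, (n) => Function.InvokeAfter(() => FetchPage(n), Delay))
in
    Results
```

In a real query the stand-in would be replaced by whichever function fetches the page. Throttling alone does not make a very large scrape acceptable to the site owner.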
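However, to answer the question from a technical-capability standpoint: your approach of listing the licence numbers and then passing each one to a function that fetches the web data is pretty much right. Your execution isn't quite right, though.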
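Step 1: Create a function that extracts the required data, in the required format, for a single URL. The URL is generated from the licence number passed in as a parameter. I named this function WebData: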
(LicenceNumber) =>
let
    // Build the URL from the licence number and load the page
    Source = Web.Page(Web.Contents("http://mbsweblist.fsco.gov.on.ca/ShowLicence.aspx?M" & Number.ToText(LicenceNumber) & "~")),
    WebData = Source{1}[Data],
    // Keep only the label text before the colon in Column1
    #"Extracted Text Before Delimiter" = Table.TransformColumns(WebData, {{"Column1", each Text.BeforeDelimiter(_, ":"), type text}}),
    #"Removed Top Rows" = Table.Skip(#"Extracted Text Before Delimiter",1),
    // Turn the label/value rows into a single row with the labels as column headers
    #"Transposed Table" = Table.Transpose(#"Removed Top Rows"),
    #"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true])
in
    #"Promoted Headers"
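To see what the reshaping steps in WebData do without touching the website, the same transformations can be run on a small in-memory table. This is only an illustrative sketch: the Sample table and every value in it are made up, and the real page's table may be laid out differently.

```powerquery
let
    // Hypothetical stand-in for Source{1}[Data]; all values here are invented
    Sample = #table(
        {"Column1", "Column2"},
        {
            {"Licence Information:", null},
            {"Agent/Broker Name:", "Jane Doe"},
            {"Licence #:", "M13000248"},
            {"Expiry Date:", "2020-03-31"}
        }),
    // Strip the trailing colon from each label
    Labels = Table.TransformColumns(Sample, {{"Column1", each Text.BeforeDelimiter(_, ":"), type text}}),
    // Drop the first row, which holds no label/value pair in this sample
    Body = Table.Skip(Labels, 1),
    // Rows become columns, then the label row becomes the header row
    Transposed = Table.Transpose(Body),
    Result = Table.PromoteHeaders(Transposed, [PromoteAllScalars = true])
in
    Result
```

With this sample, Result should be a one-row table whose columns are named after the labels, which is the shape that allows the per-licence tables to be stacked later.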
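Now create a second query that lists the licence numbers whose data you want to retrieve, uses the WebData function to retrieve the data for each page, and finally combines that data into a single table: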
let
    // Licence numbers to look up
    Source = {13000246..13000250},
    #"Convert to Table" = Table.FromList(Source,Splitter.SplitByNothing(),{"Licence Number"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Convert to Table",{{"Licence Number", Int64.Type}}),
    // Call WebData for each licence; fall back to an empty table if the call fails
    #"Get WebData" = Table.AddColumn(#"Changed Type", "WebData", each try WebData([Licence Number]) otherwise #table({},{})),
    // Stack the per-licence tables into one table
    #"Combine WebData" = Table.Combine(#"Get WebData"[WebData]),
    #"Changed Types" = Table.TransformColumnTypes(#"Combine WebData",{{"Agent/Broker Name", type text}, {"Licence #", type text}, {"Brokerage Name", type text}, {"Licence Class", type text}, {"Status", type text}, {"Issue Date", type date}, {"Expiry Date", type date}, {"Inactive Date", type date}})
in
    #"Changed Types"
Note that the start and end values in the Source line determine the range of licence numbers in the list.
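As a small sketch of that last point, the bounds can be pulled out into named values so the window is easy to move. FirstLicence, LastLicence and Urls are hypothetical names, and the zero-padding is an assumption based on the question's {00000000..99999999} range: it presumes the site expects eight digits after "M", which I haven't verified. The sketch only builds the URL text and makes no requests.

```powerquery
let
    // Hypothetical bounds; change these two values to move the window
    FirstLicence = 13000246,
    LastLicence = 13000250,
    Source = {FirstLicence..LastLicence},
    // Assumption: eight digits follow "M", so shorter numbers are left-padded with zeros
    Urls = List.Transform(
        Source,
        each "http://mbsweblist.fsco.gov.on.ca/ShowLicence.aspx?M"
            & Text.PadStart(Number.ToText(_), 8, "0") & "~")
in
    Urls
```

Working through the full range in small windows like this, rather than in one query, also makes it easier to stop if the server starts rejecting requests.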