Scraping an ASPX web form with wget?

Asked: 2012-08-12 17:38:21

Tags: asp.net wget

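For a data collection/analysis project, I am trying to download the entries behind the ASPX web form at http://www.lasuperiorcourt.org/civilcasesummarynet/ui/?CT=AP&casetype=appellate, but so far I have had little success.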

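The idea is to download the relevant information from the page with wget and write the results to a single HTML file. From that output I will then compile statistics on the data extracted for the relevant cases (for example, case numbers BV024000 through BV028933).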
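The case-number range mentioned above could be enumerated with a small loop. This is only a minimal sketch: it assumes every case number is the prefix BV followed by six zero-padded digits, and the helper name `case_numbers` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print case numbers BVnnnnnn for a numeric range.
# Pass the bounds WITHOUT leading zeros (24000, not 024000), because shell
# arithmetic treats a leading zero as octal.
case_numbers() {
    i=$1
    while [ "$i" -le "$2" ]; do
        printf 'BV%06d\n' "$i"   # zero-pad back to six digits
        i=$((i + 1))
    done
}

# Example (not run here): case_numbers 24000 28933
```

Each printed line could then be fed to whatever request ends up working for a single case.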

However, I cannot retrieve the data from the form. I have been using:

wget --post-data "frmsearch=BV024000" http://www.lasuperiorcourt.org/civilcasesummarynet/ui/?CT=AP^&casetype=appellate -O output.html

But I just get the original page back, not the form output. What am I doing wrong?

2 answers:

Answer 0 (score: 0)

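There are two problems:

  1. Your command has a typo: wrap the URL in quotes and drop the ^. The page to post to is ui/index.aspx?CT=AP&casetype=appellate.
  2. When you post a form, you have to post all of the form's input fields; otherwise the server will not validate your POST request.
  3. Here is the request I made, which worked: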

        wget --post-data "__VIEWSTATE=%2FwEPDwUJMzM0NzAxOTczD2QWBgIBD2QWCmYPDxYCHgdWaXNpYmxlZ2RkAgIPDxYCHwBoZGQCBA8PFgIfAGhkZAIGDw8WAh8AaGRkAggPDxYCHwBoZGQCAw9kFgpmDw8WAh8AZ2RkAgIPDxYCHwBoZGQCBA8PFgIfAGhkZAIGDw8WAh8AaGRkAggPDxYCHwBoZGQCCQ9kFgICAw8PFgIfAGhkFgICAQ8QZA8WIGYCAQICAgMCBAIFAgYCBwIIAgkCCgILAgwCDQIOAg8CEAIRAhICEwIUAhUCFgIXAhgCGQIaAhsCHAIdAh4CHxYgEAUGU2VsZWN0BQZTZWxlY3RnEAUTQWxoYW1icmEgQ291cnRob3VzZQUDQUxIZxAFFUJlbGxmbG93ZXIgQ291cnRob3VzZQUDTEMgZxAFGEJldmVybHkgSGlsbHMgQ291cnRob3VzZQUDQkggZxAFEkJ1cmJhbmsgQ291cnRob3VzZQUDQlVSZxAFFUNoYXRzd29ydGggQ291cnRob3VzZQUDQ0hBZxAFEkNvbXB0b24gQ291cnRob3VzZQUDQ09NZxAFFkN1bHZlciBDaXR5IENvdXJ0aG91c2UFA0NDIGcQBRFEb3duZXkgQ291cnRob3VzZQUDRE9XZxAFG0Vhc3QgTG9zIEFuZ2VsZXMgQ291cnRob3VzZQUDRUxBZxAFE0VsIE1vbnRlIENvdXJ0aG91c2UFA0VMTWcQBRNHbGVuZGFsZSBDb3VydGhvdXNlBQNHTE5nEAUaSHVudGluZ3RvbiBQYXJrIENvdXJ0aG91c2UFA0hQIGcQBRRJbmdsZXdvb2QgQ291cnRob3VzZQUDSU5HZxAFFUxvbmcgQmVhY2ggQ291cnRob3VzZQUDTEIgZxAFEU1hbGlidSBDb3VydGhvdXNlBQNNQUxnEAUtTWljaGFlbCBBbnRvbm92aWNoIEFudGVsb3BlIFZhbGxleSBDb3VydGhvdXNlBQNBVFBnEAUTTW9ucm92aWEgQ291cnRob3VzZQUDU05JZxAFE1Bhc2FkZW5hIENvdXJ0aG91c2UFA1BBU2cQBRdQb21vbmEgQ291cnRob3VzZSBOb3J0aAUDUE9NZxAFGFJlZG9uZG8gQmVhY2ggQ291cnRob3VzZQUDU0JCZxAFF1NhbiBGZXJuYW5kbyBDb3VydGhvdXNlBQNMQVNnEAUUU2FuIFBlZHJvIENvdXJ0aG91c2UFA0xBUGcQBRhTYW50YSBDbGFyaXRhIENvdXJ0aG91c2UFA05FV2cQBRdTYW50YSBNb25pY2EgQ291cnRob3VzZQUDU00gZxAFFVNvdXRoIEdhdGUgQ291cnRob3VzZQUDU0cgZxAFF1N0YW5sZXkgTW9zayBDb3VydGhvdXNlBQNMQU1nEAUTVG9ycmFuY2UgQ291cnRob3VzZQUDU0JBZxAFGFZhbiBOdXlzIENvdXJ0aG91c2UgV2VzdAUDTEFWZxAFFldlc3QgQ292aW5hIENvdXJ0aG91c2UFA0NJVGcQBRtXZXN0IExvcyBBbmdlbGVzIENvdXJ0aG91c2UFA0xBV2cQBRNXaGl0dGllciBDb3VydGhvdXNlBQNXSCBnFgFmZGQk7ioHoNWuWLyRkeV2Jf7vbNorIw%3D%3D&CaseNumber=BV024000&submit1=Search&casetype=appellate" "http://www.lasuperiorcourt.org/civilcasesummarynet/ui/index.aspx?CT=AP&casetype=appellate" -O output.html
    --2012-08-12 19:25:32--  http://www.lasuperiorcourt.org/civilcasesummarynet/ui/index.aspx?CT=AP&casetype=appellate
    Resolving www.lasuperiorcourt.org... 153.43.255.56
    Connecting to www.lasuperiorcourt.org|153.43.255.56|:80... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: /civilcasesummarynet/ui/casesummary.aspx?CT=AP&casetype=appellate [following]
    --2012-08-12 19:25:33--  http://www.lasuperiorcourt.org/civilcasesummarynet/ui/casesummary.aspx?CT=AP&casetype=appellate
    

    It works; see http://i47.tinypic.com/35db8k3.png

    You may need to supply a new __VIEWSTATE value for each request.
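Fetching a fresh __VIEWSTATE before each POST could look roughly like the sketch below. It is untested against the live site and rests on several assumptions: the hidden input is written on one line with `name` before `value`, the field names (`CaseNumber`, `submit1`, `casetype`) are the ones from the request above, and no other hidden field such as `__EVENTVALIDATION` is required (the request above did not send one). The helper names `extract_viewstate`, `urlencode_b64` and `fetch_case` are hypothetical.

```shell
#!/bin/sh
URL='http://www.lasuperiorcourt.org/civilcasesummarynet/ui/index.aspx?CT=AP&casetype=appellate'

# Read HTML on stdin, print the value of the hidden __VIEWSTATE input.
# Assumes name="__VIEWSTATE" precedes value="..." on the same line.
extract_viewstate() {
    sed -n 's/.*name="__VIEWSTATE"[^>]*value="\([^"]*\)".*/\1/p' | head -n 1
}

# Percent-encode the three base64 characters that are unsafe in form data.
# wget sends --post-data verbatim, so the encoding has to be done here.
urlencode_b64() {
    sed -e 's/+/%2B/g' -e 's|/|%2F|g' -e 's/=/%3D/g'
}

# Fetch the form, reuse its current __VIEWSTATE, and post one case number.
# The result is written to <case number>.html.
fetch_case() {
    case_number=$1
    viewstate=$(wget -q -O - "$URL" | extract_viewstate | urlencode_b64)
    wget -q -O "$case_number.html" \
        --post-data "__VIEWSTATE=$viewstate&CaseNumber=$case_number&submit1=Search&casetype=appellate" \
        "$URL"
}

# Example (not run here): fetch_case BV024000
```

The two text helpers are kept separate from the network call so they can be checked against a saved copy of the page before any request is sent.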

Answer 1 (score: -1)

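What environment are you running this command in? In most Unix shells, "&" is a special character: it ends the command and sends it to the background when executed. You have not quoted the URL in any way.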
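To illustrate the point for a Unix shell, here is a small sketch that involves no network access. The helper `show_arg` is hypothetical and simply prints the first argument it receives, so you can see what a program such as wget would be handed.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument exactly as received.
show_arg() { printf '%s\n' "$1"; }

# Single quotes keep "&" and "?" literal, so the URL stays one argument.
URL='http://www.lasuperiorcourt.org/civilcasesummarynet/ui/index.aspx?CT=AP&casetype=appellate'
show_arg "$URL"

# Backslash escaping works too; it plays the role "^" plays in cmd.exe.
show_arg http://www.lasuperiorcourt.org/civilcasesummarynet/ui/index.aspx\?CT=AP\&casetype=appellate

# Unquoted (left as a comment on purpose): the shell would cut the command
# at "&", run the first half in the background, and parse the remainder
# as a separate command.
```

Both calls print the same full URL, including the `casetype=appellate` part.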

Edit: OK, never mind... my answer is not all that useful, except that I did not know "^" was a quoting character, and now I do. http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true