I want to POST a form and use the data that comes back.
The page I want to get the data from is: http://www.bigpara.com/analiz/mali-tablolar/
assetscrap <- function(sirket){
  a <- postForm("http://www.bigpara.com/analiz/mali-tablolar/",
                Yil = "2013", Donem = "4", Kur = "TL", Cins = "1", Submit = "Getir",
                HisseKod = sirket)
  a <- htmlParse(a)
  span <- xpathSApply(a, "//div[@class='maliTable']//li//span", xmlValue)
  small <- xpathSApply(a, "//div[@class='maliTable']//li//small", xmlValue)
  small <- gsub("[.]","",small)
  small <- as.numeric(small)
  cikti <- data.table(span, small)
  cikti <- cikti[cikti$span == "AKTİF TOPLAMI" | cikti$span == "A K T İ F T O P L A M I"]
  cikti <- cikti[order(-small)]
  cikti <- cikti[1,]$small
}
For example, when I run assetscrap("FROTO"), it returns:
* About to connect() to www.bigpara.com port 80 (#0)
* Trying 83.66.15.71... * connected
* Connected to www.bigpara.com (83.66.15.71) port 80 (#0)
> POST /analiz/mali-tablolar/ HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Host: www.bigpara.com
Accept: */*
Referer: http://www.bigpara.com/analiz/mali-tablolar/
Content-Length: 627
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------b1006fa82edf
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Cache-Control: private
< Content-Length: 182029
< Content-Type: text/html; Charset=UTF-8
< Server: Microsoft-IIS/7.5
< Set-Cookie: ASPSESSIONIDCCTSBQAT=HOOCGCIBDPNEJMFGGFGGHNPM; path=/
< X-Powered-By: ASP.NET
< Date: Sat, 06 Dec 2014 14:00:12 GMT
< Set-Cookie: NSC_cjhqbsb_iuuq_WJQ=ffffffff504a9f5645525d5f4f58455e445a4a42367f;Version=1;path=/;httponly
<
* Connection #0 to host www.bigpara.com left intact
What am I overlooking? I think I did everything correctly, but the server does not reply to my request.
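Answer 0 (score: 1)
Why do you say the server does not respond? You get status 200 (OK) and a response of about 182,000 bytes.
The POST request works fine. Your problem is with this line: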
cikti <- cikti[cikti$span == "AKTİF TOPLAMI" | cikti$span == "A K T İ F T O P L A M I"];
which returns 0 rows. There are several mistakes here.
First, the text in the span column has mixed encodings:
head(Encoding(span),20)
# [1] "UTF-8" "UTF-8" "UTF-8" "UTF-8" "unknown" "unknown" "UTF-8" "UTF-8"
# [9] "UTF-8" "UTF-8" "UTF-8" "unknown" "UTF-8" "UTF-8" "UTF-8" "unknown"
# [17] "UTF-8" "UTF-8" "UTF-8" "unknown"
You can fix this by calling
span <- iconv(span,from="UTF-8",to="")
immediately after extracting the span strings.
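To see where the mix of "UTF-8" and "unknown" marks comes from, here is a minimal sketch that uses made-up labels rather than the scraped page. In R, a string made only of ASCII characters is never marked with an encoding, so it reports "unknown", while a string containing a character such as İ carries a "UTF-8" mark. The vector `labels` is a hypothetical stand-in for `span`, and `enc2utf8()` is shown only as a base R alternative to the `iconv()` call, not as what the answer used.

```r
# Hypothetical stand-in for the scraped span vector (assumption, not real page data).
# "\u0130" is the Turkish dotted capital I; the escape yields a UTF-8 marked string.
labels <- c("AKT\u0130F TOPLAMI", "TOPLAM")

# Pure-ASCII strings are never marked, so they show up as "unknown".
marks <- Encoding(labels)
print(marks)

# enc2utf8() converts marked strings to UTF-8 and leaves ASCII strings untouched.
labels_utf8 <- enc2utf8(labels)

# Compare against a literal written with the same escape, so the source
# file's own encoding cannot change the result.
is_total <- labels_utf8 == "AKT\u0130F TOPLAMI"
print(is_total)
```

Writing the comparison literal with a `\u` escape sidesteps the case where the script file itself is saved in an encoding that cannot represent İ.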
Second, the string in your second condition, cikti$span == "A K T İ F T O P L A M I", does not exist in cikti. On the page there are 3 spaces between the two words, as in "A K T İ F   T O P L A M I".
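Because an exact match depends on the number of spaces, one option is to normalise whitespace before comparing. The following is only a sketch under that assumption: `raw_label` is a hypothetical value that imitates the letter-spaced heading, not text taken from the live page.

```r
# Hypothetical label with three spaces between the two letter-spaced words.
raw_label <- "A K T \u0130 F   T O P L A M I"

# Collapse every run of whitespace to a single space, then trim both ends.
clean_label <- trimws(gsub("[[:space:]]+", " ", raw_label))

# After normalisation the single-spaced literal matches.
matches <- clean_label == "A K T \u0130 F T O P L A M I"
print(matches)
```

With the labels normalised this way, the comparison no longer breaks if the page changes how many spaces it uses.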
Third, a data.table should not be indexed like a plain data frame. It is very bad practice to write, for example,
cikti <- cikti[cikti$span == "AKTİF TOPLAMI" ...]
Use this instead:
cikti <- cikti[span == "AKTİF TOPLAMI" ...]
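As a small illustration of the data.table style, the sketch below filters, sorts and extracts a value using bare column names inside `[ ]`. The table `dt` and its values are invented for the example and assume the data.table package is installed.

```r
library(data.table)

# Invented sample rows; the real table comes from the scraped page.
dt <- data.table(span  = c("A", "B", "A"),
                 small = c(10, 30, 20))

# Columns are referenced by name inside [ ], without the dt$ prefix.
# Filter rows, sort by small in descending order, take small from the first row.
largest <- dt[span == "A"][order(-small)][1, small]
print(largest)
```

Chaining the `[ ]` calls keeps the filter, the sort and the extraction in one expression instead of three reassignments.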
Putting it all together, this code works (on my system, at least).
library(RCurl)       # postForm
library(XML)         # htmlParse, xpathSApply, xmlValue
library(data.table)

sirket <- "FROTO"    # ticker used in the question
a <- postForm("http://www.bigpara.com/analiz/mali-tablolar/",
              Yil = "2013", Donem = "4", Kur = "TL", Cins = "1", Submit = "Getir",
              HisseKod = sirket)
a <- htmlParse(a)
span <- xpathSApply(a, "//div[@class='maliTable']//li//span", xmlValue)
span <- iconv(span,from="UTF-8",to="")   # normalise the mixed encodings
small <- xpathSApply(a, "//div[@class='maliTable']//li//small", xmlValue)
small <- gsub("[.]","",small)            # drop thousands separators
small <- as.numeric(small)
cikti <- data.table(span, small)
# note the three spaces between the two letter-spaced words
cikti <- cikti[span == "AKTİF TOPLAMI" | span == "A K T İ F   T O P L A M I"]
cikti <- cikti[order(-small)]
cikti <- cikti[1,]$small
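One further point may be worth checking once this code is wrapped back into a function like assetscrap(). This is an observation about R in general, not something the answer above states: a function whose last expression is an assignment returns its value invisibly, so nothing is printed at the console even when the result is correct. The two functions below are hypothetical and use a placeholder value.

```r
# Hypothetical functions for illustration; 42 is a placeholder value.
ends_with_assignment <- function() {
  result <- 42        # last expression is an assignment: returned invisibly
}

ends_with_value <- function() {
  result <- 42
  result              # last expression is the value itself: returned visibly
}

# withVisible() reports whether a value would be auto-printed at the console.
vis_a <- withVisible(ends_with_assignment())$visible
vis_b <- withVisible(ends_with_value())$visible
print(c(vis_a, vis_b))
```

Ending the function with the bare value, or with an explicit `return()`, makes the result print when the function is called at top level.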
Answer 1 (score: 0)
If you don't want to deal with encodings yourself, httr and rvest handle them for you automatically:
library(httr)    # POST
library(rvest)   # read_html, html_nodes, html_text, %>%

res <- POST("http://www.bigpara.com/analiz/mali-tablolar/",
  body = list(
    Yil = "2013",
    Donem = "4",
    Kur = "TL",
    Cins = "1",
    HisseKod = "FENER"
  ),
  encode = "form"
)
# html() was deprecated in later rvest releases in favour of read_html()
mali_table <- read_html(res) %>% html_nodes("div.maliTable li")
span <- mali_table %>% html_nodes("span") %>% html_text()
small <- mali_table %>%
  html_nodes("small") %>%
  html_text() %>%
  gsub("\\.", "", .) %>%
  as.numeric()
# three spaces between the two letter-spaced words, as on the page
selected <- span %in% c("AKTİF TOPLAMI", "A K T İ F   T O P L A M I")
data.frame(
  span = span[selected],
  small = small[selected]
)
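Both answers strip the dots from the `small` values before calling `as.numeric()`, because the page uses "." as a thousands separator. A minimal sketch of that step on its own, with made-up sample values, looks like this:

```r
# Made-up sample values in the dotted thousands format (not taken from the page).
raw_values <- c("1.234.567", "89.000", "512")

# fixed = TRUE treats "." as a literal dot rather than a regex wildcard.
numeric_values <- as.numeric(gsub(".", "", raw_values, fixed = TRUE))
print(numeric_values)
```

This assumes the values contain no decimal part; a value with a decimal comma would become NA and would need the comma replaced by a dot first.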