我想使用LWP :: UserAgent在CP1251网站上发布俄文,并获得以下结果:
# $text="Русский текст"; obtained from command line
FIELD_NAME => $text # result: Г?в г'В?г'В?г'В?г?вєг?вёг?в? Г'В'Г?вчг?вєг'В?г'В'
$text=Encode::decode_utf8($text);
FIELD_NAME => $text # result: Р с?с?с?рєрёр? С'Рчрєс?с'
FIELD_NAME => Encode::encode("cp1251", $text) # result: Г?гіг+г+гЄгёгЏ ГІгҐгЄг+гІ
FIELD_NAME => URI::Escape::uri_escape_utf8($text) # result: D0%a0%d1%83%d1%81%d1%81%d0%ba%d0%b8%d0%b9%20%d1%82%d0%b5%d0%ba%d1%81%d1%82
我该怎么做?内容类型必须为x-www-form-urlencoded
。你可以找到类似的here形式,但是你可以使用&#...来逃避任何非拉丁字符。表单,尝试在FIELD_NAME
中转义它会导致10561091108910891 10901077108210891
(每&
,#
和;
从字符串中删除)或1056;усский текст
(字符串开头的标点符号被删除)取决于FIELD_NAME
实际上是什么。
UPDATE :有人知道如何转换以下代码,以便它将使用LWP :: UserAgent :: post函数吗?
my $url=shift;
my $fields=shift;
my $request=HTTP::Request->new(POST => absURL($url));
$request->content_type('application/x-www-form-urlencoded');
$request->content_encoding("UTF-8");
$ua->prepare_request($request);
my $content="";
for my $k (keys %$fields) {
$content.="&" if($content ne "");
my $c=$fields->{$k};
eval {$c=Encode::decode_utf8($c)};
$c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
$content.="$k=".URI::Escape::uri_escape($c);
}
$request->content($content);
my $response=$ua->simple_request($request);
此代码实际上解决了这个问题,但我不想添加第三个请求包装函数(与get
和post
一起)。
答案 0 :(得分:0)
如果你有可用的recode系统命令,那么使用recode系统命令的方法似乎是(远非最好的)。来自http://const.deribin.com/files/SignChanger.pl.txt
my $boardEncoding="cp1251"; # encoding used by the board
$vals{'Post'} = `fortune $forunePath | recode utf8..$boardEncoding`;
$res = $ua->post($formURL,\%vals);
另一种方法似乎是http://mail2lj.nichego.net/lj.txt
my $formdata = $1 ;
my $hr = ljcomment_string2form($formdata) ;
my $req = new HTTP::Request('POST' => $ljcomment_action)
or die "new HTTP::Request(): $!\n" ;
$hr->{usertype} = 'user' ;
$hr->{encoding} = $mh->mime_attr('content-type.charset') ||
"cp1251" ;
$hr->{subject} = decode_mimewords($mh->get('Subject'));
$hr->{body} = $me->bodyhandle->as_string() ;
$req->content_type('application/x-www-form-urlencoded');
$req->content(href2string($hr)) ;
my $ljres = submit_request($req, "comment") ;
if ($ljres->{'success'} eq "OK") {
print STDERR "journal updated successfully\n" ;
} else {
print STDERR "error updating journal: $ljres->{errmsg}\n" ;
send_bounce($ljres->{errmsg}, $me, $mh->mime_attr("content-type.charset")) ;
}
答案 1 :(得分:0)
使用WWW::Mechanize
,它会自动处理编码(包括字符编码和表单编码),如果form
元素的accept-charset
属性设置得当,它会做正确的事情。如果缺少,则表单默认为UTF-8
,因此需要更正。你似乎处于这种情况。顺便说一下,您的示例站点的编码是KOI8-R,而不是Windows-1251。工作示例:
use utf8;
use WWW::Mechanize qw();
my $message = 'Русский текст';
my $mech = WWW::Mechanize->new(
cookie_jar => {},
agent => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/533.9 SUSE/6.0.401.0-2.1 (KHTML, like Gecko)',
);
$mech->get('http://zhurnal.lib.ru/cgi-bin/comment?COMMENT=/z/zyx/index_4-1');
$mech->current_form->accept_charset(scalar $mech->response->content_type_charset);
$mech->submit_form(with_fields => { TEXT => $message });
HTTP转储(仅限基本部分):
POST /cgi-bin/comment HTTP/1.1
Content-Length: 115
Content-Type: application/x-www-form-urlencoded
FILE=%2Fz%2Fzyx%2Findex_4-1&MSGID=&OPERATION=store_new&NAME=&EMAIL=&URL=&TEXT=%F2%D5%D3%D3%CB%C9%CA+%D4%C5%CB%D3%D
答案 2 :(得分:-1)
这些功能解决了这个问题(首先发布application / x-www-form-urlencoded数据,第二个发布multipart / form-data):
#{{{2 postue
sub postue($$;$) {
my $url=shift;
my $fields=shift;
my $referer=shift;
if(defined $referer and $referer eq "" and defined $fields->{"DIR"}) {
$referer=absURL($url."?DIR=".$fields->{"DIR"}); }
else {
$referer=absURL($referer); }
my $request=HTTP::Request->new(POST => absURL($url));
$request->content_type('application/x-www-form-urlencoded');
$request->content_encoding("UTF-8");
$ua->prepare_request($request);
my $content="";
for my $k (keys %$fields) {
$content.="&" if($content ne "");
my $c=$fields->{$k};
if(not ref $c) {
$c=Encode::decode_utf8($c) unless Encode::is_utf8($c);
$c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
$c=URI::Escape::uri_escape($c);
}
elsif(ref $c eq "URI::URL") {
$c=$c->canonical();
$c=URI::Escape::uri_escape($c);
}
$content.="$k=$c";
}
$request->content($content);
$request->referer($referer) if(defined $referer);
my $i=0;
print STDERR "Doing POST request to url $url".
(($::o_verbose>2)?(" with fields:\n".
::YAML::dump($fields)):("\n"))
if($::o_verbose>1);
REQUEST:
my $response=$ua->simple_request($request);
$i++;
my $code=$response->code;
if($i<=$o_maxtries and 500<=$code and $code<600) {
print STDERR "Failed to request $url with code $code... retrying\n"
if($::o_verbose>2);
sleep $o_retryafter;
goto REQUEST;
}
return $response;
}
#{{{2 postfd
sub postfd($$;$) {
my $url=absURL(shift);
my $content=shift;
my $referer=shift;
$referer=absURL($referer) if(defined $referer);
my $i=0;
print STDERR "Doing POST request (form-data) to url $url".
(($::o_verbose>2)?(" with fields:\n".
::YAML::dump($content)):("\n"))
if($::o_verbose>1);
my $newcontent=[];
while(my ($f, $c)=splice @$content, 0, 2) {
if(not ref $c) {
$c=Encode::decode_utf8($c) unless Encode::is_utf8($c);
$c=Encode::encode("cp1251", $c, Encode::FB_HTMLCREF);
}
push @$newcontent, $f, $c;
}
POST:
my $response=$ua->post($url, $newcontent,
Content_type => "form-data",
((defined $referer)?(referer => $referer):()));
$i++;
my $code=$response->code;
if($i<=$o_maxtries and 500<=$code and $code<600) {
print STDERR "Failed to download $url with code $code... retrying\n"
if($::o_verbose>2);
sleep $o_retryafter;
goto POST;
}
return $response;
}