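I am using the OCR service at i2ocr.com to convert images to text.
In my project I need to automate this, so I am using PHP to fetch the text of an image.
On the OCR website, the POST data is sent as multipart/form-data,
like this: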
-----------------------------32642708628732\r\n
Content-Disposition: form-data; name="i2ocr_options"\r\n
\r\n
url\r\n
-----------------------------32642708628732\r\n
Content-Disposition: form-data; name="i2ocr_uploadedfile"\r\n
\r\n
\r\n
-----------------------------32642708628732\r\n
Content-Disposition: form-data; name="i2ocr_url"\r\n
\r\n
http://www.murraydata.co.uk/wp-content/uploads/2013/02/ocr-font-500x220.jpg\r\n
-----------------------------32642708628732\r\n
Content-Disposition: form-data; name="i2ocr_languages"\r\n
\r\n
gb,eng\r\n
-----------------------------32642708628732--\r\n
In PHP, I am using:
$ch = curl_init();
$dt = array();
$dt['i2ocr_options'] = 'url';
$dt['i2ocr_uploadedfile'] = '';
$dt['i2ocr_url'] = 'http://www.murraydata.co.uk/wp-content/uploads/2013/02/ocr-font-500x220.jpg';
$dt['i2ocr_languages'] = 'gb,eng';
curl_setopt($ch, CURLOPT_URL,"http://www.i2ocr.com/process_form");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0");
curl_setopt($ch,CURLOPT_ENCODING,"gzip,deflate");
curl_setopt($ch, CURLOPT_HTTPHEADER, Array("Content-Type: multipart/form-data; boundary=---------------------------32642708628732"));
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, "http://www.i2ocr.com/");
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "$dt");
$html=curl_exec($ch);
print_r($html);
This code does not produce any errors, but I do not get any output either.
I need help getting output from this cURL request.
Answer 0 (score: 0)
Try it like this:
<?php
// Send a GET request on the shared cURL handle and return the response body.
function get($url, $refer, $ch)
{
    // realpath() returns false while cookie.txt does not exist yet,
    // so build the cookie path from __DIR__ instead.
    $cookieFile = __DIR__ . '/cookie.txt';
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 0);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux i586; de; rv:5.0) Gecko/20100101 Firefox/5.0');
    curl_setopt($ch, CURLOPT_REFERER, $refer);
    $result = curl_exec($ch);
    return $result;
}
// Send a POST request with the given body on the shared cURL handle
// and return the response body.
function post($url, $refer, $parametros, $ch)
{
    // realpath() returns false while cookie.txt does not exist yet,
    // so build the cookie path from __DIR__ instead.
    $cookieFile = __DIR__ . '/cookie.txt';
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $parametros);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux i586; de; rv:5.0) Gecko/20100101 Firefox/5.0');
    curl_setopt($ch, CURLOPT_REFERER, $refer);
    $result = curl_exec($ch);
    return $result;
}
function hazlo()
{
    $ch = curl_init();
    /* STEP 1. Visit the first page to pick up its cookies */
    get("http://www.i2ocr.com/", "http://www.i2ocr.com/", $ch);
    // STEP 2. Build an array with the POST data
    $data = array(
        'i2ocr_options' => 'url',
        'i2ocr_uploadedfile' => '',
        'i2ocr_url' => 'http://www.murraydata.co.uk/wp-content/uploads/2013/02/ocr-font-500x220.jpg',
        'i2ocr_languages' => 'gb,eng'
    );
    // Encode the array as a URL-encoded query string
    $data2 = http_build_query($data);
    // STEP 3. Send the data in a POST request
    echo post("http://www.i2ocr.com/process_form", "http://www.i2ocr.com/", $data2, $ch);
}
hazlo();
?>
Use "view source" on the response HTML and you will see the text from the image (sorry for my English). Works 100% :)
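One likely reason the code in the question returned nothing is the line that passes "$dt" in double quotes to CURLOPT_POSTFIELDS: interpolating an array into a string gives the literal text "Array", so none of the form fields reach the server. The following is a minimal offline sketch of that difference, not a confirmed diagnosis of the site's behaviour. The variable names $asString and $encoded are illustrative only, and the array is shortened to two of the fields from the question.

```php
<?php
// Two of the form fields from the question, kept short for illustration.
$dt = array(
    'i2ocr_options'   => 'url',
    'i2ocr_languages' => 'gb,eng',
);

// Interpolating an array into a string yields the literal text "Array".
// PHP also raises an "Array to string conversion" notice or warning
// (depending on the PHP version), silenced here with @ for the demo.
$asString = @"$dt";

// http_build_query() turns the same array into a URL-encoded body,
// which is what the answer's post() helper actually sends.
$encoded = http_build_query($dt);

echo $asString, "\n"; // Array
echo $encoded, "\n";  // i2ocr_options=url&i2ocr_languages=gb%2Ceng
```

Passing the array itself (without quotes) to CURLOPT_POSTFIELDS is another option: cURL then sends the fields as multipart/form-data and generates its own boundary, so the hand-written Content-Type header with a fixed boundary should be dropped in that case.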