W3 Validator API for file upload / direct input

Date: 2019-03-28 10:11:00

Tags: php api curl post http-post

I am trying to use the W3 Validator API with direct input via the POST method.

https://github.com/validator/validator/wiki/Service-%C2%BB-Input-%C2%BB-textarea

This is what I tried, without success:

http://validator.w3.org/nu/

1 answer:

Answer 0 (score: 1)

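First, the "direct input" API only accepts POST requests in multipart/form-data format, but when you run the fields through http_build_query(), they get converted to application/x-www-form-urlencoded, which the API does not understand. (Give CURLOPT_POSTFIELDS an array instead and it is automatically sent as multipart/form-data.)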
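To make the difference concrete, here is a minimal sketch of what the two kinds of request body look like on the PHP side. The field names come from the code in this answer; the short `<p></p>` content is only a stand-in, and the snippet makes no network request.

```php
<?php
// Fields for the validator's "direct input" form, as used in this answer.
// The content value is a short stand-in for a real document.
$fields = array(
    'showsource' => 'yes',
    'content'    => '<p></p>',
);

// http_build_query() flattens the array into one URL-encoded string.
// Passing a string to CURLOPT_POSTFIELDS makes curl send it as
// application/x-www-form-urlencoded.
$urlencoded = http_build_query($fields);
echo $urlencoded, "\n";

// Passing the array itself makes curl build a multipart/form-data body:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
```

The only change needed in the request code is to drop the http_build_query() call and hand the array to curl directly.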

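Second, the API blocks requests that lack a User-Agent header, and libcurl has no default UA (the curl CLI program has one, but libcurl does not), so you have to supply one yourself, and your code doesn't.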

...with those 2 issues fixed, and some simple error-message parsing added:

<?php
$ch=curl_init();
$html=<<<'HTML'
<!DOCTYPE html>
<html lang="">
<head>
<title>Test</title>
</head><ERR&OR
<body>
<p></p>
</body>
</html>
HTML;
curl_setopt_array($ch,array(
    CURLOPT_URL=>'http://validator.w3.org/nu/',
    CURLOPT_ENCODING=>'',
    CURLOPT_USERAGENT=>'PHP/'.PHP_VERSION.' libcurl/'.(curl_version()['version']),
    CURLOPT_POST=>1,
    CURLOPT_POSTFIELDS=>array(
        'showsource'=>'yes',
        'content'=>$html
    ),
    CURLOPT_RETURNTRANSFER=>1,
));
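$html=curl_exec($ch);
curl_close($ch);
$parsed=array();
// Instantiate first: calling loadHTML() statically throws an Error on PHP 8.
$domd=new DOMDocument();
@$domd->loadHTML($html);
$xp=new DOMXPath($domd);
$res=$domd->getElementById("results");
// Collect the text of every element the validator marks with class "error".
foreach($xp->query("//*[@class='error']",$res) as $message){
    $parsed['errors'][]=trim($message->textContent);
}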
var_dump($html);
var_dump($parsed);

prints:

array(1) {
  ["errors"]=>
  array(4) {
    [0]=>
    string(156) "Error: Saw < when expecting an attribute name. Probable cause: Missing > immediately before.At line 6, column 1</head><ERR&ORâ©<body>â©<p></p>â©"
    [1]=>
    string(254) "Error: Element err&or not allowed as child of element body in this context. (Suppressing further errors from this subtree.)From line 5, column 8; to line 6, column 6e>â©</head><ERR&ORâ©<body>â©<p></Content model for element body:Flow content."
    [2]=>
    string(144) "Error: End tag for  body seen, but there were unclosed elements.From line 8, column 1; to line 8, column 7>â©<p></p>â©</body>â©</htm"
    [3]=>
    string(118) "Error: Unclosed element err&or.From line 5, column 8; to line 6, column 6e>â©</head><ERR&ORâ©<body>â©<p></"
  }
}

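...and the Unicode issues stem from DOMDocument's default charset, which is, idk, not UTF-8. afaik there is no good way to set the default charset for DOMDocument, but you can hack around it with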

$domd=new DOMDocument();
@$domd->loadHTML('<?xml encoding="UTF-8">'.$html);

which makes it print:

array(1) {
  ["errors"]=>
  array(4) {
    [0]=>
    string(147) "Error: Saw < when expecting an attribute name. Probable cause: Missing > immediately before.At line 6, column 1</head><ERR&OR↩<body>↩<p></p>↩"
    [1]=>
    string(245) "Error: Element err&or not allowed as child of element body in this context. (Suppressing further errors from this subtree.)From line 5, column 8; to line 6, column 6e>↩</head><ERR&OR↩<body>↩<p></Content model for element body:Flow content."
    [2]=>
    string(135) "Error: End tag for  body seen, but there were unclosed elements.From line 8, column 1; to line 8, column 7>↩<p></p>↩</body>↩</htm"
    [3]=>
    string(109) "Error: Unclosed element err&or.From line 5, column 8; to line 6, column 6e>↩</head><ERR&OR↩<body>↩<p></"
  }
}

...better, but it still contains the arrows used on the web page. You can remove them with
foreach($xp->query("//*[@class='lf']") as $remove){
    $remove->parentNode->removeChild($remove);
}

which makes it print:

array(1) {
  ["errors"]=>
  array(4) {
    [0]=>
    string(138) "Error: Saw < when expecting an attribute name. Probable cause: Missing > immediately before.At line 6, column 1</head><ERR&OR<body><p></p>"
    [1]=>
    string(236) "Error: Element err&or not allowed as child of element body in this context. (Suppressing further errors from this subtree.)From line 5, column 8; to line 6, column 6e></head><ERR&OR<body><p></Content model for element body:Flow content."
    [2]=>
    string(126) "Error: End tag for  body seen, but there were unclosed elements.From line 8, column 1; to line 8, column 7><p></p></body></htm"
    [3]=>
    string(100) "Error: Unclosed element err&or.From line 5, column 8; to line 6, column 6e></head><ERR&OR<body><p></"
  }
}
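
For reference, the three parsing steps above (the UTF-8 hack, removing the arrow elements, collecting the error text) can be folded into one helper. This is only a sketch: the function name extract_validator_errors and the $sample markup are made up for illustration. The sample is loosely modelled on the class names used above (error, lf) and is not the validator's real output, so the snippet runs without a network request.

```php
<?php
// Hypothetical helper: returns the text of every class="error" node
// found in a validator result page.
function extract_validator_errors(string $html): array
{
    $domd = new DOMDocument();
    // The XML declaration prefix makes libxml read the input as UTF-8.
    @$domd->loadHTML('<?xml encoding="UTF-8">' . $html);
    $xp = new DOMXPath($domd);

    // Remove the line-break arrow elements before reading any text.
    foreach (iterator_to_array($xp->query("//*[@class='lf']")) as $remove) {
        $remove->parentNode->removeChild($remove);
    }

    $errors = array();
    foreach ($xp->query("//*[@class='error']") as $message) {
        $errors[] = trim($message->textContent);
    }
    return $errors;
}

// Made-up sample markup (an assumption, not real validator output).
$sample = '<!DOCTYPE html><html><head><title>t</title></head><body>'
    . '<div id="results"><ol>'
    . '<li class="error"><p><strong>Error</strong>: <span>Unclosed element.</span></p>'
    . '<p class="extract"><code>&lt;p&gt;<span class="lf">&#x21A9;</span>&lt;/p&gt;</code></p></li>'
    . '</ol></div></body></html>';

var_dump(extract_validator_errors($sample));
```

In the real script you would pass the string returned by curl_exec() instead of $sample, after checking that it is not false.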