Parse a page to get an input value, then submit a POST containing that value

Time: 2019-05-30 17:06:45

Tags: php curl

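I want to parse the page that contains a form. The form includes a token input, and I need to read that token's value and send it along with my own input.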

This is the curl code I was using before the token input was added:

$username = @$_POST['user'];
$password = @$_POST['password'];
$to = @$_POST['to'];
$text = @$_POST['text'];    
$loginUrl = '';
$sendUrl = '';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/32.0.1700.107 Chrome/32.0.1700.107 Safari/537.36');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, "user=$username&password=$password");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie-name');  
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
$answer = curl_exec($ch);
if (curl_error($ch)) {
    echo curl_error($ch)."\n";
}
//sending
curl_setopt($ch, CURLOPT_URL, $sendUrl);
curl_setopt($ch, CURLOPT_POSTFIELDS, "recipients=$to&message_body=$text");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie-name-send'); 
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie-send.txt');
$answer = curl_exec($ch);
if (curl_error($ch)) {
    echo curl_error($ch)."\n";
}
echo $answer;

This is the page I want to parse:

<form name="user_action" method="post" action="index.php?page=11&amp;lang=ge">
<input type="hidden" name="csrf_token" value="7e71ea58eaaa55986b0fdc71b2d44c92">    
<input type="text" id="user" name="user" class="round_border medium_box">
<input type="password" id="password" name="password" class="round_border medium_box">
<input type="submit" value="შესვლა" class="btn red_btn round_border medium">
</form>

Without this token, <input type="hidden" name="csrf_token" value="7e71ea58eaaa55986b0fdc71b2d44c92">, I can't post.

I need to parse this page first to get the token, and then send the POST together with that token.

2 Answers:

Answer 0: (score: 1)

Check out DOMDocument for parsing the HTML. https://www.php.net/manual/en/class.domdocument.php

Here's what I would try:

<?php
$page = file_get_contents("https://wherever-the-form-is.com");
$dom = new DOMDocument();
$dom->loadHTML( $page );

// Get a list of all inputs
$inputs = $dom->getElementsByTagName( 'input' );
$total = $inputs->length;
$token = false;

// Loop through inputs looking for one with the right name
for( $i = 0; $i < $total; $i++ ) {
    if ( $inputs->item($i)->getAttribute('name') == 'csrf_token' ) {
        // When you find the right name, record the value and break out of the loop
        $token = $inputs->item($i)->getAttribute('value');
        break;
    }
}

if ( $token ) {
    // Your code here
}
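
The loop above can also be wrapped in a small reusable helper. What follows is only a sketch: the function name extractInputValue is made up for illustration, and it assumes the page HTML has already been fetched into a string. It uses libxml_use_internal_errors(true) so that slightly malformed real-world HTML does not emit warnings while parsing.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the value of
// the first <input> whose name attribute matches $name, or null if none exists.
function extractInputValue(string $html, string $name): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string on PHP 8
    }

    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    if (!$loaded) {
        return null;
    }

    foreach ($dom->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('name') === $name) {
            return $input->getAttribute('value');
        }
    }
    return null;
}

// Sample markup copied from the question's form.
$html = '<form name="user_action" method="post" action="index.php?page=11&amp;lang=ge">'
      . '<input type="hidden" name="csrf_token" value="7e71ea58eaaa55986b0fdc71b2d44c92">'
      . '<input type="text" id="user" name="user" class="round_border medium_box">'
      . '</form>';

$token = extractInputValue($html, 'csrf_token');
```

Returning null rather than false keeps "no token found" distinct from a token that happens to be an empty string.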

Answer 1: (score: 0)

@Stevish is right that you should use DOMDocument, but I would suggest a slightly different approach: loop over all the input children of the form instead, e.g.

// loadHTML() is an instance method; calling it statically fails on PHP 8
$domd=new DOMDocument();
@$domd->loadHTML($answer);
$xp=new DOMXPath($domd);
$inputs=array();
foreach($xp->query("//form[@name='user_action']//input") as $input){
    $inputs[$input->getAttribute("name")]=$input->getAttribute("value");
}

which should give you

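$inputs=array (
  'csrf_token' => '7e71ea58eaaa55986b0fdc71b2d44c92',
  'user' => '',
  'password' => '',
  '' => 'შესვლა', // the submit button has no name; without a charset hint, loadHTML() may return this value as mojibake
)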

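..also, you are not URL-encoding $to or $text here. What do you think happens if $text contains blabla&recipients=moreblabla? It overrides the recipients value you set earlier and makes your $to variable irrelevant. You need to URL-encode that stuff, so either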

curl_setopt($ch, CURLOPT_POSTFIELDS, "recipients=".urlencode($to)."&message_body=".urlencode($text));

or better yet, use http_build_query

curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(
    array(
        "recipients" => $to,
        "message_body" => $text
    )
));
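
To see why the encoding matters, here is a small sketch that compares the hand-built string with http_build_query. The sample values are made up, and parse_str stands in for what a PHP server would do with the request body.

```php
<?php
$to   = '123';
$text = 'blabla&recipients=456'; // made-up message text containing "&" and "="

// Unencoded: the message smuggles in a second "recipients" field.
$raw = "recipients=$to&message_body=$text";
parse_str($raw, $rawFields);

// Encoded: "&" and "=" inside the message are escaped, so the fields stay intact.
$safe = http_build_query(array(
    'recipients'   => $to,
    'message_body' => $text,
));
parse_str($safe, $safeFields);
```

In the unencoded case the server sees the recipient from the message text instead of $to; in the encoded case both fields arrive as they were entered.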

...or even better,

// loadHTML() is an instance method; calling it statically fails on PHP 8
$domd=new DOMDocument();
@$domd->loadHTML($answer);
$xp=new DOMXPath($domd);
$inputs=array();
foreach($xp->query("//form[@name='user_action']//input") as $input){
    $inputs[$input->getAttribute("name")]=$input->getAttribute("value");
}
$inputs["recipients"]=$to;
$inputs["message_body"]=$text;
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($inputs));
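
Putting the pieces together for the login step, the flow is: GET the page, read the form's inputs (including csrf_token), then POST them back with the credentials on the same curl handle so the session cookie is reused. This is a rough sketch under several assumptions: the function names collectFormInputs and loginWithToken are invented here, the form is assumed to accept the POST at the same URL it was fetched from (the form's action attribute may point elsewhere), and the form name passed to the XPath query is assumed to contain no quote characters.

```php
<?php
// Hypothetical helper: collects name => value for every named <input> in the form.
function collectFormInputs(string $html, string $formName): array
{
    $inputs = array();
    if ($html === '') {
        return $inputs; // loadHTML() rejects an empty string on PHP 8
    }

    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    foreach ($xp->query("//form[@name='$formName']//input") as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') { // skip unnamed inputs such as the submit button
            $inputs[$name] = $input->getAttribute('value');
        }
    }
    return $inputs;
}

// Hypothetical flow: fetch the form, copy its hidden fields, post the login.
function loginWithToken(string $loginUrl, string $username, string $password, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // same file for writing...
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // ...and reading cookies

    // 1. GET the page that contains the form.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    $page = curl_exec($ch);
    if ($page === false) {
        throw new RuntimeException(curl_error($ch));
    }

    // 2. Take every input from the form, then fill in the credentials.
    $fields = collectFormInputs($page, 'user_action');
    $fields['user'] = $username;
    $fields['password'] = $password;

    // 3. POST the fields back on the same handle, so the session cookie is kept.
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $answer = curl_exec($ch);
    if ($answer === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return $answer;
}

// Offline check of the parsing step, using the form from the question.
$sample = '<form name="user_action" method="post" action="index.php?page=11&amp;lang=ge">'
        . '<input type="hidden" name="csrf_token" value="7e71ea58eaaa55986b0fdc71b2d44c92">'
        . '<input type="text" id="user" name="user">'
        . '<input type="password" id="password" name="password">'
        . '<input type="submit" value="Login">'
        . '</form>';
$fields = collectFormInputs($sample, 'user_action');
```

The token is usually tied to the session cookie set by the GET request, which is why both requests share one handle and one cookie file. If the send form carries its own token, the same GET-then-POST pattern would apply to it.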