Question

Apologies if this is a duplicate.

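I am building an application in PHP and CodeIgniter for a client, and he has an unusual request. The site has a link that shows a diamond report from the IGI website. The IGI site is built in ASP.NET and uses a query string to display the report. My application opens that report in a new popup window. Because the report lives on another server and is addressed by a query string, the URL is visible in the page source.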

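Now he wants to mask the URL; that is, he does not want anyone to see the external IGI report URL in the page source. How can I implement this? I told him it is not possible, because the IGI server itself relies on the query string.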

Is this possible? Here is the report URL:

http://www.igiworldwide.com/search_report.aspx?PrintNo=S3B30818&Wght=0.13

He does not want the URL above to appear in the source. Instead, he wants http://www.hiswebsite.com/certificate/1234567879 to display the content from the IGI site.

I am confused.

拉夫

Answer 1

Unusual indeed :)

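If you are fetching the page through the fopen wrappers (for example with file_get_contents), you can use some DOM inspection to retrieve the table you want and then display only that table on your own site.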

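$page = file_get_contents('http://www.somepage.com/');
$dom = new DOMDocument();
// loadHTML() returns a bool, not a document, so keep working with $dom itself.
// The @ suppresses warnings about malformed markup, which is common on real pages.
@$dom->loadHTML($page);
$tables = $dom->getElementsByTagName('table');
// find out which table you need and do something with it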
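
To make the "find out which table you need" step concrete, here is a minimal sketch that picks a table by its class attribute and returns its HTML. The helper name `extractTableByClass`, the sample markup, and the class name `tableBg` are assumptions for illustration; check the real report page to see how the table you need can be identified.

```php
<?php
// Hypothetical helper: returns the outer HTML of the first <table> whose class
// attribute contains $className, or null when no such table exists.
// Note: $className is placed into the XPath query unescaped, so pass only trusted values.
function extractTableByClass(string $html, string $className): ?string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = sprintf(
        '//table[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $className
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // saveHTML($node) serializes just that node and its children.
    return $dom->saveHTML($nodes->item(0));
}

// Sample markup standing in for the fetched page.
$page = '<html><body><table class="nav"><tr><td>menu</td></tr></table>'
      . '<table class="tableBg"><tr><td>report</td></tr></table></body></html>';

echo extractTableByClass($page, 'tableBg');
```

Only the matching table is echoed, so nothing else from the remote page ends up in your own markup.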

Answer 2

There are a few ways to do this:

You can fetch the page from PHP with cURL and then serve the result yourself, without exposing the call to igiworldwide to the visitor.

If you have the HTTP stream wrappers enabled (allow_url_fopen), you can open the URL with a call like:

readfile('http://www.igiworldwide.com/search_report.aspx?PrintNo=S3B30818&Wght=0.13');

And yes, there are plenty of duplicate questions on Stack Overflow.
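
Applied to the question, the idea is that the visitor only ever requests a URL on your own site, and your server builds the IGI URL and fetches it behind the scenes. The sketch below is one way that could look; the helper names `buildReportUrl` and `fetchReport`, the timeout, and the certificate lookup in the usage comment are all assumptions, not part of CodeIgniter or the IGI site.

```php
<?php
// Hypothetical helper: builds the remote report URL from values stored on your side.
function buildReportUrl(string $printNo, string $weight): string
{
    return 'http://www.igiworldwide.com/search_report.aspx?'
        . http_build_query(['PrintNo' => $printNo, 'Wght' => $weight]);
}

// Hypothetical helper: fetches the remote page server-side.
// Returns the HTML as a string, or null if the request failed.
function fetchReport(string $url): ?string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);          // assumed limit, in seconds
    $html = curl_exec($curl);

    return is_string($html) ? $html : null;
}

// Usage inside whatever handles /certificate/{id} (the lookup table is made up):
// $certificates = ['1234567879' => ['S3B30818', '0.13']];
// [$printNo, $weight] = $certificates[$id];
// echo fetchReport(buildReportUrl($printNo, $weight)) ?? 'Report unavailable.';
```

Because the request is made by your server, the IGI URL never appears in the HTML sent to the browser. Relative links and images inside the fetched page will still point at paths that only exist on the remote site, so they may need rewriting.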

Answer 3

EDIT2: A demo of my solution: http://pwslogboek.nl/screen-scraping-example

This will work in most cases:

$source = file_get_contents(LINK);

Here is another option:

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, LINK_OF_OTHER_WEBSITE); // The link of the site
curl_setopt($curl, CURLOPT_ENCODING, 'gzip');           // Makes the request faster
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);       // Return the source as string
$source = curl_exec($curl);

Now you can do whatever you like with the source.

Strip the hrefs from it:

    // Non-greedy (.*?) so each match stops at the first closing quote
    $cleanSource = preg_replace('|href=\'(.*?)\'|', '', $source);
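
The pattern above only covers single-quoted attributes. As a rough sketch, assuming you also want to catch double-quoted hrefs, a back-reference can match whichever quote opened the value. The function name `stripHrefs` and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: removes href attributes (single- or double-quoted)
// from an HTML string. Unquoted attribute values are not handled.
function stripHrefs(string $html): string
{
    // (["\']) captures the opening quote, .*? lazily matches the value,
    // and \1 requires the same quote character to close it.
    $result = preg_replace('/\s+href\s*=\s*(["\']).*?\1/is', '', $html);

    // preg_replace returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$sample = '<a href="http://example.com/a" class="x">one</a> <a href=\'/b\'>two</a>';
echo stripHrefs($sample);
// prints: <a class="x">one</a> <a>two</a>
```

Regular expressions are fine for a quick cleanup like this, but they do not understand HTML structure, so unusual markup can slip through.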

If you need to POST something, you need a few extra cURL options:

$postFields = array(
    'user'     => 'username',
    'password' => 'password'
);
$postData = http_build_query($postFields);

curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $postData);

Personally I don't like DOMDocument ... for simple scripts, the approach below works fine.

Here is the function I use in my scripts to retrieve a table: (Edit: this function now gives you the whole table)

$table = getTagWithContents($source, 'table', '<table tabindex="13" class="tableBg"');

// **********************************************************************************
// Gets a whole HTML tag with its contents.
//  - Source should be a well-formed HTML string (get it with file_get_contents or cURL).
//  - You CAN provide a custom start tag that includes e.g. an id or another attribute (<table style='border:0;').
//    This is recommended if it is not the only p/table/h2/etc. tag in the source.
//  - Skips a closing tag when another opening tag of the same kind appears before it (a nested tag).
function getTagWithContents($source, $tag, $customStartTag = false)
{

    $startTag = '<'.$tag;
    $endTag   = '</'.$tag.'>';

    $startTagLength = strlen($startTag);
    $endTagLength   = strlen($endTag);

//      ***************************** 
    if ($customStartTag)
        $gotStartTag = strpos($source, $customStartTag);
    else
        $gotStartTag = strpos($source, $startTag);

    // Can't find it? strpos() returns false when missing; 0 is a valid position.
    if ($gotStartTag === false)
        return false;
    else
    {

//      ***************************** 

        // This is the hard part: finding the correct closing tag position.
        // <table class="schedule">
        //     <table>
        //     </table> <-- Not this one
        // </table> <-- But this one

        $foundIt          = false;
        $locationInScript = $gotStartTag;
        $startPosition    = $gotStartTag;

        // Checks whether another opening tag appears before the next closing tag.
        while ($foundIt == false)
        {
            $gotAnotherStart = strpos($source, $startTag, $locationInScript + $startTagLength);
            $endPosition        = strpos($source, $endTag,   $locationInScript + $endTagLength);

            // If it can find another opening tag before the closing tag, skip that closing tag.
            if ($gotAnotherStart && $gotAnotherStart < $endPosition)
            {               
                $locationInScript = $endPosition;
            }
            else
            {
                $foundIt  = true;
                $endPosition = $endPosition + $endTagLength;
            }
        }

//      ***************************** 

        // cut the piece from its source and return it.
        return substr($source, $startPosition, ($endPosition - $startPosition));

    } 
}
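
For comparison, the nested-tag problem described in the comments above can also be handled with a depth counter: every opening tag increases the depth, every closing tag decreases it, and the element ends when the depth returns to zero. This is a sketch of that alternative rather than a rewrite of the function above; the name `extractBalancedTag` and the sample markup are assumptions.

```php
<?php
// Hypothetical alternative: returns the first <$tag ...>...</$tag> element,
// including any nested tags of the same name, or null when no balanced
// element is found. Self-closing tags and tags inside comments are not handled.
function extractBalancedTag(string $source, string $tag): ?string
{
    // Matches both opening and closing tags; group 1 is "/" for a closing tag.
    $pattern = '/<(\/?)' . preg_quote($tag, '/') . '\b[^>]*>/i';
    $flags = PREG_OFFSET_CAPTURE | PREG_SET_ORDER;
    if (!preg_match_all($pattern, $source, $matches, $flags)) {
        return null;
    }

    $depth = 0;
    $start = null;
    foreach ($matches as $match) {
        [$text, $offset] = $match[0];
        $isClosing = $match[1][0] === '/';

        if (!$isClosing) {
            if ($depth === 0) {
                $start = $offset; // remember where the outermost tag begins
            }
            $depth++;
        } elseif ($depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Include the closing tag itself in the result.
                return substr($source, $start, $offset + strlen($text) - $start);
            }
        }
    }

    return null; // opening tag was never closed
}

$html = '<div><table id="outer"><tr><td><table><tr><td>inner</td></tr></table>'
      . '</td></tr></table><p>after</p></div>';
echo extractBalancedTag($html, 'table');
```

The counter handles any nesting depth, at the cost of scanning every tag of that name up to the matching close.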

Using PHP to pull a web page from another server into your own website
