Question

I have set up my website (GWT) to be crawlable by Google. When I use the "Fetch as Google" page in Google Webmaster Tools, I see the following pattern:

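A request for "http://www.mysite.com/#!AJAX_URL" is correctly redirected to the snapshot.
But Google does not request the snapshot for "http://www.mysite.com", even though I did set up the web.xml.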

==> Two questions related to this:

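Is this because Google Webmaster Tools is not smart enough, while the real bot will request the snapshot correctly?
Should I add something to web.xml or anywhere else?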

Thanks,

Answer 1

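After a lot of searching, I found the answer. It is just the Fetch as Googlebot feature: it does not check meta tags and simply returns the raw content. When Google actually crawls and indexes the page, it notices the meta tag and acts accordingly.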
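To make the difference concrete, here is a rough sketch of the URL mapping a crawler performs when it follows the AJAX crawling scheme. The function name `snapshot_url` and its `$hasFragmentMeta` parameter are hypothetical, introduced only for illustration. I am also assuming the meta tag in question is `<meta name="fragment" content="!">`, which is how a page without a hash fragment (such as the home page) opts in.

```php
<?php
// Hypothetical helper, for illustration only: returns the URL a crawler
// following the AJAX crawling scheme would request instead of $url, or null
// when no snapshot would be requested.
function snapshot_url($url, $hasFragmentMeta = false) {
    $pos = strpos($url, '#!');
    if ($pos !== false) {
        // "#!state" URLs: everything after "#!" becomes the parameter value.
        $base = substr($url, 0, $pos);
        $fragment = (string) substr($url, $pos + 2);
    } elseif ($hasFragmentMeta && strpos($url, '#') === false) {
        // Pages without a hash fragment opt in through the meta tag
        // (assumed here: <meta name="fragment" content="!">).
        $base = $url;
        $fragment = '';
    } else {
        return null;
    }
    // Percent-encode the characters the scheme treats as special.
    $escaped = preg_replace_callback(
        '/[\x00-\x20#%&+\x7F-\xFF]/',
        function ($m) { return sprintf('%%%02X', ord($m[0])); },
        $fragment
    );
    $sep = (strpos($base, '?') === false) ? '?' : '&';
    return $base . $sep . '_escaped_fragment_=' . $escaped;
}

echo snapshot_url('http://www.mysite.com/#!AJAX_URL'), "\n";
// http://www.mysite.com/?_escaped_fragment_=AJAX_URL
echo snapshot_url('http://www.mysite.com/', true), "\n";
// http://www.mysite.com/?_escaped_fragment_=
```

The second call is the home-page case from the question: a tool that ignores the meta tag corresponds to passing `false` here, so it never asks for the snapshot.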

The link to the answer is here (see JohnMu's comment):

Answer 2

Make sure your 'robots.txt' allows crawlers access:

User-agent: *
Allow: /

Also, you may want to submit a Sitemap to Webmaster Tools.
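If you do not have a Sitemap yet, something along these lines could generate one. This is only a sketch: `build_sitemap` is a hypothetical helper, the tokens 'home' and 'about' are made-up examples, and I am assuming you want to list the "#!" URLs rather than the `_escaped_fragment_` ones.

```php
<?php
// Hypothetical helper: builds a Sitemap with one "#!" URL per token.
function build_sitemap($baseUrl, array $tokens) {
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($tokens as $token) {
        // Escape &, <, > and quotes so the URL is valid inside XML.
        $loc = htmlspecialchars($baseUrl . '#!' . $token, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $xml .= '  <url><loc>' . $loc . '</loc></url>' . "\n";
    }
    return $xml . '</urlset>' . "\n";
}

// Example tokens are placeholders; substitute your own places.
echo build_sitemap('http://www.mysite.com/', array('home', 'about'));
```

Save the output as, for example, 'sitemap.xml' at the site root and submit that URL in Webmaster Tools.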

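It sounds like the snapshots are being served correctly. Just in case, here is the relevant part of a working 'index.php'. The static pages are located at 'static/${TOKEN}.html'.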

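<?php

// Map a snapshot token to its pre-rendered static file.
function static_url ($token) { return 'static/' . $token . '.html'; }

// Set only when a crawler asks for the HTML snapshot of an AJAX URL.
$escaped_fragment = isset($_GET['_escaped_fragment_']) ? $_GET['_escaped_fragment_'] : null;

if ($escaped_fragment !== null) {
  // Strip slashes so the token is a plain file name.
  $fragment = preg_replace('/\//', '', $escaped_fragment);
  $file = static_url($fragment);

  // Home page (empty fragment) or unknown token: use the default place.
  if ($escaped_fragment == '' || $escaped_fragment == '/'
      || (! file_exists($file))) {
    $fragment = '${DEFAULT_PLACE}:${DEFAULT_STATE}'; // your default place
    $file = static_url($fragment);
  }
  // Removes the snapshot's leading tag plus newlines, tabs and whitespace runs.
  $re = '/(^<[^>]*>)|(\n|\r\n|\t|\s{2,4})*/';

  $content = is_readable($file) ? file_get_contents($file) : false;
  if ($content !== false) {
    $content = preg_replace($re, '', $content);
  }
  else {
    $content = 'Page not found!';
    // Headers must be sent before any output, so the doctype is echoed below.
    header(php_sapi_name() == 'cgi' ? 'Status: 404 Not Found' : 'HTTP/1.1 404 Not Found');
  }
  echo "<!doctype html>\n" . $content;
} else { ?>
<!doctype html>
<html> ... Your GWT host page ... </html>

<?php } ?>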

Google indexing: _escaped_fragment_ does not work for the home page
