How can I create a Googlebot-friendly robots.txt in WordPress?

Asked: 2013-05-18 03:27:24

Tags: php robots.txt googlebot

I'm using WordPress. One of its files, functions.php, contains a `function do_robots() {...` that blocks Google from crawling the site. I replaced that function with the following:

function do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    do_action( 'do_robotstxt' );

    if ( '0' == get_option( 'blog_public' ) ) {
        echo "User-agent: *";
        echo "\nDisallow: /wp-admin";
        echo "\nDisallow: /wp-includes";
        echo "\nDisallow: /wp-content";
        echo "\nDisallow: /stylesheets";
        echo "\nDisallow: /_db_backups";
        echo "\nDisallow: /cgi";
        echo "\nDisallow: /store";
        echo "\nDisallow: /wp-includes\n";
    } else {
        echo "User-agent: *";
        echo "\nDisallow: /wp-admin";
        echo "\nDisallow: /wp-includes";
        echo "\nDisallow: /wp-content";
        echo "\nDisallow: /stylesheets";
        echo "\nDisallow: /_db_backups";
        echo "\nDisallow: /cgi";
        echo "\nDisallow: /store";
        echo "\nDisallow: /wp-includes\n";
    }
}
  1. I'm not sure about Allow. Is everything allowed by default as long as I don't Disallow it?
  2. Why is Googlebot still being blocked by the function above?

1 Answer:

Answer 0 (score: 1)

The original function in SVN blocks fewer paths than the example above, so I'd suggest removing some of the extra directories (e.g. wp-content) and checking whether that gives you what you're looking for. You could also try using a WordPress plugin to generate a Google Sitemap for its engine to read.

function do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    do_action( 'do_robotstxt' );

    $output = "User-agent: *\n";
    $public = get_option( 'blog_public' );
    if ( '0' == $public ) {
        $output .= "Disallow: /\n";
    } else {
        $site_url = parse_url( site_url() );
        $path = ( !empty( $site_url['path'] ) ) ? $site_url['path'] : '';
        $output .= "Disallow: $path/wp-admin/\n";
        $output .= "Disallow: $path/wp-includes/\n";
    }

    echo apply_filters( 'robots_txt', $output, $public );
}
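Since the original do_robots() passes its output through the robots_txt filter, a cleaner approach than editing core is to hook that filter from a theme's functions.php or a small plugin. Below is a minimal sketch under that assumption; the extra directories it blocks (/stylesheets/, /_db_backups/) are purely illustrative and should be replaced with your own paths:

```php
<?php
// Append extra rules to the robots.txt that WordPress generates,
// using the 'robots_txt' filter instead of modifying do_robots().
// The paths listed here are illustrative assumptions only.
add_filter( 'robots_txt', function ( $output, $public ) {
    // $public is the 'blog_public' option; only add rules when
    // the site is meant to be visible to search engines.
    if ( '0' != $public ) {
        $output .= "Disallow: /stylesheets/\n";
        $output .= "Disallow: /_db_backups/\n";
    }
    return $output;
}, 10, 2 );
```

Because this runs last, the appended lines survive WordPress core updates, which would otherwise overwrite a modified do_robots().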

The rule for robots.txt files is that everything is allowed unless stated otherwise, though search engines obeying robots.txt is more of an honor system.
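To illustrate the default-allow rule, a robots.txt like the hypothetical one below blocks only the two listed paths; every other URL (the blog index, posts, uploads, and so on) remains crawlable without any explicit Allow line:

```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
```

An explicit `Allow:` directive is only needed to carve an exception out of a broader `Disallow:` rule, such as re-permitting a single file inside a blocked directory.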