I'm trying to get the Google Sitemap Generator working.
This is my (Zend Framework 2) project structure:
/
/...
/public/...
/public/sitemap.xml
/public/urllist.txt
/...
/temp/googlesitemapgen/
/temp/googlesitemapgen/config.xml
/temp/googlesitemapgen/sitemap_gen.py
/...
In config.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<site
base_url="http://foo.bar.loc"
store_into="/var/www/bar/foo/public/sitemap.xml"
verbose="3"
suppress_search_engine_notify="0"
>
<urllist path="/var/www/bar/foo/public/urllist.txt" encoding="UTF-8" />
</site>
In urllist.txt:
http://foo.bar.loc
When I run the generator script:
user@machine:/var/www/bar/foo/temp/googlesitemapgen# python sitemap_gen.py --config=config.xml
an error occurs:
user@machine:/var/www/bar/foo/temp/googlesitemapgen# python sitemap_gen.py --config=config.xml
sitemap_gen.py:65: DeprecationWarning: the md5 module is deprecated; use hashlib instead
import md5
Reading configuration file: config.xml
BaseURL is set to: http://foo.bar.loc/
Input: From URLLIST "/var/www/bar/foo/public/urllist.txt"
Opened URLLIST file: /var/www/bar/foo/public/urllist.txt
[WARNING] Discarded URL for not starting with the base_url: http://foo.bar.loc
[WARNING] No URLs were recorded, writing an empty sitemap.
Sorting and normalizing collected URLs.
Writing Sitemap file "/var/www/bar/foo/public/sitemap.xml" with 0 URLs
Notifying search engines.
[ERROR] When attempting to access our generated Sitemap at the following URL:
http://foo.bar.loc/sitemap.xml
we failed to read it. Please verify the store_into path you specified in
your configuration file is web-accessable. Consult the FAQ for more
information.
[WARNING] Proceeding to notify with an unverifyable URL.
Notifying: www.google.com
Notification URL: http://www.google.com/webmasters/sitemaps/ping?sitemap=http%3A%2F%2Ffoo.bar.loc%2Fsitemap.xml
Number of errors: 1
Number of warnings: 3
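This error is described in the "Troubleshooting" section of the documentation. But I have checked base_url and store_into, and both are set correctly.
Why does this error occur? Am I doing something wrong, and if so, what? How can I get the tool to work?
Thanks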
Answer 0: (score: 0)
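You need a urllist.txt with actual URLs in it. The sitemap generator doesn't crawl or spider your site for you. It can examine Apache logs or reference other generated sitemaps, but it doesn't crawl by itself.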
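One thing the log in the question suggests: the base URL is reported as http://foo.bar.loc/ (with a trailing slash), while the discarded entry is http://foo.bar.loc (without one). The following is a minimal sketch, not the generator's own code, assuming the check is a plain prefix comparison against the normalized base URL. The helper names and the /about page are hypothetical, used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# base_url from config.xml, written the way the log reports it (trailing slash)
BASE_URL = "http://foo.bar.loc/"


def normalize(url: str) -> str:
    """Give a bare host URL an explicit "/" path, e.g. http://host -> http://host/."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def usable_urls(urls, base_url=BASE_URL):
    """Return the normalized URLs that start with base_url, skipping blank lines."""
    kept = []
    for url in urls:
        if not url.strip():
            continue
        candidate = normalize(url)
        # Assumed behaviour: entries that do not start with base_url get discarded.
        if candidate.startswith(base_url):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    # Hypothetical page list; replace it with the real pages of the site.
    pages = ["http://foo.bar.loc", "http://foo.bar.loc/about", "http://other.example/"]
    print("\n".join(usable_urls(pages)))
```

The printed lines can be saved as urllist.txt. If the assumption holds, writing the home page as http://foo.bar.loc/ should be enough to stop that single entry from being discarded.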
See my answer:
I have a command string that generates a list of URLs for a given website by crawling it.
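That command string is not reproduced here. As a rough illustration of the idea only, here is a sketch of the link-collecting step of such a crawl, using only the Python standard library. It works on HTML that has already been downloaded; fetching pages and following links recursively are left out, and the names LinkCollector and same_site_links are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags from one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, page_url, base_url):
    """Resolve links against page_url and keep those under base_url, unique and sorted."""
    collector = LinkCollector()
    collector.feed(html)
    found = set()
    for href in collector.hrefs:
        # Make the link absolute and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(base_url):
            found.add(absolute)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical page content, standing in for a downloaded page.
    sample = (
        '<a href="/about">About</a> '
        '<a href="contact.html#top">Contact</a> '
        '<a href="http://other.example/">Elsewhere</a>'
    )
    for url in same_site_links(sample, "http://foo.bar.loc/", "http://foo.bar.loc/"):
        print(url)
```

Each URL found this way can be appended to urllist.txt, one per line.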