Question

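We're adding some functionality to our CMS: when a user creates a page, they can select an option to allow or disallow search engine indexing of that page.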

If they select "yes" (disallow indexing), something like the following would apply:

<cfif request.variables.indexable eq 0>
<cffile 
    action = "append"
    file = "C:\websites\robots.txt"
    output = "Disallow: /blocked-page.cfm"
    addNewLine = "yes">
<cfelse>
<!--- check if the page is already disallowed in robots.txt and remove the line if it is --->
</cfif>

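It's the <cfelse> clause I need help with. What would be the best way to parse robots.txt to see whether this page has already been disallowed? Would it be a cffile action="read", followed by a find() on the variable that was read?

Actually, the check for whether the page is already disallowed should probably happen further up as well, to avoid adding duplicate entries.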

Answer 1

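You keep the list of pages in a database, and each page record has an indexable bit, right? If so, a simpler and more reliable approach is to generate a new robots.txt every time a page is added or deleted, or its indexable bit changes.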

<!--- TODO: query for the non-indexable pages (result set named getPages) --->

<!--- lock the code to prevent concurrent changes --->

<cflock name="robots.txt" type="exclusive" timeout="30">

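    <!--- start the file fresh; this first write replaces any existing content --->

    <cffile 
        action = "write"
        file = "C:\websites\robots.txt"
        output = "Sitemap: http://www.mywebsite.tld/sitemap.xml"
        addNewLine = "yes">

    <!--- Disallow rules are only valid inside a User-agent group --->

    <cffile 
        action = "append"
        file = "C:\websites\robots.txt"
        output = "User-agent: *"
        addNewLine = "yes">

    <!--- append one Disallow entry per non-indexable page --->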

    <cfloop query="getPages">

        <!--- we assume that page names are not entered by user (= safe names) --->

        <cffile 
            action = "append"
            file = "C:\websites\robots.txt"
            output = "Disallow: /#getPages.name#.cfm"
            addNewLine = "yes">

    </cfloop>

</cflock>

The sample code is untested, so watch out for typos.
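As a variation on the same idea, the file content can be assembled in memory and written with a single cffile call, so the file is never left half-written between appends. This is only a sketch: the queryNew() block stands in for the real database query, the page name in it is made up, and the variable names newline and lines are hypothetical.

```cfml
<!--- Stand-in for the real query; the "name" column matches the code above.
      The page name below is invented for illustration. --->
<cfset getPages = queryNew("name", "varchar") />
<cfset queryAddRow(getPages) />
<cfset querySetCell(getPages, "name", "blocked-page") />

<cfset newline = chr(13) & chr(10) />

<!--- Collect every line of the new robots.txt in an array --->
<cfset lines = ["Sitemap: http://www.mywebsite.tld/sitemap.xml", "User-agent: *"] />
<cfloop query="getPages">
    <cfset arrayAppend(lines, "Disallow: /" & getPages.name & ".cfm") />
</cfloop>

<!--- Write the whole file in one step, under the same named lock --->
<cflock name="robots.txt" type="exclusive" timeout="30">
    <cffile action="write" file="C:\websites\robots.txt"
        output="#arrayToList(lines, newline)#" addNewLine="yes" />
</cflock>
```

Like the code above, this is untested against a live site.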

Answer 2

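Using the robots.txt file for this purpose is a bad idea. Robots.txt is not a security measure, and you are handing the "bad guys" a list of the pages you don't want indexed.

You are much better off using the robots meta tag, which doesn't give anyone a list of the pages you don't want indexed and gives you finer control over the individual actions a robot may perform.

With the meta tag, you simply output the tag as part of generating the page, as usual.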
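A minimal sketch of what that could look like in the page template, assuming the same request.variables.indexable flag used in the question (0 meaning "do not index") and that this fragment is emitted inside the page's <head>:

```cfml
<!--- Assumption: request.variables.indexable is 0 when the user chose to block indexing --->
<cfif request.variables.indexable eq 0>
    <!--- Ask compliant crawlers not to index this page --->
    <meta name="robots" content="noindex">
</cfif>
```

Pages that should be indexed need no tag at all, since indexing is the default behavior.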

Answer 3

<!--- dummy page to block --->
<cfset request.pageToBlock = "/blocked-page.cfm" />

<!--- read in current robots.txt --->
<cffile action="read" file="C:\websites\robots.txt" variable="data" />
<!--- build a struct of all blocked pages --->
<cfset pages = {} />
<!--- split on CR and LF so Windows line endings leave no stray chr(13) in the keys --->
<cfloop list="#data#" delimiters="#chr(13)##chr(10)#" index="i">
    <cfset pages[listLast(trim(i),' ')] = '' />
</cfloop>


<cfif request.variables.indexable eq 0>
    <!--- If the page is not yet blocked add it --->
    <cfif not structKeyExists(pages, request.pageToBlock)>
        <cffile action="append" file="C:\websites\robots.txt" 
             output="Disallow: #request.pageToBlock#" addNewLine="yes" />
        <!--- not sure if this is in a loop, but if it is, add it to the struct for the next iteration --->
        <cfset pages[request.pageToBlock] = '' />
    </cfif>
</cfif>

That should do it. Read the file in, loop over it, and build a struct of the blocked pages. A new page is only added if it is not already blocked.
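The code above covers only the blocking case. For the <cfelse> branch from the question (removing a page that is indexable again), one possible sketch is to re-read the file, keep every line except the matching Disallow rule, and write the result back. The names robotsFile, pageToAllow, keptLines and newline are hypothetical, and the sketch assumes the rule was written exactly as "Disallow: " followed by the path.

```cfml
<!--- Hypothetical names; the path and page are taken from the question --->
<cfset robotsFile = "C:\websites\robots.txt" />
<cfset pageToAllow = "/blocked-page.cfm" />
<cfset newline = chr(13) & chr(10) />

<!--- Lock so two requests cannot rewrite the file at the same time --->
<cflock name="robots.txt" type="exclusive" timeout="30">
    <cffile action="read" file="#robotsFile#" variable="data" />

    <cfset keptLines = [] />
    <cfloop list="#data#" delimiters="#newline#" index="line">
        <!--- keep every line except the Disallow rule for this page;
              compare() is case-sensitive, as robots.txt paths are --->
        <cfif compare(trim(line), "Disallow: " & pageToAllow) neq 0>
            <cfset arrayAppend(keptLines, line) />
        </cfif>
    </cfloop>

    <!--- write the filtered lines back in one step --->
    <cffile action="write" file="#robotsFile#"
        output="#arrayToList(keptLines, newline)#" addNewLine="yes" />
</cflock>
```

Note that a list loop skips empty elements, so any blank lines in the existing file are dropped when it is rewritten.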

ColdFusion: searching robots.txt for a specific page exception
