To handle legacy URLs from the old CMS, which should point to static HTML backups of the corresponding pages, I need a rule that either redirects/rewrites /fixedname_[id] to /fixedname_[id].html, or serves the actual .html file under the legacy URL (without .html).
For example, redirect /druckfrisch_2016_15 to /druckfrisch_2016_15.html
OR
serve /druckfrisch_2016_15.html at the URL /druckfrisch_2016_15.
If I use RedirectMatch 301 /druckfrisch_(.*) /druckfrisch_$1\.html
it ends up at
/druckfrisch_2016_15.html.html.html.html.html.html.html.html.html.html.html.html.html.html.html.html.html.html.html.html.html
Some kind of recursion error? Is the target URL itself being rewritten again and again? Do I need to exclude any URL that already contains .html in the first part of the pattern?
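My guess at what is happening (not verified): the redirect target still matches /druckfrisch_(.*), so every pass appends another .html:

/druckfrisch_2016_15            ->  /druckfrisch_2016_15.html
/druckfrisch_2016_15.html       ->  /druckfrisch_2016_15.html.html
/druckfrisch_2016_15.html.html  ->  /druckfrisch_2016_15.html.html.html
...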
I have also tried these, without luck:
RedirectMatch "/druckfrisch_(.*)" "/druckfrisch_$1\.html"
Redirect "^/druckfrisch_(.*)$" "/druckfrisch_$1\.html"
RewriteRule ^/druckfrisch_(.*)$ /druckfrisch_$1\.html
Whatever new rule I add has to work alongside the default WordPress .htaccess configuration:
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Answer 0: (score 0)
I found out that if I first exclude .html files with a RewriteCond placed before the RewriteRule, I don't need to escape the dot as \. in the second part of the rule:

RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_URI} !\.(html)$
RewriteRule ^druckfrisch_(.+)$ /druckfrisch_$1.html [L,R]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
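Note that a bare [R] defaults to a temporary 302 redirect; since the legacy URLs are meant to move permanently, the same rule can also be written with an explicit status (a variant of the rule above, not part of the original answer):

RewriteCond %{REQUEST_URI} !\.(html)$
RewriteRule ^druckfrisch_(.+)$ /druckfrisch_$1.html [L,R=301]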
#!/usr/bin/python
# coding=utf-8
from scrapinghub import ScrapinghubClient
import unicodecsv as csv
import os
import logging
import pandas as pd
import datetime
import pickle
# Create and configure logger
LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
logging.basicConfig(level=logging.INFO,
                    format=LOG_FORMAT,
                    filemode='w')
logger = logging.getLogger()
logger.info("Starting downloading")
# Enter ScrapingHub
apikey = '........' # your API key as a string
client = ScrapinghubClient(apikey)
projectID = .......
project = client.get_project(projectID)
# Give me a list of dictionaries with info (one for every spider I have)
spider_dicts_list = project.spiders.list()
for spider_dict in spider_dicts_list:
    # Extract from the dict the id of my spider
    spiderID = spider_dict["id"]
    logger.info("Working with spider: " + spiderID)
    # Get that spider and assign it to the object "spider"
    spider = project.spiders.get(spiderID)
    # Get a generator object for the jobs of that spider
    jobs_summary = spider.jobs.iter()
    # Generate all job keys using the generator object
    job_keys = [j['key'] for j in jobs_summary]
    for job_key in job_keys:
        # Get the corresponding job from the key, as "job"
        job = project.jobs.get(job_key)
        # Check to see if the job was completed
        if job.metadata.get(u'close_reason') == u'finished':
            # Create an empty DataFrame that will store all items (dictionaries)
            itemsDataFrame = pd.DataFrame()
            for item_aggelia in job.items.iter():
                # Save all items (dictionaries) to the DataFrame
                itemsDataFrame = itemsDataFrame.append(item_aggelia, ignore_index=True)
            job_key_name = job_key.split("/")[2]
            # Export a pickle
            # Check that the DataFrame is not empty
            if not itemsDataFrame.empty:
                for meta in job.metadata.iter():
                    if meta[0] == u"scrapystats":
                        timestamp = meta[1][u'finish_time']/1000.0
                dt = datetime.datetime.fromtimestamp(timestamp)
                filename = spiderID+" "+str(dt.year)+"-"+str(dt.month)+"-"+str(dt.day)+" "+str(dt.hour)+"_"+str(dt.minute)+"_"+str(dt.second)+" "+'Items.pickle'
                directory = u"E:/Documents/OneDrive/4_Προγραμματισμός/Scrapy/Αγορά Ακινήτων/"+spiderID+u"/Αρχεία_pd.DataFrame"
                os.chdir(directory)
                # Binary mode so the pickle is not corrupted on Windows
                with open(filename, 'wb') as file:
                    pickle.dump(itemsDataFrame, file)
                # Check for empty fields
                colList = itemsDataFrame.columns.tolist()
                for col in colList:
                    if itemsDataFrame[col].isnull().all():
                        logger.warning("Found Null Field, in job " + job_key_name + ": " + col)
                # Delete the job from ScrapingHub
                logger.debug("Deleting job " + job_key_name)
                job.delete()
        else:
            logger.info("Found a job that didn't finish properly. Job key: " + job_key + ". close_reason: " + job.metadata.get(u'close_reason'))