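Comparing URL lists for redirects

Date: 2018-01-08 13:17:51

Tags: python mysql bash .htaccess csv

I have been given 2 CSV files, each containing more than 3,000 URLs.

My task is to create an .htaccess "redirect" block from the "old site" to the "new site". Rather than comparing the lists by hand, I figured I could try a bash/python script, or import them into MySQL and compare them there.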

So, in Bash I tried the following code:

#!/bin/bash
awk 'BEGIN{FS=OFS="/"}
{gsub(/\/$/, ""); $NF=tolower($NF)}
NR==FNR{a[$NF]=$0; next}
$NF in a {print a[$NF] " " $0 > "combined.csv"}' oldsite.csv newsite.csv

However, it returns an empty "combined.csv", so I thought maybe Python... but I know very little about Python, so I thought MySQL: if I import each CSV into its own new table, I can run a comparison SQL statement and dump the results into a new 2-column table. Alas, I don't know where to start with the comparison. I figured on a LIKE comparison statement, but what I would like to know is which method is "best" (meaning the most accurate comparison). And if it is Python, how?

CSV samples

New URLs

"new-url"
"/product/dangle-hoop-earrings-for-girls-with-cz-and-heart-dangle-in-14k-gold/"
"/product/dangle-hoop-earrings-for-girls-with-cz-and-butterfly-dangle-in-14k-gold/"
"/product/petite-lever-back-earrings-for-little-girls-in-14k-yellow-gold-with-blue-topaz-high-end-childrens-earrings/"

Old URLs

"old-url"
"/product/0903-HUGGIEGK/Dangle-Hoop-Earrings-for-Girls-with-CZ-and-Heart-Dangle-in-14K-Gold/"
"/product/0954-HUGGIEGK/Dangle-Hoop-Earrings-for-Girls-with-CZ-and-Butterfly-Dangle-in-14K-Gold/"
"/product/10049Y4JBT/Petite-Lever-Back-Earrings-for-Little-Girls-in-14K-Yellow-Gold-with-Blue-Topaz---High-End-Childrens-Earrings/"

Expected combined

One row per match, in two columns: the old URL followed by its new URL.

1 Answer:

Answer 0: (score: 1)

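As we discovered in the comment thread, you need to convert your data so that awk/unix tools can process it, by removing the \r part of the MS-DOS line endings.

dos2unix file

converts the line endings in file from \r\n to \n. Note that you can call dos2unix with multiple filenames, and each file will be processed, i.e.

dos2unix old.csv new.csv many_more ...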
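If dos2unix is not installed, the same conversion can be sketched in Python, which the question also asks about. This is only a minimal illustration: the function names to_unix_newlines and convert_file are made up for this example, and it assumes the files are small enough to read into memory in one go.

```python
from pathlib import Path


def to_unix_newlines(data: bytes) -> bytes:
    """Return data with every \r\n pair replaced by \n (hypothetical helper)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at path in place with Unix line endings."""
    p = Path(path)
    p.write_bytes(to_unix_newlines(p.read_bytes()))
```

Calling convert_file("oldsite.csv") and convert_file("newsite.csv") before running awk should have the same effect on these two files as the dos2unix call.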

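Here is your revised code, which also writes the records from the "new" file that have no match to a separate file. The only correction I found necessary was changing the final output to include the "," character, so,

print a[$NF] "," $0

The revised script: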

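#!/bin/bash
awk 'BEGIN{FS=OFS="/"}
  { gsub(/\/$/, "")            # drop the trailing slash
    # print "#dbg: FILENAME="FILENAME "\tNR="NR "\tFNR="FNR
    $NF=tolower($NF)           # compare the last path segment case-insensitively
  }
  NR==FNR{                     # first file (oldsite.csv): index lines by last segment
    a[$NF]=$0; next
  }
  {
    if ($NF in a) {
      print  a[$NF] "," $0  > "combined.csv"
    }
    else {
      # no old URL is stored for this key, so write only the new URL
      print  $0  > "unmatched.csv"
    }
  }
  ' oldsite.csv newsite.csv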
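Since the question also asks how this might look in Python, here is a rough sketch of the same last-segment matching. It is not a drop-in replacement for the awk script. All function names are made up for this example. It assumes each CSV has a single quoted column with a header row, as in the samples. It also goes one step beyond the awk version by collapsing runs of dashes, because the third sample URL has "---" in the old slug and "-" in the new one. The "Redirect 301" line format is an assumption about what the .htaccess block should contain.

```python
import csv
import re


def slug_key(url):
    """Normalize a URL to its last path segment, lowercased, dash runs collapsed."""
    last = url.strip().strip('"').rstrip("/").rsplit("/", 1)[-1]
    return re.sub(r"-+", "-", last.lower())


def read_urls(path):
    """Read a one-column CSV and return its URLs, skipping the header row."""
    with open(path, newline="") as fh:
        rows = [row[0] for row in csv.reader(fh) if row]
    return rows[1:]


def match_urls(old_urls, new_urls):
    """Pair each new URL with the old URL that shares its normalized key."""
    old_by_key = {slug_key(u): u for u in old_urls}
    matched, unmatched = [], []
    for new in new_urls:
        old = old_by_key.get(slug_key(new))
        if old is None:
            unmatched.append(new)
        else:
            matched.append((old, new))
    return matched, unmatched


def redirect_lines(matched):
    """Format matched pairs as mod_alias Redirect lines (assumed target format)."""
    return [f"Redirect 301 {old} {new}" for old, new in matched]
```

A typical call would be matched, unmatched = match_urls(read_urls("oldsite.csv"), read_urls("newsite.csv")), followed by writing redirect_lines(matched) to a file and reviewing unmatched by hand. If two old URLs share the same slug, only the last one is kept, so duplicates are worth checking separately.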

IHTH