Question

I need to convert a uTorrent-style ipfilter.dat into a bluetack-style ipfilter file, and I wrote this shell script to do it:

#!/bin/bash

# read ipfilter.dat-formatted file line by line
# (example: 000.000.000.000-008.008.003.255,000,Badnet
# - ***here, input file's lines/fields are always the same length***)
# and convert into a bluetack.co.uk-formatted output
# (example: Badnet:0.0.0.0-8.8.3.255
# - fields moved around, leading zeros removed)

while read record
do
start=`echo ${record:0:15} | awk -F '.' '{for(i=1;i<=NF;i++)$i=$i+0;}1' OFS='.'`
end=`echo ${record:16:15} | awk -F '.' '{for(i=1;i<=NF;i++)$i=$i+0;}1' OFS='.'`
echo ${record:36:7}:${start}-${end}
done < $1

However, on a 2000-line input file this script takes 10 (!) seconds on average to finish, which is only 200 lines/second.

I'm sure the same result can be achieved with sed, and a sed version would probably be much faster.

Can any sed guru suggest a solution for this kind of fixed-position replacement?

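Feel free to suggest solutions in other languages too; for example, I'd love to test a Python or C version. A more efficient shell/bash version would also be welcome.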

Answer 1

You can try this.

sed -r 's/^0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)-0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+),...,(.*)$/\9:\1.\2.\3.\4-\5.\6.\7.\8/' inputfile

I haven't tested the performance, but I'd guess it is faster than 200 lines/second.
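Since the question also asks about Python, here is a rough sketch of the same substitution using the re module. It mirrors the sed capture groups (1 to 8 are the octets, 9 is the label); the names OCTET, PATTERN and convert_line are made up for illustration, and it has not been benchmarked.

```python
import re

# One zero-padded octet: skip leading zeros, capture what is left
# (at least one digit, so "000" is captured as "0").
OCTET = r"0*([0-9]+)"

# start address, '-', end address, three-character middle field, label.
PATTERN = re.compile(
    r"^" + r"\.".join([OCTET] * 4) + "-" + r"\.".join([OCTET] * 4) + r",...,(.*)$"
)

def convert_line(line):
    """Return 'Label:start-end'; a line that does not match is returned unchanged."""
    return PATTERN.sub(r"\9:\1.\2.\3.\4-\5.\6.\7.\8", line.rstrip("\n"))

print(convert_line("000.000.000.000-008.008.003.255,000,Badnet"))
# prints: Badnet:0.0.0.0-8.8.3.255
```

As with sed, lines that do not match the pattern pass through untouched, so malformed records are easy to spot in the output.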

Answer 2

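You sacrifice performance when you use the shell's read loop on a large file. In practice, tools such as awk and sed (and languages such as Perl, Python, and Ruby) are better at iterating over large files and processing lines than a shell read loop. In addition, your script spawns two awk processes (plus the command substitutions around them) for every line it reads, which is extra overhead.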

Ruby (1.9+)

$ cat file
000.000.000.000-008.008.003.255,000,Badnet
001.010.110.111-002.020.220.222,111,Badnet

$ ruby -F"," -ane 'puts "#{$F[-1].chomp}:" + $F[0].gsub(/\b0+(?=\d)/, "")' file
Badnet:0.0.0.0-8.8.3.255
Badnet:1.10.110.111-2.20.220.222
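The same single-process idea could look roughly like this in Python, which the question asked about. This is only a sketch: the function names are invented here, and it splits on the separators instead of using the fixed column offsets from the original script, assuming every record has the form start-end,code,label.

```python
def strip_zeros(address):
    """'008.008.003.255' -> '8.8.3.255' (int() drops the zero padding)."""
    return ".".join(str(int(octet)) for octet in address.split("."))

def convert_record(record):
    """'start-end,code,label' -> 'label:start-end' with unpadded octets."""
    # maxsplit=2 keeps any commas inside the label intact.
    ip_range, _code, label = record.rstrip("\n").split(",", 2)
    start, end = ip_range.split("-")
    return f"{label}:{strip_zeros(start)}-{strip_zeros(end)}"

def convert_stream(lines):
    """Yield one converted record per non-blank input line."""
    for line in lines:
        if line.strip():
            yield convert_record(line)

sample = [
    "000.000.000.000-008.008.003.255,000,Badnet\n",
    "001.010.110.111-002.020.220.222,111,Badnet\n",
]
for converted in convert_stream(sample):
    print(converted)
```

To use it as a filter, pass sys.stdin (after import sys) to convert_stream instead of sample and redirect the input file into the script. Everything then runs in one interpreter process, with no external command started per line.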

Answer 3

I really wanted to do this in a single sed command, but I couldn't figure it out. Either way, this will still beat 200 lines/second.

sed 's/\([.-]\)0\{1,2\}/\1/g' | sed 's/^0\{1,2\}//'
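For comparison, the two substitutions can be folded into one pass with an alternation and a lookahead. The sketch below does this in Python and also covers the octet that follows the hyphen. LEADING_ZEROS and strip_leading_zeros are illustrative names, and like the pipeline above it only strips the padding; it does not reorder the fields.

```python
import re

# One or two zeros at the start of the line or right after '.' or '-',
# matched only when another digit follows, so a lone "0" octet survives.
# The ",000," field is left alone because it follows a comma.
LEADING_ZEROS = re.compile(r"(^|[.-])0{1,2}(?=[0-9])")

def strip_leading_zeros(line):
    """Remove zero padding from every octet in the line."""
    return LEADING_ZEROS.sub(r"\1", line)

print(strip_leading_zeros("000.000.000.000-008.008.003.255,000,Badnet"))
# prints: 0.0.0.0-8.8.3.255,000,Badnet
```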

Answer 4

#!/bin/tclsh

#Regsub TCL script to remove the leading zeros from the ip address.

#Author : Shoeb Masood , Bangalore

puts "Enter the ip address"
set ip [gets stdin]
set list_ip [split $ip .]
foreach index $list_ip {
    # Drop the zero padding but keep the last digit, so "000" becomes "0"
    regsub {^0+(?=[0-9])} $index {} index
    lappend list_ip2 $index
}
set list_ip2 [join $list_ip2 "."]
puts $list_ip2

Removing leading zeros from IP addresses: converting ipfilter.dat to a bluetack.co.uk ipfilter using sed
