Question

I have a bash script that processes a text file:

#!/bin/bash

dos2unix sourcefile.txt

cat sourcefile.txt | grep -v '\/' | grep -v '\-\-' | grep -v '#' | grep '[A-Za-z]\*' > modified_sourcefile.txt

mv modified_sourcefile.txt sourcefile.txt
#
# Read the sourcefile file one line by line and iterate...
#

while read line
do

 echo $line | grep -v '\/' | grep -v '\-\-' | grep -v '#'
 if [ $? -eq 0 ]
 then

   # echo "Current Line is " $line ";"
    char1=`echo ${line:0:1}`
   # echo "1st char is " $char1

  if [ -n "$char1" ]
   # if a blank-line, neglect the line.
    then
        # echo "test passed"
        var1=`echo $line | cut -d '*' -f 1`
    var2=`echo $line | cut -d '*' -f 1`
    var3=`echo $line | cut -d - -f 1`
        var4=`echo $line | cut -d '*' -f 1`
        var5=`echo $line | cut -d '*' -f 2`
        var6=`echo $line | cut -d - -f 1`
        var7=`echo $line | cut -d '*' -f 3 `


        table1sql="INSERT IGNORE INTO table1 (id,name,active_yesno,category,description,
           last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'$var1',1,
           '$var2','$var3','admin',NOW() FROM table1;"

    echo $table1sql >> result.txt


    privsql="INSERT IGNORE INTO table2 (id,name,description,active_yesno,group_code,
             last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'$var1',
         '$var3',1,'$var2','admin',NOW() FROM table2;"

    echo $privsql >> result.txt     


    table1privmapsql="INSERT IGNORE INTO table1_table2_map (id,table1_id,table2_id,
                  last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,
                  (select id from table1 where name='$var1'),(select id from table2 where name='$var1'),'admin',NOW() FROM table1_table2_map;"
    echo $table1privmapsql >> result.txt

        privgroupsql="INSERT IGNORE INTO table2_group (id,name,category,active_yesno,last_modified_by,
                      last_modified_date_time) SELECT ifnull(MAX(id),0)+1,'tablegrp','$pgpcode',1,'admin',NOW() FROM table2_group;"

        echo $privgroupsql >> result.txt


    privprivgrpsql="INSERT IGNORE INTO table2_table2group_map (id,table2_id,table2_group_id,
                        last_modified_by,last_modified_date_time) SELECT ifnull(MAX(id),0)+1,
                        (select id from table2 where name='$var1'),(select id from table2_group where name='tablegrp'),'admin',NOW() FROM table2_table2group_map;"
        echo $privprivgrpsql >> result.txt              

    rolesql="INSERT IGNORE INTO role (id,name,active_yesno,security_domain_id,last_modified_by,last_modified_date_time) 
                 SELECT (select ifnull(MAX(id),0)+1 from role),'$rolename',1, sd.id ,'admin',NOW() 
                 FROM security_domain sd WHERE sd.name = 'General';"

        echo $rolesql >> result.txt

    fi                  
 fi                        
done < "sourcefile.txt"

The problem is that sourcefile.txt has more than 11,000 lines, so the script takes about 25 minutes to finish :-(.

Is there a better way?

Contents of sourcefile.txt:

AAA-something*LOCATION-some_where*ABC

Answer 1

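To make this script faster, you have to minimize calls to external commands and use bash built-ins wherever possible.

Read this article to learn about useless use of commands.
Read this article to learn how to manipulate strings in bash.
Assign the repeated values (var1, var2, var4) only once.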

To optimize the cut calls, you can replace

var1=`echo $line | cut -d '*' -f 1`

with

var1="${line%%\**}"

and

var5=`echo $line | cut -d '*' -f 2`

with

var5="${line%\**}"
var5="${var5##*\*}"

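It may not be as readable, but it is much faster than cut, because parameter expansion runs inside bash and spawns no process.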
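As a minimal sketch of the same idea, the three `*`-separated fields can also be split in one step with the `read` builtin. This assumes every line has exactly three fields, as in the sample line from the question; the names `field1`, `field2`, `field3` and `prefix` are illustrative and do not appear in the original script.

```shell
#!/bin/bash
# Sample line taken from the question.
line='AAA-something*LOCATION-some_where*ABC'

# Split on '*' with the read builtin: no subshell, no cut process.
IFS='*' read -r field1 field2 field3 <<< "$line"

# Text before the first '-', equivalent to: cut -d - -f 1
prefix="${line%%-*}"

printf '%s\n' "$field1" "$field2" "$field3" "$prefix"
```

Setting `IFS` only for the `read` command keeps the change local, so the rest of the script still splits on whitespace as usual.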

Also,

 echo $line | grep -v '\/' | grep -v '\-\-' | grep -v '#'

can be replaced with something like:

 if [[ "$line" =~ ([/#]|--) ]]; then :; else 
    # all code inside "if [ $? -eq 0 ]"
 fi
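A small sketch of how such a filter might sit inside the read loop, using only bash built-ins. The input lines other than the first are made up for illustration, and `skip_re` and `kept` are hypothetical names. The glob test stands in for the initial `grep '[A-Za-z]\*'` pass from the question.

```shell
#!/bin/bash
# Hypothetical sample input; only the first line comes from the question.
input='AAA-something*LOCATION-some_where*ABC
# a comment line
path/with/slash*X-y*Z
BBB--double*dash-line*DEF

CCC-other*PLACE-else_where*GHI'

skip_re='[/#]|--'   # same filter as the three "grep -v" calls
kept=()

while IFS= read -r line; do
    [[ $line =~ $skip_re ]] && continue        # contains /, # or --
    [[ $line == *[A-Za-z]\** ]] || continue    # needs a letter followed by '*'
    kept+=("$line")
done <<< "$input"

printf '%s\n' "${kept[@]}"
```

Keeping the regular expression in a variable avoids quoting surprises inside `[[ ... =~ ... ]]`, and the empty line is dropped by the glob test without a separate blank-line check.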

Answer 2

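Shell scripts are inherently slow, especially when they use as many external commands as yours does. The main reason is that spawning an external process is fairly slow, and you do it many times per input line.

If you are after high-performance data processing, you should write a Perl or Python script that does what you need without spawning any external processes: no dos2unix, no grep, no cut, or anything like that.

Perl (and Python) are also perfectly capable of talking to the database directly and inserting the data, again without using external commands.

If you do it right, I predict that processing with Perl will be at least 100 times faster than it is now.

If you are comfortable with Perl, you could start with something like this and adapt it to your liking: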

#!/usr/bin/perl

use strict;
use warnings;

# three-argument open with lexical filehandles
open my $in,  '<',  'sourcefile.txt' or die "sourcefile.txt: $!";
open my $out, '>>', 'result.txt'     or die "result.txt: $!";
while (my $line = <$in>) {
    # ignore lines with /, -- or #:
    next if $line =~ m{/|--|#};
    my ($var1, $var2, $var3, $var4, $var5) =
        ($line =~ /^(\w+)-(\w+)\*(\w+)-(\w+)\*(\w+)/);
    # ignore line if regex did not match (all captures are undef then):
    next unless defined $var5;
    print $out "some sql stmt. using $var1, $var2, etc\n";
    print $out "some other sql using $var1, $var2, etc\n";
    # ...
}
close $out;
close $in;

Answer 3

Before optimizing, profile! Learn how to use the time command. Find out which part of your script takes the most time, and focus your effort there.
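As a rough sketch of that kind of measurement, bash's `time` keyword can wrap a whole loop, which makes it easy to compare two ways of extracting the first field. The iteration count `n` and the variable names are assumptions chosen for illustration; actual timings depend on the machine, so none are shown here.

```shell
#!/bin/bash
# Sample line taken from the question.
line='AAA-something*LOCATION-some_where*ABC'
n=200   # hypothetical iteration count, small enough to finish quickly

# Variant 1: one subshell plus one cut process per iteration.
time for ((i = 0; i < n; i++)); do
    with_cut=$(echo "$line" | cut -d '*' -f 1)
done

# Variant 2: parameter expansion only, no external process.
time for ((i = 0; i < n; i++)); do
    with_expansion="${line%%\**}"
done

# Both variants must produce the same value.
[ "$with_cut" = "$with_expansion" ] && echo "same result: $with_expansion"
```

The timing report is written to standard error, so it does not mix with the script's normal output.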

That said, I think the multiple passes through grep are slowing things down.

This:

cat sourcefile.txt | grep -v '\/' | grep -v '\-\-' | grep -v '#' | grep '[A-Za-z]\*'

can be replaced with:

grep '[A-Za-z]\*' sourcefile.txt | grep -v -e '\/' -e '\-\-' -e '#'
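One way to check that the shorter pipeline keeps the same lines is to run both on a small temporary file and compare the results. This is only a sketch: the test lines other than the first are invented, the names `tmp`, `original` and `combined` are illustrative, and the patterns are written without the backslashes (`/` and `--` need no escaping in a basic regular expression when passed with `-e`).

```shell
#!/bin/bash
# Build a small throwaway input file; only the first line is from the question.
tmp=$(mktemp)
printf '%s\n' \
    'AAA-something*LOCATION-some_where*ABC' \
    '# comment*' \
    'a/b*c' \
    'x--y*z' \
    'no star here' > "$tmp"

# Four-stage pipeline, as in the question (five processes including cat).
original=$(cat "$tmp" | grep -v -e '/' | grep -v -e '--' | grep -v -e '#' | grep '[A-Za-z]\*')

# Two-stage pipeline, as suggested in the answer (two processes).
combined=$(grep '[A-Za-z]\*' "$tmp" | grep -v -e '/' -e '--' -e '#')

rm -f "$tmp"

[ "$original" = "$combined" ] && echo "both pipelines keep: $combined"
```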

Optimizing a shell script (bash) for performance