我已将此问题移至 code review
我在bash中编写了一个递归下降解析器。
我想知道你是否可以指出对我有用的东西。它支持带有解析错误报告的反斜杠转义和带引号的字段。
该脚本在某些方面与cut
类似。从文件或stdin
获取输入,允许用户选择要打印的行和哪个字段。分别使用选项-l
和-f
。可以打印字段列表,并且可以使用选项--list'1 2 3 4 5'和--list-seperator $'\ n'来指定该列表的自定义分隔符。
#!/bin/bash
shopt -s extglob; # we need maxiumum expression capabilities enabled
# option variables
declare list='' delim=$'\n' field='' lineNum=1;
while [[ ${1:0:1} = '-' ]]; do # Parse user arguments
case $1 in
-f)
field=$2; shift 2;
;;
-l)
lineNum=$2; shift 2;
;;
--list-seperator)
delim="$2"; shift 2;
;;
--list)
list="$2"; shift 2;
;;
*) break;;
esac
done
# open a user supplied file on stdin
[[ -e "$1" ]] && exec 0<$1;
# data from 'read/getline'
declare input='';
# we are using sed to optimize input, the command just prints the desired line
read -r input < <(sed -n ${lineNum}p)
# why doesn't the above work as a pipe into read?
# range of this line
declare strLen=${#input} value='';
# data processing variables
declare symbol='' value='';
# REGEX symbol "classes"
declare nothing='' comma='[,]' quote='["]' backslash='[\]' text='[^,\"]';
# integers:
declare -i iPos=-1 tPos=0;
# output array:
declare -a items=();
NextSymbol() {
symbol="${input:$((++iPos)):1}"; # get next char from string
(( iPos < strLen )); # return false if we are out of range
}
Accept() {
[[ -z "$symbol" && -z "$1" ]] && return 0; # accept "nothing/empty"
# if you can meld the above line into the next line
# let me know: pc.wiz.tt@gmail.com; this is some kind of bug!
# becare careful because expect expects 'nothing' to be empty.
# that's why it says 'end of input'
[[ "$symbol" =~ ^$1$ ]] && NextSymbol && return 0; # match symbol
}
Expect() {
Accept "$1" && return
local msg="$0: parse failure: line $lineNum: expected symbol: "
echo "$msg'${1:-end of input}'" >&2;
echo "$0: found: '$symbol'" >&2;
exit 1;
}
value() {
while Accept $text; do # symbol will not be what we expect here
value+=${input:$((iPos-1)):1}; # so get what we expect
done
Accept $nothing && { # this will only happen at end of the string
value+=${input:$((iPos-1)):1} # get the last char
pushValue; # push the data into the array
}
}
pushValue() {
items[tPos++]=$value;
value=''; # clear value!
}
quote() {
until [[ $symbol =~ $quote || -z $symbol ]]; do
value+=$symbol;
NextSymbol;
done
Expect $quote;
}
line() {
Accept $quote && {
quote
line;
}
Accept $backslash && {
value+=$symbol;
NextSymbol;
value;
line;
}
Accept $comma && {
pushValue;
line;
}
Accept $text && {
value=${input:$((iPos-1)):1};
value;
line;
}
}
NextSymbol;
line;
Expect $nothing
[[ $field ]] && { # want field
echo "${items[field-1]}" # print the item
# (index justified for human use)
exit;
}
[[ $list ]] && { # want list
for index in $list; do
echo -n "${items[index-1]}${delim}" # print the list
# (index justified for human use)
done
exit;
}
exit 1; # no field or list
答案 0 :(得分:2)
说真的,让它更快的方法是用一种没有解释的语言写入。
对于解析器来说shell可能是一个糟糕的选择,原因与我不会用6502汇编语言或COBOL中的操作系统编写会计软件包的原因相同。
老实说,awk
不会好得多。它是顺序文本处理的理想选择,但将它应用于像解析器一样复杂的东西是一种延伸。
有时,您只需重新考虑您正在使用的工具。如果你不能这样做是一种像C这样的编译语言,至少要考虑像Python这样面向字节码的语言。