Question

Suppose the file list.txt contains ugly data like this:

aaaa 
#bbbb
cccc, dddd; eeee
 ffff;
    #gggg hhhh
iiii

jjjj,kkkk ;llll;mmmm
nnnn

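How can we parse/split this file with a bash script, excluding the comment lines and splitting on all commas, semicolons, and whitespace (including tabs, spaces, newlines, and carriage-return characters)?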

Answer 1

It can be done with the following code:

#!/bin/bash
### read file:
file="list.txt"
array=()

### read raw lines; "|| [[ -n $line ]]" also picks up a last line without a newline
while IFS= read -r line || [[ -n "$line" ]]; do
    ### skip lines that begin with a "#" or "<whitespace>#"
    match_pattern='^[[:space:]]*#'
    if [[ "$line" =~ $match_pattern ]]; then
        continue
    fi

    ### replace semicolons, commas and carriage returns with a space everywhere...
    temp_line=${line//[;,$'\r']/ }

    ### ...then split the line at whitespace; IFS is never changed globally,
    ### so read -a splits on the default space, tab and newline
    read -r -a split_line_arr <<< "$temp_line"

    ### push each word in split_line_arr onto the final array
    array+=("${split_line_arr[@]}")
done < "$file"

echo "Array items:"
for item in "${array[@]}"; do
    printf "   %s\n" "$item"
done

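This is not so much a question as a solution to a problem that others raised while answering related questions. The point is that those other questions and solutions never really addressed how to split a string that is delimited by a combination of whitespace, other characters, and comments; this solution handles all three at once.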
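The three steps described above (skip comments, normalize the delimiters, split on whitespace) can also be wrapped in a function that reads from standard input. This is only a minimal sketch: the function name split_list and the global array words are made-up names for illustration, and the sample data is the list from the question fed in through a here-document.

```shell
#!/bin/bash
# split_list (hypothetical name): read text on stdin, skip comment lines,
# and store every comma/semicolon/whitespace-separated word in the
# global array "words" (also a made-up name).
split_list() {
    local line fields
    local comment_re='^[[:space:]]*#'
    words=()
    while IFS= read -r line || [[ -n "$line" ]]; do
        # 1. drop lines whose first non-blank character is "#"
        [[ "$line" =~ $comment_re ]] && continue
        # 2. turn commas, semicolons and carriage returns into spaces
        line=${line//[;,$'\r']/ }
        # 3. split on the default IFS (space, tab, newline)
        read -r -a fields <<< "$line"
        words+=("${fields[@]}")
    done
}

# sample data from the question
split_list <<'EOF'
aaaa 
#bbbb
cccc, dddd; eeee
 ffff;
    #gggg hhhh
iiii

jjjj,kkkk ;llll;mmmm
nnnn
EOF

printf '%s\n' "${words[@]}"
```

Because the function reads standard input, it can be fed from a file as well, for example with split_list < list.txt.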

Related questions:

How to split one string into multiple strings separated by at least one space in bash shell?

How do I split a string on a delimiter in Bash?

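Additional notes:

Why use bash when other scripting languages are better suited to splitting? Compared with a Perl program, a bash script is more likely to have everything it needs when it runs from a basic upstart or cron (sh) shell. A list of arguments is often needed in those situations, and we should expect the worst from the people who maintain such lists.

Hopefully this post will save bash newbies (including me) a lot of time in the future. Good luck!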

Answer 2

Using shell commands:

grep -v '^[[:space:]]*#' file | tr ';,\r' '\n\n\n' | awk '{ for (i = 1; i <= NF; i++) print $i }'
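If the words are needed in a bash array rather than on standard output, the output of a pipeline like this can be captured with mapfile. The following is a sketch under a few assumptions: bash 4 or newer (for mapfile), a throwaway temporary file with made-up sample data, and an array called items, which is just an illustrative name.

```shell
#!/bin/bash
# create a small throwaway input file (hypothetical sample data)
tmp=$(mktemp)
printf 'aaaa \n#bbbb\ncccc, dddd; eeee\n\t#gggg hhhh\njjjj,kkkk ;llll\n' > "$tmp"

# drop comment lines, turn ';', ',' and CR into newlines, let awk print
# each whitespace-separated field on its own line, then read those lines
# into the array "items" (mapfile requires bash >= 4)
mapfile -t items < <(
    grep -v '^[[:space:]]*#' "$tmp" |
    tr ';,\r' '\n\n\n' |
    awk '{ for (i = 1; i <= NF; i++) print $i }'
)

rm -f "$tmp"

echo "${#items[@]} items:"
printf '   %s\n' "${items[@]}"
```

Process substitution is used instead of piping into mapfile so that the array is filled in the current shell rather than in a subshell.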

Answer 3

sed 's/[# \t,]/REPLACEMENT/g' input.txt

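The command above replaces the comment character ('#'), space (' '), tab ('\t'), and comma (',') with an arbitrary string ('REPLACEMENT'). Note that '\t' inside a bracket expression is understood as a tab by GNU sed; other sed implementations may treat it literally.
To replace newlines as well, pipe the result through tr. Because tr maps single characters rather than strings, the newline replacement has to be one character (a space in this example):

sed 's/[# \t,]/REPLACEMENT/g' input.txt | tr '\n' ' '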
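To make the effect of the character class concrete, here is a small sketch that runs the substitution on one line of the sample data. The underscore replacement and the variable names are assumptions chosen for illustration; the tab is left out of the class so the example does not depend on how a given sed treats '\t'.

```shell
#!/bin/sh
line='cccc, dddd; eeee'

# every single matching character is replaced, so ", " yields two underscores
each=$(printf '%s\n' "$line" | sed 's/[# ,]/_/g')
echo "$each"      # cccc__dddd;_eeee

# with -E and "+", a whole run of matching characters is replaced only once
runs=$(printf '%s\n' "$line" | sed -E 's/[# ,]+/_/g')
echo "$runs"      # cccc_dddd;_eeee
```

The semicolon survives in both cases because it is not part of the character class; adding it to the brackets would treat it as a delimiter too.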

Answer 4

If you have Ruby on your system:

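File.foreach("file") do |line|
  next if line[/^\s*#/]                          # skip comment lines
  words = line.split(/[\s;,]+/).reject(&:empty?) # split on whitespace, ';' and ','
  puts words unless words.empty?                 # print nothing for blank lines
end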

Output

# ruby test.rb 
aaaa
cccc
dddd
eeee
ffff
iiii
jjjj
kkkk
llll
mmmm
nnnn

How to split a string or file that may be delimited by a combination of comments and spaces, tabs, newlines, commas, or other characters
