Question

I have two files. One is SALESORDERLIST, which looks like this:

ProductID;ProductDesc
1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.
4,bottles of beer 40 gal

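The header (ProductID;ProductDesc) is not actually in the file, so ignore it. The other file, POSSIBLEUNITS, holds (you guessed it) the possible units and their equivalents: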

u;u.;un;un.;unit
k;k.;kg;kg.;kilograms

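This is my first day using regular expressions, and I would like to know how to get the entries in SALESORDERLIST whose unit appears in POSSIBLEUNITS. In my example, I want to exclude entry 4, because 'gal' is not listed in the POSSIBLEUNITS file.

I say regular expressions because I also have another criterion that needs to be matched: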

egrep "^[0-9]+;{1}[^; ][a-zA-Z ]+" SALESORDERLIST

From the entries that match, I want to keep only those that end with a valid unit.

Thanks!

Answer 1

One way to achieve this is:

cat SALESORDERLIST | egrep "\b(u|u\.|un|un\.|unit|k|k\.|kg|kg\.|kilograms)\b"


1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.

The metacharacter \b is an anchor that lets you perform a "whole words only" search, using a regular expression of the form \bword\b.
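A quick way to see the effect of \b is to run it against a few lines. This is only a sketch: the sample lines are invented for illustration, and it assumes GNU grep, where \b is supported as an extension.

```shell
# Sample lines, invented for this illustration.
sample=$(printf '%s\n' '1,potatoes 1 kg.' '2,milk 3 kgs' '3,flour 2 kg')

# \bkg\b matches "kg" only as a whole word: "kg." and "kg" pass,
# while "kgs" is rejected because "s" continues the word.
whole_word=$(printf '%s\n' "$sample" | grep -E '\bkg\b')
printf '%s\n' "$whole_word"
```

Note that the trailing dot in "kg." is not a word character, so the boundary falls right after the "g".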

http://www.regular-expressions.info/wordboundaries.html
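The alternation in the command above is typed by hand. As a rough sketch, it could instead be generated from POSSIBLEUNITS, so that the unit list lives in one place. The temporary directory and the copies of both files are assumptions made to keep the example self-contained, and the pattern here is anchored to the end of the line rather than using \b, which is a swapped-in choice, not what the answer does.

```shell
# Recreate both sample files in a scratch directory (assumption: demo data only).
dir=$(mktemp -d)
printf '%s\n' 'u;u.;un;un.;unit' 'k;k.;kg;kg.;kilograms' > "$dir/POSSIBLEUNITS"
printf '%s\n' '1,potatoes 1 kg.' '2,tomatoes 2 k' \
  '3,bottles of whiskey 2 un.' '4,bottles of beer 40 gal' > "$dir/SALESORDERLIST"

# One unit per line, escape the literal dots, then join with "|":
# u|u\.|un|un\.|unit|k|k\.|kg|kg\.|kilograms
alternation=$(tr ';' '\n' < "$dir/POSSIBLEUNITS" | sed 's/\./\\./g' | paste -sd'|' -)

# Keep the lines whose last space-separated word is one of the units.
matches=$(grep -E " ($alternation)\$" "$dir/SALESORDERLIST")
printf '%s\n' "$matches"

rm -r "$dir"
```

With the sample data this should keep entries 1 to 3 and drop entry 4. Only dots are escaped here; a units file containing other regex metacharacters would need more careful escaping.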

Answer 2

One way is to create a bash script, called findunit.sh for example:

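#!/bin/bash
# Usage: findunit.sh SALESORDERLIST
while IFS= read -r line
do
    # Keep only lines shaped like "<digits>,<description>".
    match=$(grep -E "^[0-9]+,[^, ][a-zA-Z ]+" <<< "$line")
    [ -n "$match" ] || continue
    # The unit is the last space-separated word on the line.
    name=${match##* }
    # -F: take the unit literally (so "." is not a wildcard); -w: whole-word match only.
    if grep -qwF -- "$name" /pathtofile/units.txt; then
        echo "$line"
    fi
done < "$1"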

Then run:

findunit.sh SALESORDERLIST

My output is:

1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.
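One thing to watch when looking a unit up with grep: by default the unit is treated as a regular expression and may match part of a longer entry. A small sketch of the difference that -w and -F make, using one line of the question's units as the sample data and a hypothetical unit "g" that is not in the list:

```shell
# One line of units, in the question's format.
units='k;k.;kg;kg.;kilograms'

# Plain grep: "g" is found inside "kg", so the lookup succeeds by accident.
if printf '%s\n' "$units" | grep -q 'g'; then loose=yes; else loose=no; fi

# -w requires a whole-word match and -F disables regex metacharacters,
# so the same lookup now fails as intended.
if printf '%s\n' "$units" | grep -qwF 'g'; then strict=yes; else strict=no; fi

echo "loose=$loose strict=$strict"
```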

Answer 3

An example of doing this entirely in bash:

declare -A units   # associative array; requires bash 4 or later

# Load every unit from POSSIBLEUNITS (semicolon-separated) as a key.
while IFS=';' read -ra fields; do
  for u in "${fields[@]}"; do
    [[ -n $u ]] && units[$u]=1
  done
done < POSSIBLEUNITS

# Print each order line whose last word is a known unit.
while IFS= read -r line; do
  unit=${line##* }
  if [[ -n $unit && ${units[$unit]} == 1 ]]; then
    echo "$line"
  fi
done < SALESORDERLIST
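The lookup above does not apply the extra regex criterion from the question. A minimal sketch of how the two checks might be combined with bash's [[ =~ ]] operator follows. It assumes bash 4 or later; the pattern is the question's, adapted to the comma separator seen in the sample data, and the small units array and the input lines are hand-made for the demo.

```shell
# Criterion adapted from the question ("," instead of ";" as separator).
criterion='^[0-9]+,[^, ][a-zA-Z ]+'

# Hand-made subset of units, for illustration only.
declare -A units=( [k]=1 [kg.]=1 [un.]=1 )

kept=()
for line in '1,potatoes 1 kg.' 'x,bad id 2 k' '4,bottles of beer 40 gal'; do
  unit=${line##* }
  # Keep the line only if it passes the regex AND its unit is a known key.
  if [[ $line =~ $criterion && -n ${units[$unit]:-} ]]; then
    kept+=("$line")
  fi
done
printf '%s\n' "${kept[@]}"
```

The second line is rejected by the regex (its ID is not numeric) and the third by the unit lookup, so only the first line is kept.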

Referencing a file in a regular expression
