Question

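Using shell, I want to search for a sub-string and print only that sub-string together with the word that follows it.

E.g. logfile has a line "today is monday and this is:1234 so i am in."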

if grep -q "this is:" ./logfile; then
   # here I want to print only the sub-string with the next word, i.e. "this is:1234"
   # echo ???
fi

Answer 1

You can use sed with \1 to print the string matched inside \(..\):

sed 's/.*\(this is:[0-9a-zA-Z]*\).*/\1/' logfile

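Edit: the command above only works for a single line of input, because lines that do not match are printed unchanged.

When the file has more lines, you only want to print the lines that matched: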

sed -n 's/.*\(this is:[0-9a-zA-Z]*\).*/\1/p' logfile
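To see the difference between the two commands, here is a small sketch. It assumes a throwaway sample file; the file contents and the variable names all and only are made up for illustration.

```shell
# Hypothetical sample log: only the second line contains the marker.
tmp=$(mktemp)
printf '%s\n' 'first line without the marker' \
              'today is monday and this is:1234 so i am in.' \
              'last line' > "$tmp"

# Without -n and p, lines that do not match pass through unchanged.
all=$(sed 's/.*\(this is:[0-9a-zA-Z]*\).*/\1/' "$tmp")

# With -n and the p flag, only lines where the substitution happened are printed.
only=$(sed -n 's/.*\(this is:[0-9a-zA-Z]*\).*/\1/p' "$tmp")

printf 'without -n:\n%s\n\nwith -n ... p:\n%s\n' "$all" "$only"
rm -f "$tmp"
```

The first output still contains all three lines (with the middle one reduced to the match), while the second contains only the extracted text.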

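If you have a large file and only want to see the first match, you could combine this command with head -1, but you would rather stop scanning/parsing after the first match. You can use q to quit, but only after a match has been found.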

sed -n '/.*\(this is:[0-9a-zA-Z]*\).*/{s//\1/p;q;}' logfile
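As a rough check that q really stops after the first hit, the sketch below runs the same sed program on a made-up file with two matching lines (the sample data and the variable name first are assumptions for illustration):

```shell
# Hypothetical input with two lines that contain the marker.
tmp=$(mktemp)
printf '%s\n' 'noise' 'a this is:1234 b' 'c this is:5678 d' > "$tmp"

# The address selects matching lines; the empty pattern in s//.../ reuses
# that same regex. After printing, q quits, so the second match is never read.
first=$(sed -n '/.*\(this is:[0-9a-zA-Z]*\).*/{s//\1/p;q;}' "$tmp")

echo "$first"
rm -f "$tmp"
```

Only the value from the first matching line is printed; the line with 5678 is not processed.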

Answer 2

If you only want the next word, you can use a regular expression with a look-behind:

$ grep --perl-regexp -o '(?<=(this is:))(\S+)' ./logfile
1234

If you want both, then simply:

$ grep --perl-regexp -o 'this is:\S+' ./logfile
this is:1234

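The -o option tells grep to return only the matched part.

In the commands above, we assume that a "word" is a sequence of non-space characters. You can adjust this as needed.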
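As one possible adjustment, the sketch below compares the "non-space run" definition with a narrower one that only accepts letters, digits and underscores. It swaps in POSIX bracket expressions with grep -E instead of the PCRE syntax above, so it does not depend on PCRE support; the sample line with a trailing comma is made up.

```shell
# Hypothetical log line where the value is followed directly by a comma.
line='today is monday and this is:1234, so i am in.'

# Non-space run: the trailing comma becomes part of the "word".
loose=$(printf '%s\n' "$line" | grep -Eo 'this is:[^[:space:]]+')

# Letters, digits and underscore only: the comma is left out.
strict=$(printf '%s\n' "$line" | grep -Eo 'this is:[[:alnum:]_]+')

printf '%s\n%s\n' "$loose" "$strict"
```

Which definition is right depends on what can legitimately follow the marker in your log.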

Answer 3

You can look for everything up to, but not including, the next space:

grep -Eo "this is:[^[:space:]]+" logfile

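[] introduces a set of characters to look for, and the leading ^ complements that set, so the set here is whitespace, complemented, i.e. anything that is not whitespace. The + means there must be at least one such character.

-E tells grep to use extended regular expressions, and -o means to print only the matched part.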
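Two consequences of that pattern may be worth checking on your own data. The sketch below uses made-up input lines and variable names: a tab ends the match just like a space does, and because of the + a marker followed immediately by whitespace does not match at all.

```shell
# A tab also counts as whitespace, so the match stops before it.
with_tab=$(printf 'x this is:1234\tmore\n' | grep -Eo 'this is:[^[:space:]]+')

# "+" needs at least one character: a marker followed directly by a space
# does not match, and grep exits with a non-zero status.
if printf 'x this is: 1234\n' | grep -Eq 'this is:[^[:space:]]+'; then
  empty_case=matched
else
  empty_case=no-match
fi

printf '%s\n%s\n' "$with_tab" "$empty_case"
```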

Answer 4

If your system has the GNU extensions (but you are not sure whether grep was compiled with the optional PCRE support), consider:

if result=$(grep -E -m 1 -o 'this is:[^[:space:]]+' logfile); then
  echo "value is: ${result#*:}"
fi

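${varname#value} expands to the contents of varname, but with value removed from the beginning if it is present there. Thus, ${result#*:} strips everything up to and including the first colon in result.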
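For comparison, here is a short sketch of the related prefix and suffix expansions, using a made-up value that contains two colons so the difference between the shortest and longest match is visible:

```shell
# Hypothetical value with more than one colon.
result='this is:12:34'

shortest=${result#*:}    # remove the shortest prefix matching "*:"
longest=${result##*:}    # remove the longest prefix matching "*:"
label=${result%%:*}      # remove the longest suffix matching ":*"

printf '%s\n' "$shortest" "$longest" "$label"
```

With a single colon in the value, # and ## give the same result, which is why the answer's ${result#*:} is enough for "this is:1234".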

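However, this may not work on systems without the non-POSIX options -o or -m.

If you want to support non-GNU systems, awk is a tool worth considering. Unlike answers that require non-portable extensions (such as grep -P), this should work on any modern platform (tested with GNU awk, recent BSD awk, and mawk; likewise, gawk --posix --lint gives no warnings):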

# note that the constant 8 is the length of "this is:"
# GNU awk has cleaner syntax, but trying to be portable here.
if value=$(awk '
  BEGIN { matched=0; }      # by default, this will trigger END to exit as failure
  /this is:/ {
    match($0, /this\ is:([^[:space:]]+)/);
    print substr($0, RSTART+8, RLENGTH-8);
    matched=1;              # tell END block to use zero exit status
    exit(0);                # stop processing remaining file contents, jump to END
  }
  END { if(matched == 0) { exit(1); } }
' logfile); then
  echo "Found value of $value"
else
  echo "Could not find a value in file"
fi
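If the marker should not be hard-coded (which also removes the constant 8), the same idea can be wrapped in a function. This is only a sketch under some assumptions: extract_after is a hypothetical name, the marker is treated as a literal string via index(), and whitespace is spelled out as space or tab rather than [[:space:]] to keep the sketch independent of character-class support.

```shell
# extract_after MARKER FILE
# Prints the run of non-blank characters that follows the first MARKER and
# returns 0; returns 1 if no such value is found.
# Note: awk -v interprets backslash escapes in the marker.
extract_after() {
  awk -v marker="$1" '
    {
      pos = index($0, marker)            # literal search, no regex
      if (pos > 0) {
        rest = substr($0, pos + length(marker))
        if (match(rest, /^[^ \t]+/)) {   # leading run of non-blank characters
          print substr(rest, RSTART, RLENGTH)
          found = 1
          exit                           # stop reading, jump to END
        }
      }
    }
    END { exit(found ? 0 : 1) }
  ' "$2"
}

# Usage with a made-up sample file:
tmp=$(mktemp)
printf '%s\n' 'noise' 'today is monday and this is:1234 so i am in.' > "$tmp"
if value=$(extract_after 'this is:' "$tmp"); then
  echo "Found value of $value"
else
  echo "Could not find a value in file"
fi
rm -f "$tmp"
```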

How to extract a word from grep results in shell?
