Question

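I have several strings in a format like this:

Hi there Andre (91234342), currently our records show that on 2016-10-24 you were found ...

I need to extract the number in the parentheses, which is always 8 digits, and the date, which is always in YYYY-MM-DD format. However, they do not always appear in the same order in the string.

The output needs to look like this: 2016-10-24 91234342

I tried using sed to get the values, but I can only get one of them out with sed.

Can anyone offer some help or advice?

Thanks!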

Answer 1

Just use 2 expressions, one for each order:

$ cat file 
Hi there Andre (91234342), currently our records show that on 2016-10-24 you were found ...
Hi there Andre 2016-10-24, currently our records show that on (91234342) you were found ...
$ sed -r -e 's/^.*\(([0-9]{8})\).*([0-9]{4}-[0-9]{2}-[0-9]{2}).*$/\2 \1/' -e 's/^.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*\(([0-9]{8})\).*$/\1 \2/' file
2016-10-24 91234342
2016-10-24 91234342
$

This is the 1st expression, for lines where the date comes after the 8-digit number:
-e 's/^.*\(([0-9]{8})\).*([0-9]{4}-[0-9]{2}-[0-9]{2}).*$/\2 \1/'

and this one handles the reverse order:
-e 's/^.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*\(([0-9]{8})\).*$/\1 \2/'
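With the two expressions above, a line that matches neither pattern is printed unchanged. If you would rather drop such lines, a minimal sketch (assuming a GNU or BSD sed; `-E` is used here, which GNU sed accepts as a synonym for `-r`) adds `-n` and the `p` flag so that only rewritten lines are printed. The function name `extract_pairs` is just for illustration:

```shell
#!/bin/bash
# Sketch: same two expressions, but -n suppresses automatic printing and the
# p flag prints only lines where a substitution succeeded.
extract_pairs() {
  sed -n -E \
    -e 's/^.*\(([0-9]{8})\).*([0-9]{4}-[0-9]{2}-[0-9]{2}).*$/\2 \1/p' \
    -e 's/^.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*\(([0-9]{8})\).*$/\1 \2/p' \
    "$@"   # files from the arguments, or standard input if none are given
}

printf '%s\n' \
  'Hi there Andre (91234342), currently our records show that on 2016-10-24 you were found ...' \
  'Hi there Andre 2016-10-24, currently our records show that on (91234342) you were found ...' \
  'A line with neither a date nor a number' |
  extract_pairs
```

The rewritten line no longer contains parentheses, so the second expression cannot match it again and each line is printed at most once.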

Answer 2

Try this:

sed -r 's/.*\(([0-9]{8})\).*([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\2 \1/;s/.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*\(([0-9]{8})\).*/\1 \2/' infile
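A different approach from a single sed command (not what this answer uses) is to pull out each value with its own `grep -oE` call, so the order within the line never matters. This is a sketch assuming a grep that supports `-o` and `-E` (GNU and BSD grep both do); `extract_line` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: extract the date and the (8-digit) number independently,
# then print them in the required order.
extract_line() {
  local line=$1 date num
  date=$(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' <<<"$line" | head -n 1)
  # Match the parentheses too, then strip them with tr
  num=$(grep -oE '\([0-9]{8}\)' <<<"$line" | head -n 1 | tr -d '()')
  # Print only when both values were found
  if [[ -n $date && -n $num ]]; then
    printf '%s %s\n' "$date" "$num"
  fi
}

while IFS= read -r line; do
  extract_line "$line"
done <<'EOF'
Hi there Andre (91234342), currently our records show that on 2016-10-24 you were found ...
Hi there Andre 2016-10-24, currently our records show that on (91234342) you were found ...
EOF
```

This starts several processes per line, so for large files a single sed or awk pass will be faster.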

Answer 3

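The following regular expression (Perl-compatible syntax, using `\d`) should work for lines where the number comes before the date: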

.+\((\d{8})\).+(\d{4}\-\d{2}\-\d{2}).+
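Bash's `[[ =~ ]]` operator uses POSIX extended regular expressions, which do not understand `\d`, so the pattern has to be written with `[0-9]` there. A minimal sketch under that assumption, matching the number and the date separately (so either order works) and reading the captures from `BASH_REMATCH`; `extract_with_bash`, `num_re` and `date_re` are illustrative names:

```shell
#!/bin/bash
# Sketch: bash-only extraction with [[ =~ ]]; [0-9] replaces \d.
num_re='\(([0-9]{8})\)'                 # literal "(", 8 digits captured, literal ")"
date_re='([0-9]{4}-[0-9]{2}-[0-9]{2})'  # YYYY-MM-DD captured

extract_with_bash() {
  local line=$1 num date
  # The regex must be an unquoted variable for bash to treat it as a regex
  [[ $line =~ $num_re ]] || return 1
  num=${BASH_REMATCH[1]}
  [[ $line =~ $date_re ]] || return 1
  date=${BASH_REMATCH[1]}
  printf '%s %s\n' "$date" "$num"
}

extract_with_bash 'Hi there Andre (91234342), currently our records show that on 2016-10-24 you were found ...'
```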

Answer 4

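You have a few options. The other answers show the extended regular expression syntax, but you can also use basic regular expression syntax with a few adjustments, and you can write a short script so you don't have to retype the expressions.

For example, with basic syntax: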

$ sed -e "s/^.*[(]\([0-9]\{8\}\)[)].*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\).*$/\2 \1/;
s/^.*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\).*[(]\([0-9]\{8\}\)[)].*$/\1 \2/" file.txt

Sample file.txt

$ cat file.txt
Hi there Andre (91234342), currently our records show that on 2016-10-24 you were found ...
Hi there Andre 2016-10-24, currently our records show that on (91234342) you were found ...

Running the expressions above gives:

2016-10-24 91234342
2016-10-24 91234342

Using variables in a script

You can store the patterns in variables to keep the substitution commands short and readable. For example:

#!/bin/bash

digits='[(]\([0-9]\{8\}\)[)]'
pdate='\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)'

sed -e "s/^.*${digits}.*${pdate}.*$/\2 \1/;
s/^.*${pdate}.*${digits}.*$/\1 \2/" \
"$1"
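The same variable trick also works with extended syntax, where the capture groups need no backslashes and only the literal parentheses are escaped. A sketch assuming a sed that accepts `-E`; `extract_ere` is a hypothetical name, and it reads the files given as arguments or standard input:

```shell
#!/bin/bash
# Sketch: extended-syntax variant of the variable-based script.
digits='\(([0-9]{8})\)'               # literal "(", 8 digits captured, literal ")"
pdate='([0-9]{4}-[0-9]{2}-[0-9]{2})'  # YYYY-MM-DD captured

extract_ere() {
  # \$ becomes a literal $ (end-of-line anchor) inside the double quotes
  sed -E -e "s/^.*${digits}.*${pdate}.*\$/\2 \1/" \
         -e "s/^.*${pdate}.*${digits}.*\$/\1 \2/" "$@"
}

extract_ere <<'EOF'
Hi there Andre (91234342), currently our records show that on 2016-10-24 you were found ...
Hi there Andre 2016-10-24, currently our records show that on (91234342) you were found ...
EOF
```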

Output

$ bash sedcmd.sh file.txt
2016-10-24 91234342
2016-10-24 91234342

Either way, with basic or extended syntax, find some way to save the expressions so you don't risk typos from retyping them :)
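Another way to save the expressions is a sed script file loaded with `-f`. In this sketch the file is a temporary one; in practice you would keep it under a name of your choosing (e.g. a hypothetical `extract.sed`). Note that a script file must not begin with the characters `#n`, which POSIX sed treats like the `-n` option:

```shell
#!/bin/bash
# Sketch: store both substitutions in a sed script file and reuse it with -f.
sedfile=$(mktemp)
trap 'rm -f "$sedfile"' EXIT   # clean up the temporary file on exit

cat >"$sedfile" <<'EOF'
# number before date
s/^.*\(([0-9]{8})\).*([0-9]{4}-[0-9]{2}-[0-9]{2}).*$/\2 \1/
# date before number
s/^.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*\(([0-9]{8})\).*$/\1 \2/
EOF

printf '%s\n' \
  'Hi there Andre (91234342), currently our records show that on 2016-10-24 you were found ...' \
  'Hi there Andre 2016-10-24, currently our records show that on (91234342) you were found ...' |
  sed -E -f "$sedfile"
```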

Extract two values from a line using a regex in a bash script
