Question

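I've been trying to solve this for a while without success. I have the output of a command that I need to massage so that it is suitable for further processing.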

My text is:

1/2 [3] (27/03/2012 19:32:54) word word word word 4/5

What I need is to extract only the numbers from 1/2 [3] 4/5, so that it looks like:

1 2 3 4 5

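So basically I'm trying to exclude every character that isn't a digit, such as "/", "[", "]", etc. I tried awk with FS and I tried a regexp, but none of my attempts succeeded.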

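Then I'd add labels such as first:1 second:2 third:3 and so on. Keep in mind that the file I'm talking about contains many lines with the same structure, and I've already thought about using awk to sum each column: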

awk '{sum1+=$1 ; sum2+=$2 ;......etc} END {print "first:"sum1 " second:"sum2.....etc}'

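But first I need to extract only the relevant numbers. The date between the "( )" can be omitted entirely, but it consists of digits too, so filtering on digits alone isn't enough because it would match the date as well.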

Hope you can help me. Thanks in advance!

Answer 1

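This: sed -r 's/[(][^)]*[)]/ /g; s/[^0-9]+/ /g' should work. It makes two passes: first it removes the parenthesized expression, then it replaces every run of non-digits with a single space.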
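A quick way to check this command is to run it on the sample line and capture the result. The sketch below uses -E instead of -r (both select extended regular expressions in GNU sed, and -E is also accepted by BSD sed); the variable names line and cleaned are only for illustration.

```shell
# Sample line from the question.
line='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5'

# Pass 1: replace the parenthesized date/time with a space.
# Pass 2: collapse every run of non-digits into a single space.
cleaned=$(printf '%s\n' "$line" | sed -E 's/[(][^)]*[)]/ /g; s/[^0-9]+/ /g')

printf '%s\n' "$cleaned"   # prints: 1 2 3 4 5
```

If a line started or ended with non-digit characters, the result would carry a leading or trailing space, which awk's default field splitting ignores.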

Answer 2

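You can do something like sed -e 's/(.*)//' -e 's/[^0-9]/ /g'. It removes everything inside the parentheses, then replaces every non-digit character with a space. To get rid of the extra spaces, you can pipe it to column -t: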

$ echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' | sed -e 's/(.*)//' -e 's/[^0-9]/ /g' | column -t
1  2  3  4  5
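To get from the cleaned-up columns to the per-column totals the question asks for, the sed output can be piped straight into awk, whose default field splitting ignores the extra spaces. A minimal sketch on two sample lines in the question's format; the variable names input and totals are only for illustration.

```shell
# Two sample lines in the question's format.
input='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5
10/20 [30] (27/03/2012 19:32:54) word word 40/50'

# Clean each line with sed, then let awk split on whitespace and add up the columns.
totals=$(printf '%s\n' "$input" |
  sed -e 's/(.*)//' -e 's/[^0-9]/ /g' |
  awk '{ s1 += $1; s2 += $2; s3 += $3; s4 += $4; s5 += $5 }
       END { print "first:" s1 " second:" s2 " third:" s3 " fourth:" s4 " fifth:" s5 }')

printf '%s\n' "$totals"   # prints: first:11 second:22 third:33 fourth:44 fifth:55
```

Note that (.*) is greedy: if a line contained a second pair of parentheses further along, everything between the first "(" and the last ")" would be removed. Using ([^)]*) instead avoids that.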

Answer 3

TXR：

@(collect)
@one/@two [@three] (@date @time) @(skip :greedy) @four/@five
@(filter :tonumber one two three four five)
@(end)
@(bind (first second third fourth fifth)
       @(mapcar (op apply +) (list one two three four five)))
@(output)
first:@first second:@second third:@third fourth:@fourth fifth:@fifth
@(end)

Data:

1/2 [3] (27/03/2012 19:32:54) word word word word 4/5
10/20 [30] (27/03/2012 19:32:54) word word 40/50

Run:

$ txr data.txr data.txt
first:11 second:22 third:33 fourth:44 fifth:55

It is easy to add some error checking:

@(collect)
@  (cases)
@one/@two [@three] (@date @time) @(skip :greedy) @four/@five
@  (or)
@line
@  (throw error `badly formatted line: @line`)
@  (end)
@  (filter :tonumber one two three four five)
@(end)
@(bind (first second third fourth fifth)
       @(mapcar (op apply +) (list one two three four five)))
@(output)
first:@first second:@second third:@third fourth:@fourth fifth:@fifth
@(end)

$ txr data.txr -
foo bar junk
txr: unhandled exception of type error:
txr: ("badly formatted line: foo bar junk")
Aborted

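TXR is geared toward robust programming. There is strong typing, so you can't treat a string as a number just because it contains digits. Variables must be bound before use, so a misspelled variable doesn't silently default to zero or blank, but instead produces an error of the form unbound variable <name> in <file>:<line>. Text extraction is performed with a lot of specific context, to guard against misinterpreting input in one format as another.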

Answer 4

See below, if this is what you want:

kent$  echo "1/2 [3] (27/03/2012 19:32:54) word word word word 4/5"|sed -r 's/\([^)]*\)//g; s/[^0-9]/ /g'
1 2  3                       4 5

If you want it to look nicer:

kent$  echo "1/2 [3] (27/03/2012 19:32:54) word word word word 4/5"|sed -r 's/\([^)]*\)//g; s/[^0-9]/ /g;s/ +/ /g'
1 2 3 4 5
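Runs of spaces can also be squeezed outside sed with tr -s. A sketch assuming a POSIX tr and a sed that accepts -E; the variable names are only for illustration.

```shell
line='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5'

# Drop the parenthesized part, blank out non-digits, then squeeze repeated spaces.
squeezed=$(printf '%s\n' "$line" | sed -E 's/\([^)]*\)//g; s/[^0-9]/ /g' | tr -s ' ')

printf '%s\n' "$squeezed"   # prints: 1 2 3 4 5
```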

Answer 5

This will extract the digits for you, excluding the text in parentheses:

digits=$(echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' |\
       sed 's/(.*)//' | grep -o '[0-9][0-9]*')
echo $digits

Or a pure sed solution:

echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' |\
sed -e 's/(.*)//' -e 's/[^0-9]/ /g' -e 's/[ \t][ \t]*/ /g'

Output:

1 2 3 4 5
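In the grep variant, grep -o prints each match on its own line; it is the unquoted $digits in echo that lets the shell's word splitting join them with spaces. If the joining should be explicit, paste can do it. A sketch, assuming a grep that supports -o (GNU and BSD grep do); the variable joined is only for illustration.

```shell
line='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5'

# One number per line, after removing the parenthesized date/time.
digits=$(printf '%s\n' "$line" | sed 's/(.*)//' | grep -o '[0-9][0-9]*')

# Join the lines with single spaces instead of relying on word splitting.
joined=$(printf '%s\n' "$digits" | paste -sd ' ' -)

printf '%s\n' "$joined"   # prints: 1 2 3 4 5
```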

Answer 6

awk '# gensub() requires gawk
     { first  += gensub("^([0-9]+)/.*", "\\1", "g", $0)
       second += gensub("^[0-9]+/([0-9]+) .*", "\\1", "g", $0)
       third  += gensub("^[0-9]+/[0-9]+ \\[([0-9]+)\\].*", "\\1", "g", $0)
       fourth += gensub("^.* ([0-9]+)/[0-9]+ *$", "\\1", "g", $0)
       fifth  += gensub("^.* [0-9]+/([0-9]+) *$", "\\1", "g", $0)
     }
     END { print "first: " first " second: " second " third: " third " fourth: " fourth " fifth: " fifth
     }'

This might work for you.
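Since gensub is a gawk extension, here is a sketch of the same totals using only gsub, which should work in any POSIX awk. It relies on the fact that modifying $0 makes awk re-split the line into fields; the sample input and variable names are only for illustration.

```shell
# Two sample lines in the question's format.
input='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5
10/20 [30] (27/03/2012 19:32:54) word word 40/50'

totals=$(printf '%s\n' "$input" | awk '
  {
    gsub(/\([^)]*\)/, " ")   # drop the parenthesized date/time
    gsub(/[^0-9]+/, " ")     # turn every remaining non-digit run into one space
    # $0 was modified, so awk has re-split it into the five numeric fields.
    first += $1; second += $2; third += $3; fourth += $4; fifth += $5
  }
  END {
    print "first: " first " second: " second " third: " third " fourth: " fourth " fifth: " fifth
  }')

printf '%s\n' "$totals"   # prints: first: 11 second: 22 third: 33 fourth: 44 fifth: 55
```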

Answer 7

If you set up a fancy field separator, a single awk pass is enough: any one of slash, space, open bracket or close bracket separates a field:

awk -F '[][/ ]' '
  {s1+=$1; s2+=$2; s3+=$4; s4+=$(NF-1); s5+=$NF}
  END {printf("first:%d second:%d third:%d fourth:%d fifth:%d\n", s1, s2, s3, s4, s5)}
'
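To see which field numbers this separator produces, it helps to print the selected fields for the sample line. A small sketch; the variable names are only for illustration.

```shell
line='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5'

# Every single /, space, [ or ] ends a field, so the space followed by [ yields
# an empty $3 and the bracketed number lands in $4. The last two fields are the
# final pair, which is why they are addressed from the end with NF.
fields=$(printf '%s\n' "$line" | awk -F '[][/ ]' '{ print $1, $2, $4, $(NF-1), $NF }')

printf '%s\n' "$fields"   # prints: 1 2 3 4 5
```

Counting from the end with $(NF-1) and $NF keeps the command independent of how many words appear in the middle of the line. The date inside the parentheses is still split into fields, but none of them is ever selected.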

Confused about awk, sed, etc.
