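Get the position and length of each word in a line

Time: 2014-03-21 22:57:36

Tags: bash awk

I want to split a line into words and then print the length and position of each word.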

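for word in $line
do
    # index() returns the 1-based position of the FIRST match of b in a
    start=`awk -v a="$line" -v b="$word" 'BEGIN{print index(a,b)}'`
    # printf avoids counting the trailing newline that echo would add
    count=`printf '%s' "$word" | wc -m`
    echo $word : $start : $count
done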

So let's assume:

line='This is a test to test'

I would like to get:

This:0:4
is:5:2
a:8:1
test:10:4
to:15:2
test:18:4

With this solution, a problem appears when the same word occurs twice in the line. Does anyone know how to do this?
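The duplicate-word problem comes from `index(a, b)`, which always reports the 1-based position of the first occurrence of `b` in `a`. A minimal sketch to illustrate this with the example line (the helper name `first_index` is made up for this illustration and is not part of the original script):

```shell
# first_index is a hypothetical helper: it prints the 1-based position of the
# first occurrence of word $2 in line $1, the same lookup the loop performs.
first_index() {
    awk -v a="$1" -v b="$2" 'BEGIN { print index(a, b) }'
}

line='This is a test to test'
first_index "$line" test   # lookup for the first "test"  -> 11
first_index "$line" test   # lookup for the second "test" -> 11 again
```

Both calls print 11 because the lookup has no memory of how far the line has already been scanned, so any fix needs to carry an offset from one word to the next.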

4 answers:

Answer 0: (score: 2)

$ cat file
This is a test to test
$
$ cat tst.awk
BEGIN{ OFS=" : " }
{
    start = 0
    while ( match($0,/[^ ]+/) ) {
        start = start + RSTART - 1
        print substr($0,RSTART,RLENGTH), start, RLENGTH
        $0 = substr($0,RSTART+RLENGTH)
        start = start + RLENGTH
    }
}
$
$ awk -f tst.awk file
This : 0 : 4
is : 5 : 2
a : 8 : 1
test : 10 : 4
to : 15 : 2
test : 18 : 4

Answer 1: (score: 1)

pos=0
for word in $line
do
    length=${#word}
    echo "$word : $pos : $length"
    # advance past the word and the single space assumed to follow it
    pos=$((pos + length + 1))
done
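A word-by-word `for` loop cannot see how many spaces separate the words, so it cannot track positions when the spacing varies. A minimal sketch that instead consumes the line from the left, using only POSIX parameter expansion, might look like this. The function name `word_positions` is hypothetical, and the sketch assumes that words are separated by plain spaces (not tabs) and that the text is ASCII:

```shell
# word_positions (hypothetical name) prints "word : start : length" for each
# space-separated word of $1, with 0-based start offsets.
# Note: POSIX sh has no local variables, so rest, pos, lead and word are global.
word_positions() {
    rest=$1
    pos=0
    while [ -n "$rest" ]; do
        lead=${rest%%[! ]*}        # leading spaces of the remainder (may be empty)
        rest=${rest#"$lead"}       # drop them
        pos=$((pos + ${#lead}))
        [ -n "$rest" ] || break    # only trailing spaces were left
        word=${rest%% *}           # everything up to the next space
        printf '%s : %s : %s\n' "$word" "$pos" "${#word}"
        pos=$((pos + ${#word}))
        rest=${rest#"$word"}       # drop the word just reported
    done
}

word_positions 'This is a test to test'
```

Because the offset is accumulated while the line is consumed, the second `test` is reported at 18 rather than at the position of the first one.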

Answer 2: (score: 1)

If there is exactly one space between words, you can do the following:

$>echo "this test is a test" | sed 's/ / \n/g'| awk 'BEGIN{i=0}{print $1, ":", i, length($1);i+=length($0)}'
this : 0 4
test : 5 4
is : 10 2
a : 13 1
test : 15 4
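The same idea can be sketched with `tr` instead of `sed`, which avoids relying on `\n` being understood in the `sed` replacement text. Each space becomes a newline, so every input line to awk stands for one word (or for an empty gap between two consecutive spaces) plus one separator character. The function name `positions_tr` is made up for this sketch, and it assumes the words are separated by spaces only:

```shell
# positions_tr (hypothetical name): prints "word : start length" per word of $1.
positions_tr() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        awk '{
            if (NF) print $1, ":", i + 0, length($1)   # skip the empty gap lines
            i += length($0) + 1                        # the word plus the space tr replaced
        }'
}

positions_tr "this test is a test"
```

For the sample input this prints the same five lines as the pipeline above. Since empty lines still advance the counter by one, runs of several spaces should also be counted correctly.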

Answer 3: (score: 1)

Perhaps this is what you are trying to do:

$ cat file
Hi my name is jaypal
i am a software software test engineer
scripting in awk awk awk is my hobby

$ awk '{for(i=1;i<=NF;i++)printf "Line=%d Length=%d Word=%s\n",NR,length($i),$i}' file
Line=1 Length=2 Word=Hi
Line=1 Length=2 Word=my
Line=1 Length=4 Word=name
Line=1 Length=2 Word=is
Line=1 Length=6 Word=jaypal
Line=2 Length=1 Word=i
Line=2 Length=2 Word=am
Line=2 Length=1 Word=a
Line=2 Length=8 Word=software
Line=2 Length=8 Word=software
Line=2 Length=4 Word=test
Line=2 Length=8 Word=engineer
Line=3 Length=9 Word=scripting
Line=3 Length=2 Word=in
Line=3 Length=3 Word=awk
Line=3 Length=3 Word=awk
Line=3 Length=3 Word=awk
Line=3 Length=2 Word=is
Line=3 Length=2 Word=my
Line=3 Length=5 Word=hobby
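The output above gives the length of each word but not its position. One possible extension, sketched here as an assumption about what the question wants rather than as part of this answer, is to call `index()` on the part of the line that has not been scanned yet and to add the offset already consumed. The wrapper name `line_word_positions` and the `Start=` field are made up for this sketch:

```shell
# line_word_positions (hypothetical name): reads lines from stdin or from the
# files given as arguments and prints a 0-based start offset for every word.
line_word_positions() {
    awk '{
        rest = $0; off = 0
        for (i = 1; i <= NF; i++) {
            p = index(rest, $i)                 # 1-based position in the unscanned part
            start = off + p - 1                 # 0-based position in the whole line
            printf "Line=%d Start=%d Length=%d Word=%s\n", NR, start, length($i), $i
            off = start + length($i)            # offset just past this word
            rest = substr(rest, p + length($i)) # continue after this word
        }
    }' "$@"
}

printf 'scripting in awk awk awk\n' | line_word_positions
```

The unscanned part always begins with the separators that precede the next field, so the first match found by `index()` is that field itself, and repeated words such as `awk awk awk` get distinct positions (13, 17 and 21 in this example).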