How can I cross-link ("wikify") lines in different files with sed?

Time: 2014-06-22 20:06:37

Tags: regex bash replace sed

My files contain single-line notes that include links to other notes, in the form >filename_without_extension:line_nr:

m01.txt:
Line 1. >m02:2
Line 2. >m02:3
Line 3.

m02.txt:
Line 1.
Line 2. >m01:3
Line 3. >m01:1 >m01:3

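I would like to add automatic wiki-like "backlinks" to every linked-to line that does not have them yet. So the desired output would look like this: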

m01.txt:
Line 1. >m02:2 >m02:3
Line 2. >m02:3
Line 3. >m02:3 >m02:2

m02.txt:
Line 1.
Line 2. >m01:3 >m01:1
Line 3. >m01:1 >m01:3 >m01:2

I came up with something pretty awful using sed, and it does not work. It is supposed to loop over all files in my notes directory:

link_regex=$(sed -e '/(\>m[0-9]+\:[0-9]+?)+?/p')
linenr_from_link_regex=$(sed -e '/\>m[0-9]+?\:/d')
fname_from_cur_link=$(sed -e '/\:[0-9]+?\b/d;/\.txt/a')
link_from_f=$(sed -e '/^/\>/g;/\.txt$/d;/\:=/a' < "$f")
new_link_to_cur_f=$(sed -i "${linenr_fom_cur_link}a\ ${link_from_f}" ${fname_from_cur_link})

function create-cross-references () {
    while read line; do
        echo "$link_regex" | \          # look up links 
        echo "$linenr_from_link_regex"      # pipe to get line number from current link 
        echo "$fname_from_cur_link"         # turn current link to new file name
        echo "$link_from_f"                 # turn current file name name to new link
        echo "$new_link_to_cur_f"           # add new link to current fname
    done
}

for f in *.txt; do
    create-cross-references
done

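Where did I go wrong? Also, what would be a more sensible solution (preferably still using sed) that avoids stepping through every line of my notes folder (including the lines without links) each time? Thanks for your help!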

1 answer:

Answer 0: (score: 1)

You could try something like this:

#!/bin/bash

function getlinks() {
    # $1 must be something like >m01:1
    grep "$1" *.txt | sed -e 's/\(.*\)\.\(.*\):Line \([0-9]\+\)..*/>\1:\3 /' | \
    # all matches in one single line
    tr -d '\n'
}
for fileName in *.txt;do
    echo "$fileName:"
    while read -r line;do
        #Line 1. whatever ==> 1
        lineNumber=$( echo "$line" | grep -Po '(?<=(Line )).*(?=\.)' )
        #m01.txt ==> >m01
        fileNameFormatted=$( echo "$fileName" | sed -e 's/\(.*\)\..*/>\1/'  )
        links=$( getlinks "$fileNameFormatted:$lineNumber" )
        echo "$line $links"
    done < "$fileName"
done

Output:

m01.txt:
Line 1. >m02:2 >m02:3 
Line 2. >m02:3 
Line 3. >m02:2 >m02:3 
m02.txt:
Line 1. 
Line 2. >m01:3 >m01:1 
Line 3. >m01:1 >m01:3 >m01:2
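
One caveat with the grep call in getlinks: a pattern such as >m01:1 is also a prefix of >m01:10, so once a note has ten or more lines the script may report backlinks that are not really there. A minimal sketch of the effect and of one possible remedy, assuming a grep that supports -w (GNU and BSD grep do); demo_dir and its contents are made-up test data, not part of the original notes:

```shell
#!/bin/bash
# Hypothetical demo data in a throwaway directory (not the real notes).
demo_dir=$(mktemp -d)
printf 'Line 1. >m01:1\nLine 2. >m01:10\n' > "$demo_dir/m02.txt"

# Plain grep: ">m01:1" is also a prefix of ">m01:10", so both lines match.
plain_count=$(grep -c '>m01:1' "$demo_dir/m02.txt")

# -w rejects a match that is directly followed (or preceded) by a word
# character, so ">m01:10" no longer counts as a hit for ">m01:1".
word_count=$(grep -cw '>m01:1' "$demo_dir/m02.txt")

echo "plain: $plain_count, with -w: $word_count"
rm -r "$demo_dir"
```

This should print "plain: 2, with -w: 1".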

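Edit: in response to @martt's comment,

[...] could you remove the "Line 1" prefix from the regex? The lines actually contain just random text + links (like Blablalbla. >m01:1; that was a misleading example). Also, how can the changes be written back to the real files?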

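I made some changes to the original script.

  1. The line numbers are no longer present in the text files, so a variable ($lineNumber) is needed to keep track of them.

  2. If the script is run more than once, the cross-links would be duplicated, so this has to be avoided.

  3. The result must be stored in the same file.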
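
The de-duplication in point 2 can be tried on its own. This is only a minimal sketch, assuming the links are separated by spaces; dedupe_links is a hypothetical helper name that does not appear in the script itself:

```shell
#!/bin/bash
# dedupe_links is a hypothetical helper name, used here for illustration only.
# It reads space-separated links on stdin and prints each distinct link once.
dedupe_links() {
    tr ' ' '\n' |      # one link per line
    grep -v '^$' |     # drop empty entries caused by repeated spaces
    sort -u |          # sort and remove duplicates
    tr '\n' ' ' |      # back onto a single line
    sed 's/ $//'       # trim the trailing space left by tr
}

echo '>m02:2 >m02:2  >m03:3' | dedupe_links
```

For this input the result should be ">m02:2 >m03:3". Note that sort -u also reorders the links, so the original order of the links on a line is not preserved.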


    #!/bin/bash

    for fileName in *.txt;do
        # The "Line N" prefix is no longer present, so we have to
        # keep our own count of the lines processed
        let lineNumber=1
        while read -r line;do
            # transform m01.txt into >m01
            fileNameFormatted=$( echo "$fileName" | sed -E 's/(.*)\..*/>\1/'  )
            links=$( \
            # search for occurrences of >filename:lineNumber. grep -Hn will
            # return something like
            # m02.txt:3:whatever. >m01:1 >m01:3
            # in this example,
            # we take the filename (m02) and the line number (3),
            # adding '>' and ':'. Result: >m02:3
            # -w keeps >m01:1 from also matching >m01:10
            grep -Hnw "$fileNameFormatted:$lineNumber" *.txt  | \
            sed -E 's/(.*)\.(.*):([0-9]+):(.*).(.*)/>\1:\3/' | \
            # replace new lines with spaces
            tr '\n' ' ')
            # skipping duplicates :
            links=$( \
            #merge existing line with links found
            echo "$line $links" | \
            #strip all before the dot
            sed -E 's/(.*)\.(.*)/\2/' | \
            # replace spaces with new line
            tr ' ' '\n' | \
            # remove duplicates: >m02:2 >m02:2 >m03:3
            # ==> >m02:2 >m03:3
            sort -u | \
            # replace newlines with spaces.
            tr '\n' ' ')
            # remove everything from the last dot onwards:
            # Line 1. >m02:2 >m03:3 ==> Line 1
            line=$(echo "$line" | sed 's/\(.*\)\..*/\1/')
            #merge both strings and append them to a temporary file
            echo "$line.$links" >> "$fileName.tmp"
            let lineNumber++
        done < "$fileName"
        #replace the original file
        mv "$fileName.tmp" "$fileName"
    done
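
As for not stepping through every line of every note: one possibility is to collect only the links in a single grep pass and turn them into a list of the backlinks each target line should carry. The following is a rough sketch, not a complete replacement for the script above. It assumes a grep that supports -H, -o and -n (GNU and BSD grep do), that all notes end in .txt, and that link names contain neither spaces nor colons; list_backlinks and demo_dir are hypothetical names. It only lists the backlinks and does not write them to the files.

```shell
#!/bin/bash
# list_backlinks is a hypothetical helper name. For every link found in the
# .txt files of directory $1 it prints: target_file target_line backlink
list_backlinks() {
    ( cd "$1" && grep -Hon '>[^ :]*:[0-9][0-9]*' -- *.txt ) |
    # "m01.txt:1:>m02:2" means that m01 line 1 links to m02 line 2,
    # so m02.txt line 2 should carry the backlink >m01:1
    sed 's/^\(.*\)\.txt:\([0-9]*\):>\(.*\):\([0-9]*\)$/\3.txt \4 >\1:\2/'
}

# Throwaway copy of the sample notes from the question
demo_dir=$(mktemp -d)
printf 'Line 1. >m02:2\nLine 2. >m02:3\nLine 3.\n' > "$demo_dir/m01.txt"
printf 'Line 1.\nLine 2. >m01:3\nLine 3. >m01:1 >m01:3\n' > "$demo_dir/m02.txt"

backlinks=$(list_backlinks "$demo_dir")
echo "$backlinks"
rm -r "$demo_dir"
```

For the sample notes this yields five entries, for example "m01.txt 1 >m02:3", meaning that line 1 of m01.txt should end up with the backlink >m02:3. Each entry could then be appended to its target line only if that line does not contain the link yet.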