Question

I have a list:

    ### To Read:
    One Hundred Years of Solitude | Gabriel García Márquez
    Moby-Dick | Herman Melville
    Frankenstein | Mary Shelley
    On the Road | Jack Kerouac
    Eyeless in Gaza | Aldous Huxley
    ### Read:
    The Name of the Wind (The Kingkiller Chronicles: Day One) | Patrick Rothfuss | 6-27-2013
    The Wise Man’s Fear (The Kingkiller Chronicles: Day Two) | Patrick Rothfuss | 8-4-2013
    Vampires in the Lemon Grove | Karen Russell | 12-25-2013
    Brave New World | Aldous Huxley | 2-2014

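I'd like to use something like Python's string.split(' | ') to break the fields into separate strings, but because the two sections have different numbers of fields, I think I need to handle them differently. How do I select the lines between '### To Read:' and '### Read:', and the lines after '### Read:', so that I can split them? Should I use awk or sed?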

Answer 1

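You haven't specified any desired output. So, as I interpret your question, you want to read certain lines from a file, split those lines on '|', and, analogous to a Python list, put the results in a bash array. The lines in question are all lines after '### To Read:', except the line that reads '### Read:'. The script below does this and then, to demonstrate success, displays the array (using declare):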

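    active=
    # -r keeps any backslashes in the input literal
    while read -r line
    do
        if [ "$line" = '### To Read:' ]
        then
            active=1
        elif [ "$line" = '### Read:' ]
        then
            active=1
        elif [ "$active" ]
        then
            # IFS is changed only for this read, so the rest of the script is unaffected;
            # unlike an unquoted $line, this also avoids filename globbing
            IFS='|' read -r -a my_array <<< "$line"
            declare -p my_array
        fi
    done <mylist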

The output for the sample input is:

    declare -a my_array='([0]="One Hundred Years of Solitude " [1]=" Gabriel García Márquez")'
    declare -a my_array='([0]="Moby-Dick " [1]=" Herman Melville")'
    declare -a my_array='([0]="Frankenstein " [1]=" Mary Shelley")'
    declare -a my_array='([0]="On the Road " [1]=" Jack Kerouac")'
    declare -a my_array='([0]="Eyeless in Gaza " [1]=" Aldous Huxley")'
    declare -a my_array='([0]="The Name of the Wind (The Kingkiller Chronicles: Day One) " [1]=" Patrick Rothfuss " [2]=" 6-27-2013")'
    declare -a my_array='([0]="The Wise Man’s Fear (The Kingkiller Chronicles: Day Two) " [1]=" Patrick Rothfuss " [2]=" 8-4-2013")'
    declare -a my_array='([0]="Vampires in the Lemon Grove " [1]=" Karen Russell " [2]=" 12-25-2013")'
    declare -a my_array='([0]="Brave New World " [1]=" Aldous Huxley " [2]=" 2-2014")'

Note that this approach handles the input easily even though the lines have different numbers of fields.
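If the two sections should be treated differently, as the question suggests, and the spaces around each '|' are unwanted, one possible variation is sketched below. The function name parse_reading_list and the tab-separated output format are my own assumptions for illustration. The sketch also assumes well-formed lines: two fields under '### To Read:' and three under '### Read:', separated by exactly ' | '.

```bash
# Hypothetical helper (an illustration, not a standard tool): reads the list on
# stdin and prints one tab-separated record per book, prefixed by its section.
parse_reading_list() {
    local section='' line title rest author date
    while IFS= read -r line; do
        case $line in
            '### To Read:') section=to_read; continue ;;
            '### Read:')    section=read;    continue ;;
            '')             continue ;;            # skip blank lines
        esac
        [ -n "$section" ] || continue              # ignore anything before the first header
        title=${line%% | *}                        # text before the first " | "
        rest=${line#* | }                          # text after the first " | "
        if [ "$section" = to_read ]; then
            # two fields: title and author
            printf 'to_read\t%s\t%s\n' "$title" "$rest"
        else
            # three fields: title, author and date
            author=${rest%% | *}
            date=${rest#* | }
            printf 'read\t%s\t%s\t%s\n' "$title" "$author" "$date"
        fi
    done
}
```

With the function defined, it could be run as parse_reading_list <mylist. Splitting with parameter expansion removes the whole ' | ' separator, so the fields come out without the leading and trailing spaces seen in the array output above.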

Answer 2

You haven't told us what the final output should look like, but here is the skeleton of an Awk solution.

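    # [|] matches a literal pipe; a backslash-escaped pipe in -F can be
    # handled differently by different awk implementations
    awk -F ' [|] ' '/^### To Read:/ { s=1; next }
        /^### Read:/ { s=2; next }
        s==1 { print $1 "," $2 ",\"\"" }
        s==2 { print $1 "," $2 "," $3 }' file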

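For the first subsection, this simply prints an empty third field. Obviously, you can adjust the actions to whatever you like, or rewrite the whole thing in Python if you are more familiar with that.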
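The same state-flag idea can also pull out a single section, which is what the question asks about when it talks about selecting the lines between the two headers and the lines after the second one. Here is a minimal sketch; the function name section_lines is hypothetical, and it assumes every header line starts with '### ' and that the header text contains no backslashes (awk interprets escape sequences in -v values).

```bash
# Hypothetical helper: print the non-blank lines that follow the header given
# as $1, stopping at the next line that starts with "### " (or at end of input).
# Reads the list from stdin.
section_lines() {
    awk -v header="$1" '
        $0 == header { inside = 1; next }   # requested header: start printing after it
        /^### /      { inside = 0 }         # any other header ends the section
        inside && NF { print }              # print non-blank lines inside the section
    '
}
```

For example, section_lines '### To Read:' <file would print only the unread books, and section_lines '### Read:' <file only the finished ones; either result can then be split further with whichever tool you prefer.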

Separating the sections of a text file using a bash script
