I have this file, languages.txt, which I produced (with sed) from another formatted file:
language "Afar"
territory "Djibouti"
language "Afar"
territory "Eritrea"
language "Afar"
territory "Eritrea"
language "Afar"
territory "Ethiopia"
...
I want to get "language (territory)" on a single line:
Afar (Djibouti)
Afar (Eritrea)
Afar (Ethiopia)
...
I'm using this command, but it doesn't give the desired result:
sed -nE 's/^language|territory\s+\"(.+)\"$/\1 \2/p'
Answer 0 (score: 2)
Assuming the whole file consists only of "language" and "territory" lines:
sed 's/language \+"\(.\+\)"/\1/; N; s/\nterritory \+"\(.\+\)"/ (\1)/' languages.txt
Expanded:
sed '
# remove the language and quotes, leaving just the language
s/language \+"\(.\+\)"/\1/
# append a newline and read the next line
N
# remove the newline, territory and quotes
s/\nterritory \+"\(.\+\)"/ (\1)/
# implicitly print
' languages.txt
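The question's attempt used sed -nE, so for comparison here is a minimal sketch of the same N-based join written with extended regular expressions. It assumes GNU sed and the same strict language/territory alternation as the answer above; the function name join_lang_territory is hypothetical and only wraps the command so the demo can feed it sample input.

```shell
#!/bin/sh
# Hypothetical helper: joins each "language" line with the following
# "territory" line, using sed -E (ERE) instead of BRE.
# Assumes strictly alternating language/territory lines and GNU sed.
join_lang_territory() {
  sed -E '
    # keep only the quoted language name
    s/^language[[:space:]]+"(.+)"$/\1/
    # append the next (territory) line to the pattern space
    N
    # replace the newline and the territory line with " (territory)"
    s/\nterritory[[:space:]]+"(.+)"$/ (\1)/
  ' "$@"
}

# Demo on a small sample; pass languages.txt as an argument for the real file.
printf 'language "Afar"\nterritory "Djibouti"\nlanguage "Afar"\nterritory "Eritrea"\n' | join_lang_territory
```

With ERE the capture groups are written as plain ( ) and + needs no backslash, which is closer to what the question tried; the logic is otherwise identical to the BRE version.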
Answer 1 (score: 1)
A dumber and more fragile version of glen jackman's sed answer, using only a single s command:
sed 'N;s/^.*"\(.*\)".*"\(.*\)"/\1 (\2)/' languages.txt
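This assumes the file order is strict, so odd lines are languages and even lines are territories. It also assumes the quoting format is uniformly consistent.
Output: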
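Since both sed answers rely on odd lines being languages and even lines being territories, it may be worth checking that assumption before trusting the output. This is a minimal sketch, not part of the original answers; the function name check_alternation is hypothetical. It reports any line that breaks the alternation on stderr and exits non-zero if it finds one.

```shell
#!/bin/sh
# Hypothetical pre-check for the odd/even assumption: reports any odd line
# that does not start with "language " or any even line that does not start
# with "territory ", and exits non-zero if such a line is found.
# FNR is used so the parity restarts for each input file.
check_alternation() {
  awk '
    (FNR % 2 == 1 && $0 !~ /^language /) ||
    (FNR % 2 == 0 && $0 !~ /^territory /) {
      print "line " FNR ": unexpected: " $0 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$@"
}

# Demo: a well-formed sample passes, so the message is printed.
printf 'language "Afar"\nterritory "Djibouti"\n' | check_alternation && echo "alternation OK"
```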
Afar (Djibouti)
Afar (Eritrea)
Afar (Eritrea)
Afar (Ethiopia)
Answer 2 (score: 0)
awk might be easier:
$ awk -F'"' '!(NR%2){print v, "(" $2 ")"} {v=$2}' file
Afar (Djibouti)
Afar (Eritrea)
Afar (Eritrea)
Afar (Ethiopia)
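Set the field separator to the double quote and remember the value; on even-numbered lines, print the remembered language together with the current territory in the desired format.
If you want to filter out duplicates: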
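If the file might not alternate strictly (for example, a blank line slipped in, or one language is followed by several territory lines), a sketch of a variant that keys on the first word of each line instead of the line number is below. This is an assumption-laden alternative, not the answer's method: it assumes the value is the text between the first pair of double quotes, and the function name pair_by_keyword is hypothetical.

```shell
#!/bin/sh
# Hypothetical variant of the awk answer: pairs lines by keyword rather than
# by line parity. With FS set to a double quote, $1 is the keyword part
# (e.g. "language ") and $2 is the quoted value.
pair_by_keyword() {
  awk -F'"' '
    # remember the most recent language
    $1 ~ /^language[ \t]/  { lang = $2 }
    # print each territory with the most recent language
    $1 ~ /^territory[ \t]/ { print lang " (" $2 ")" }
  ' "$@"
}

# Demo: the blank line is ignored and both territories get the same language.
printf 'language "Afar"\n\nterritory "Djibouti"\nterritory "Eritrea"\n' | pair_by_keyword
```

The bracket expression [ \t] is used instead of [[:space:]] because some older mawk builds do not support POSIX character classes.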
$ awk -F'"' '!(NR%2) && !a[v,$2]++{print v, "(" $2 ")"} {v=$2}' file
Afar (Djibouti)
Afar (Eritrea)
Afar (Ethiopia)