Extract information from a column and print it in a separate, aligned column

Time: 2013-07-26 08:10:16

Tags: bash shell awk

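I am trying to extract the year from each line and print it in a new, separate column, keeping that new column aligned.

Here is the input file: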

0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980) 
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest (1975)
0000000124  733447   8.7  Inception (2010)
0000000233  411397   8.7  Goodfellas (1990)
0000000123  519051   8.7  Star Wars (1977)
0000000124  146841   8.7  Shichinin no samurai (1954)
0000000123  618195   8.7  Forrest Gump (1994)
0000000123  680520   8.7  The Matrix (1999)
0000000123  604519   8.7  The Lord of the Rings: The Two Towers (2002)
0000000233  309137   8.7  Cidade de Deus (2002)
0000000232  548307   8.6  Se7en (1995)
0000000232  459707   8.6  The Silence of the Lambs (1991)

How can I get the years into a separate column like this?

0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back                  1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring               2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                                 1975
0000000124  733447   8.7  Inception                                                       2010
0000000233  411397   8.7  Goodfellas                                                      1990
0000000123  519051   8.7  Star Wars                                                       1977
0000000124  146841   8.7  Shichinin no samurai                                            1954
0000000123  618195   8.7  Forrest Gump                                                    1994
0000000123  680520   8.7  The Matrix                                                      1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                           2002
0000000233  309137   8.7  Cidade de Deus                                                  2002
0000000232  548307   8.6  Se7en                                                           1995
0000000232  459707   8.6  The Silence of the Lambs                                        1991

4 answers:

Answer 0 (score: 5)

sed 's/)\s*$//' file|column -s '(' -t

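This handles the given input and gives you the expected output. Note that \s is a GNU sed extension, and that column splits on every ( in a line, so a title containing parentheses would produce extra columns.

Tested here: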

kent$  echo "0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980) 
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest (1975)
0000000124  733447   8.7  Inception (2010)
0000000233  411397   8.7  Goodfellas (1990)
0000000123  519051   8.7  Star Wars (1977)
0000000124  146841   8.7  Shichinin no samurai (1954)
0000000123  618195   8.7  Forrest Gump (1994)
0000000123  680520   8.7  The Matrix (1999)
0000000123  604519   8.7  The Lord of the Rings: The Two Towers (2002)
0000000233  309137   8.7  Cidade de Deus (2002)
0000000232  548307   8.6  Se7en (1995)
0000000232  459707   8.6  The Silence of the Lambs (1991)"|sed 's/)\s*$//'|column -s '(' -t
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back      1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring   2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                     1975
0000000124  733447   8.7  Inception                                           2010
0000000233  411397   8.7  Goodfellas                                          1990
0000000123  519051   8.7  Star Wars                                           1977
0000000124  146841   8.7  Shichinin no samurai                                1954
0000000123  618195   8.7  Forrest Gump                                        1994
0000000123  680520   8.7  The Matrix                                          1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers               2002
0000000233  309137   8.7  Cidade de Deus                                      2002
0000000232  548307   8.6  Se7en                                               1995
0000000232  459707   8.6  The Silence of the Lambs                            1991
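
For readers who want to see what the pipeline does step by step, here is a rough Python sketch of the same idea. It is not a reimplementation of column: it strips the closing parenthesis, splits at the opening one, and pads the left part to the width of the longest row. The function name align_years and the two-space gap are assumptions made for illustration, and unlike the pipeline it splits at the last "(" only.

```python
def align_years(lines, gap=2):
    """Split each line at its last "(" and align the years in one column.

    Hypothetical helper for illustration; `gap` is the number of spaces
    placed after the longest left-hand part.
    """
    rows = []
    for line in lines:
        text = line.rstrip()
        if text.endswith(")"):
            text = text[:-1]                 # same job as sed 's/)\s*$//'
        left, sep, year = text.rpartition("(")
        if not sep:                          # no "(" found: keep the line as is
            left, year = text, ""
        rows.append((left.rstrip(), year))
    # Two passes are needed: the column width is only known after reading every row.
    width = max((len(left) for left, _ in rows), default=0)
    return [(left.ljust(width + gap) + year).rstrip() for left, year in rows]


if __name__ == "__main__":
    sample = [
        "0000000124  733447   8.7  Inception (2010)",
        "0000000123  519051   8.7  Star Wars (1977) ",
    ]
    for row in align_years(sample):
        print(row)
```

Because the width is taken from the data, the year column moves with the longest title instead of sitting at a fixed position.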

Answer 1 (score: 4)

Here is a quick hack:

$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF}1' file | column -s'{' -t 
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back      1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring   2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest                     1975
0000000124 733447 8.7 Inception                                           2010
0000000233 411397 8.7 Goodfellas                                          1990
0000000123 519051 8.7 Star Wars                                           1977
0000000124 146841 8.7 Shichinin no samurai                                1954
0000000123 618195 8.7 Forrest Gump                                        1994
0000000123 680520 8.7 The Matrix                                          1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers               2002
0000000233 309137 8.7 Cidade de Deus                                      2002
0000000232 548307 8.6 Se7en                                               1995
0000000232 459707 8.6 The Silence of the Lambs                            1991

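awk is used to strip the parentheses from the last field and insert a { character in front of it. The output is piped to column, which builds the table using { as the delimiter. I chose { because I assume it cannot appear anywhere else in the data; if that is not the case, pick a different character.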
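
One side effect worth knowing: assigning to $NF makes awk rebuild the record using the output field separator, a single space by default, which is why the runs of spaces between the first columns are collapsed in the output above. A rough Python analogue of that step might look like this (the function name mark_year is made up for illustration):

```python
def mark_year(line, marker="{"):
    """Mimic the awk step: strip parentheses from the last field and prefix it with `marker`.

    Hypothetical helper; like awk after a field assignment, it rejoins the
    fields with single spaces, so the original spacing is lost.
    """
    fields = line.split()                    # awk's default whitespace splitting
    if not fields:
        return ""
    last = fields[-1].replace("(", "").replace(")", "")
    fields[-1] = marker + last
    return " ".join(fields)                  # awk's default OFS is one space


print(mark_year("0000000124  733447   8.7  Inception (2010)"))
# prints: 0000000124 733447 8.7 Inception {2010
```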

If I were you, I would also quote the movie titles:

$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF;$4=q$4;$(NF-1)=$(NF-1)q}1' q='"' file | ..
0000000124 462910 8.8 "Star Wars: Episode V - The Empire Strikes Back"      1980
0000000124 698356 8.8 "The Lord of the Rings: The Fellowship of the Ring"   2001
0000000233 393855 8.8 "One Flew Over the Cuckoo's Nest"                     1975
0000000124 733447 8.7 "Inception"                                           2010
0000000233 411397 8.7 "Goodfellas"                                          1990
0000000123 519051 8.7 "Star Wars"                                           1977
0000000124 146841 8.7 "Shichinin no samurai"                                1954
0000000123 618195 8.7 "Forrest Gump"                                        1994
0000000123 680520 8.7 "The Matrix"                                          1999
0000000123 604519 8.7 "The Lord of the Rings: The Two Towers"               2002
0000000233 309137 8.7 "Cidade de Deus"                                      2002
0000000232 548307 8.6 "Se7en"                                               1995
0000000232 459707 8.6 "The Silence of the Lambs"                            1991

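A better approach would be to use a language like Python.

You can use the string method rfind() to compute the padding. If you have Python available, you could use the following script:
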
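import os
import sys

# Optional second argument: column offset used to compute the padding.
# Falls back to 78 when it is missing or not an integer.
try:
    n = int(sys.argv[2])
except (IndexError, ValueError):
    n = 78

try:
    if os.path.isfile(sys.argv[1]):
        with open(sys.argv[1], 'r') as f:
            for line in f:
                line = line.strip()
                # Pad relative to the last "(", so parentheses inside a title are harmless.
                pad = n - line.rfind("(")
                # line[:-7] drops " (YYYY)"; line[-5:-1] is the four-digit year.
                # The %-formatted call runs under both Python 2 and Python 3.
                print("%s %s %s" % (line[:-7], ' ' * pad, line[-5:-1]))
    else:
        print("Please provide a file.")
except IndexError:
    # No file name was given on the command line.
    print("Please provide a file.")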

Save it to a file such as table.py and run it like this:

$ python table.py file
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back        1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring     2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                       1975
0000000124  733447   8.7  Inception                                             2010
0000000233  411397   8.7  Goodfellas                                            1990
0000000123  519051   8.7  Star Wars                                             1977
0000000124  146841   8.7  Shichinin no samurai                                  1954
0000000123  618195   8.7  Forrest Gump                                          1994
0000000123  680520   8.7  The Matrix                                            1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                 2002
0000000233  309137   8.7  Cidade de Deus                                        2002
0000000232  548307   8.6  Se7en                                                 1995
0000000232  459707   8.6  The Silence of the Lambs                              1991
0000000123  123456   9.9  The best file (of all time)                           2025

Note the movie I added to the input, whose title itself contains parentheses:

0000000123  123456   9.9  The best file (of all time) (2025)

If the year column needs to sit further to the right, pass a larger value as the second argument, like this:

$ python table.py file 100 
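
The script above relies on fixed slice offsets, so it assumes every line ends in exactly " (YYYY)". As a hedged alternative sketch, a regular expression anchored at the end of the line can check that assumption and leave non-matching lines alone. The names YEAR_AT_END and split_year are illustrative and not part of the original answer.

```python
import re

# A "(YYYY)" group at the very end of the line (trailing blanks allowed).
YEAR_AT_END = re.compile(r"^(?P<head>.*?)\s*\((?P<year>\d{4})\)\s*$")


def split_year(line, column=78):
    """Return `line` with its trailing (YYYY) moved to start at `column` (0-based).

    Hypothetical helper; lines without a trailing four-digit year in
    parentheses are returned stripped but otherwise unchanged.
    """
    match = YEAR_AT_END.match(line)
    if match is None:
        return line.strip()
    head = match.group("head")
    # ljust never truncates, so a very long title simply pushes the year right.
    return head.ljust(column - 1) + " " + match.group("year")


print(split_year("0000000123  123456   9.9  The best file (of all time) (2025)"))
```

Only the final parenthesised group is treated as the year, so a title such as "The best file (of all time)" passes through intact.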

Answer 2 (score: 4)

Here is an awk solution that works for your sample data:

$ awk -F\( '{printf("%-77s %d\n", $1, $2)}' movies.txt

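Adjust the format to your liking (here the year starts at column 78, counting columns from 0). You can change the width in the format specifier; for example, use %-99s if you want the year to start at column 100.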
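
For comparison, the same left-justified padding can be sketched in Python with the format mini-language. This is only an illustration of how the width controls where the year lands; the function name format_row and its width parameter are assumptions, and like the awk command it splits at the first "(" on the line.

```python
def format_row(line, width=77):
    """Left-justify the text before the first "(" in `width` characters, then append the year.

    Hypothetical helper mirroring printf("%-77s %d\n", $1, $2) with -F'('.
    """
    left, _, rest = line.partition("(")
    year = rest.split(")")[0].strip()        # awk's %d would likewise ignore the trailing ")"
    return "{:<{w}} {}".format(left, year, w=width)


row = format_row("0000000123  680520   8.7  The Matrix (1999)")
print(row.index("1999"))   # prints: 78
```

The year begins one character after the padded field, which is why a width of 77 puts it at offset 78.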

Answer 3 (score: 0)

Here is a Python 2.x solution:

$ python --version
Python 2.7.3
$ echo "0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980)" | python -c "import sys;s=sys.stdin.readlines()[0]; print '%s\t%s' % (s[:-7], s[-6:-2])"
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back    1980

If your lines are in tmpfile, then:

$ cat tmpfile | python -c "import sys;map(lambda i: sys.stdout.write('%s %s %s\n' % (i[:-8], ' '*(100-len(i)), i[-6:-2])), sys.stdin.readlines())"
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back                      1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring                   2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                                     1975
0000000124  733447   8.7  Inception                                                           2010
0000000233  411397   8.7  Goodfellas                                                          1990
0000000123  519051   8.7  Star Wars                                                           1977
0000000124  146841   8.7  Shichinin no samurai                                                1954
0000000123  618195   8.7  Forrest Gump                                                        1994
0000000123  680520   8.7  The Matrix                                                          1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                               2002
0000000233  309137   8.7  Cidade de Deus                                                      2002
0000000232  548307   8.6  Se7en                                                               1995
0000000232  459707   8.6  The Silence of the Lambs                                            1991