I am trying to extract the year and print it in a separate new column, keeping that column aligned.
Here is the input file:
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980)
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest (1975)
0000000124 733447 8.7 Inception (2010)
0000000233 411397 8.7 Goodfellas (1990)
0000000123 519051 8.7 Star Wars (1977)
0000000124 146841 8.7 Shichinin no samurai (1954)
0000000123 618195 8.7 Forrest Gump (1994)
0000000123 680520 8.7 The Matrix (1999)
0000000123 604519 8.7 The Lord of the Rings: The Two Towers (2002)
0000000233 309137 8.7 Cidade de Deus (2002)
0000000232 548307 8.6 Se7en (1995)
0000000232 459707 8.6 The Silence of the Lambs (1991)
How can I get the years into a separate column like this?
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back     1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring  2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest                    1975
0000000124 733447 8.7 Inception                                          2010
0000000233 411397 8.7 Goodfellas                                         1990
0000000123 519051 8.7 Star Wars                                          1977
0000000124 146841 8.7 Shichinin no samurai                               1954
0000000123 618195 8.7 Forrest Gump                                       1994
0000000123 680520 8.7 The Matrix                                         1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers              2002
0000000233 309137 8.7 Cidade de Deus                                     2002
0000000232 548307 8.6 Se7en                                              1995
0000000232 459707 8.6 The Silence of the Lambs                           1991
Answer 0 (score: 5)
sed 's/)\s*$//' file|column -s '(' -t
handles the given input and produces the expected output.
Tested here:
kent$ echo "0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980)
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest (1975)
0000000124 733447 8.7 Inception (2010)
0000000233 411397 8.7 Goodfellas (1990)
0000000123 519051 8.7 Star Wars (1977)
0000000124 146841 8.7 Shichinin no samurai (1954)
0000000123 618195 8.7 Forrest Gump (1994)
0000000123 680520 8.7 The Matrix (1999)
0000000123 604519 8.7 The Lord of the Rings: The Two Towers (2002)
0000000233 309137 8.7 Cidade de Deus (2002)
0000000232 548307 8.6 Se7en (1995)
0000000232 459707 8.6 The Silence of the Lambs (1991)"|sed 's/)\s*$//'|column -s '(' -t
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring 2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest 1975
0000000124 733447 8.7 Inception 2010
0000000233 411397 8.7 Goodfellas 1990
0000000123 519051 8.7 Star Wars 1977
0000000124 146841 8.7 Shichinin no samurai 1954
0000000123 618195 8.7 Forrest Gump 1994
0000000123 680520 8.7 The Matrix 1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers 2002
0000000233 309137 8.7 Cidade de Deus 2002
0000000232 548307 8.6 Se7en 1995
0000000232 459707 8.6 The Silence of the Lambs 1991
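For anyone who wants the two steps spelled out, here is a rough Python 3 sketch of what the pipeline does: strip the trailing ")", split off the year, and pad the left part to the widest entry plus a gap, roughly as column -t does. The name align_years and the gap parameter are made up for illustration. One deliberate difference: it splits at the last "(" only, whereas column -s '(' splits at every "(", so a title that itself contains parentheses would come out differently from the shell version. It assumes a non-empty input in which every line ends in "(YYYY)".

```python
import re


def align_years(lines, gap=2):
    """Split each line at its last "(" and pad the left part to a common width."""
    rows = []
    for line in lines:
        line = re.sub(r"\)\s*$", "", line)       # same job as sed 's/)\s*$//'
        left, _, year = line.rpartition("(")      # split at the LAST "(" only
        rows.append((left.rstrip(), year))
    width = max(len(left) for left, _ in rows)    # widest left-hand part
    return [left.ljust(width + gap) + year for left, year in rows]


sample = [
    "0000000124 733447 8.7 Inception (2010)",
    "0000000123 519051 8.7 Star Wars (1977)",
    "0000000232 548307 8.6 Se7en (1995)",
]
for row in align_years(sample):
    print(row)
```

Computing the width from the data means no column number has to be hard-coded, at the cost of reading all lines before printing the first one.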
Answer 1 (score: 4)
Here is a quick hack:
$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF}1' file | column -s'{' -t
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring 2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest 1975
0000000124 733447 8.7 Inception 2010
0000000233 411397 8.7 Goodfellas 1990
0000000123 519051 8.7 Star Wars 1977
0000000124 146841 8.7 Shichinin no samurai 1954
0000000123 618195 8.7 Forrest Gump 1994
0000000123 680520 8.7 The Matrix 1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers 2002
0000000233 309137 8.7 Cidade de Deus 2002
0000000232 548307 8.6 Se7en 1995
0000000232 459707 8.6 The Silence of the Lambs 1991
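awk is used to strip the parentheses from the last field and to insert a { character in front of it. The output is piped to column, which builds the table using { as the delimiter. I chose { because I think it is unlikely to appear anywhere else in the data; if that is not the case, pick a different character.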
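If you would rather check than guess that the delimiter is unused, a small Python 3 sketch like the following picks the first candidate character that does not occur anywhere in the data. pick_delimiter and its default candidate list are hypothetical, not part of any tool.

```python
def pick_delimiter(lines, candidates="{|~^"):
    """Return the first candidate character that occurs in none of the lines."""
    used = set().union(*lines)        # every character seen in the data
    for ch in candidates:
        if ch not in used:
            return ch
    raise ValueError("every candidate delimiter occurs in the data")


print(pick_delimiter(["0000000232 548307 8.6 Se7en (1995)"]))
```

The character it returns is what you would put in the awk assignment and pass to column -s.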
If I were you, I would also quote the movie titles:
$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF;$4=q$4;$(NF-1)=$(NF-1)q}1' q='"' file | ..
0000000124 462910 8.8 "Star Wars: Episode V - The Empire Strikes Back" 1980
0000000124 698356 8.8 "The Lord of the Rings: The Fellowship of the Ring" 2001
0000000233 393855 8.8 "One Flew Over the Cuckoo's Nest" 1975
0000000124 733447 8.7 "Inception" 2010
0000000233 411397 8.7 "Goodfellas" 1990
0000000123 519051 8.7 "Star Wars" 1977
0000000124 146841 8.7 "Shichinin no samurai" 1954
0000000123 618195 8.7 "Forrest Gump" 1994
0000000123 680520 8.7 "The Matrix" 1999
0000000123 604519 8.7 "The Lord of the Rings: The Two Towers" 2002
0000000233 309137 8.7 "Cidade de Deus" 2002
0000000232 548307 8.6 "Se7en" 1995
0000000232 459707 8.6 "The Silence of the Lambs" 1991
A better approach would be to use a language like Python. You can use the string method rfind() to work out the padding. If you have Python available:
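from __future__ import print_function  # lets the script run on Python 2 and 3

import os
import sys

# Optional second argument: the offset used to compute the padding.
try:
    n = int(sys.argv[2])
except IndexError:
    n = 78

try:
    if os.path.isfile(sys.argv[1]):
        with open(sys.argv[1], 'r') as f:
            for line in f:
                line = line.strip()
                # Pad relative to the last "(" so the year lands in a fixed column.
                pad = n - line.rfind("(")
                # line[:-7] drops " (YYYY)"; line[-5:-1] is the year itself.
                print(line[:-7], ' ' * pad, line[-5:-1])
    else:
        print("Please provide a file.")
except IndexError:
    print("Please provide a file.")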
Save it to a file such as table.py and run it like this:
$ python table.py file
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring 2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest 1975
0000000124 733447 8.7 Inception 2010
0000000233 411397 8.7 Goodfellas 1990
0000000123 519051 8.7 Star Wars 1977
0000000124 146841 8.7 Shichinin no samurai 1954
0000000123 618195 8.7 Forrest Gump 1994
0000000123 680520 8.7 The Matrix 1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers 2002
0000000233 309137 8.7 Cidade de Deus 2002
0000000232 548307 8.6 Se7en 1995
0000000232 459707 8.6 The Silence of the Lambs 1991
0000000123 123456 9.9 The best file (of all time) 2025
Note the added movie:
0000000123 123456 9.9 The best file (of all time) (2025)
If the release-year column needs to sit further to the right, pass a larger value as the second argument, like this:
$ python table.py file 100
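To see why the rfind() padding lines the years up, here is the same arithmetic as a standalone Python 3 function. format_line is a name made up for this sketch, and joining with single spaces mimics what print does with several arguments. For a line ending in " (YYYY)", the text before the year has rfind - 1 characters, followed by one space, n - rfind padding spaces and one more space, so the year always starts at offset n + 1 whatever the title length. This only holds while the last "(" sits before offset n; beyond that the padding goes negative, which Python treats as an empty string, and the column drifts.

```python
def format_line(line, n=78):
    """Return the line with its trailing "(YYYY)" moved to a fixed offset."""
    line = line.strip()
    pad = n - line.rfind("(")          # distance from the last "(" to the target
    # " ".join mirrors print(a, b, c), which separates its arguments with spaces.
    return " ".join([line[:-7], " " * pad, line[-5:-1]])


for sample in ("0000000124 733447 8.7 Inception (2010)",
               "0000000123 123456 9.9 The best file (of all time) (2025)"):
    out = format_line(sample)
    print(out)
    print(len(out) - 4)                # offset at which the year starts
```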
Answer 2 (score: 4)
Here is an awk solution that works for your sample data:
$ awk -F\( '{printf("%-77s %d\n", $1, $2)}' movies.txt
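Adjust the format to taste. Here the first field is left-justified and padded to 77 characters, followed by one space, so the year starts at column 79. Change the width in the format specifier to move it; for example, use %-98s if you want the year to start at column 100.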
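The same fixed-width idea in Python 3, as a sketch: partition the line at the first "(" (roughly what -F\( gives you as $1 and the start of $2) and left-justify the first part using a format width. fixed_width is a hypothetical name. Like the awk version, it assumes the first "(" on the line opens the year, so a title that contains parentheses needs different handling; here int() would raise ValueError.

```python
def fixed_width(line, width=77):
    """Left-justify everything before the first "(" and append the year."""
    left, _, rest = line.partition("(")
    year = int(rest[:4])               # assumes a 4-digit year follows the "("
    return "{:<{w}} {:d}".format(left, year, w=width)


print(fixed_width("0000000124 733447 8.7 Inception (2010)"))
```

With the default width of 77 plus the separating space, the year starts at column 79, counting from 1.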
Answer 3 (score: 0)
Here is a Python 2.x solution:
$ python --version
Python 2.7.3
$ echo "0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980)" | python -c "import sys;s=sys.stdin.readlines()[0]; print '%s\t%s' % (s[:-7], s[-6:-2])"
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980
If your strings are in tmpfile, then:
$ cat tmpfile | python -c "import sys;map(lambda i: sys.stdout.write('%s %s %s\n' % (i[:-8], ' '*(100-len(i)), i[-6:-2])), sys.stdin.readlines())"
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring 2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest 1975
0000000124 733447 8.7 Inception 2010
0000000233 411397 8.7 Goodfellas 1990
0000000123 519051 8.7 Star Wars 1977
0000000124 146841 8.7 Shichinin no samurai 1954
0000000123 618195 8.7 Forrest Gump 1994
0000000123 680520 8.7 The Matrix 1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers 2002
0000000233 309137 8.7 Cidade de Deus 2002
0000000232 548307 8.6 Se7en 1995
0000000232 459707 8.6 The Silence of the Lambs 1991