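I want to merge two tables of contents into one. In the first table, every chapter/section heading ends with a page number. In the second table, the headings have no page numbers, but it contains additional lower-level section headings. The desired output is a single table, produced by adding the lower-level headings from the second table to the first, so that it contains all the information from both.
How can I do this in bash or Python? Thanks.
For example, given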
Chapter 1. The Big Picture 7
1.1 Levels and Layers of Abstraction in a Linux System 8
1.2 Hardware: Understanding Main Memory 9
1.3 The Kernel 10
1.4 User Space 12
Chapter 2. Basic Commands and Directory Hierarchy 14
2.1 The Bourne Shell: /bin/sh 15
2.2 Using the Shell 15
2.3 Basic Commands 17
and
1. The Big Picture
1.1 Levels and Layers of Abstraction in a Linux System
1.2 Hardware: Understanding Main Memory
1.3 The Kernel
1.3.1 Process Management
1.3.2 Memory Management
1.3.3 Device Drivers and Management
1.3.4 System Calls and Support
1.4 User Space
2. Basic Commands and Directory Hierarchy
2.1 The Bourne Shell: /bin/sh
2.2 Using the Shell
2.2.1 The Shell Window
2.2.2 cat
2.2.3 Standard Input and Standard Output
2.3 Basic Commands
the desired output is
1. The Big Picture 7
1.1 Levels and Layers of Abstraction in a Linux System 8
1.2 Hardware: Understanding Main Memory 9
1.3 The Kernel 10
1.3.1 Process Management
1.3.2 Memory Management
1.3.3 Device Drivers and Management
1.3.4 System Calls and Support
1.4 User Space 12
2. Basic Commands and Directory Hierarchy 14
2.1 The Bourne Shell: /bin/sh 15
2.2 Using the Shell 15
2.2.1 The Shell Window
2.2.2 cat
2.2.3 Standard Input and Standard Output
2.3 Basic Commands 17
Note that leading whitespace on each line is not significant.
Answer 0 (score: 2)
For comparison, here are an awk solution and a Python solution.
awk
$ awk 'NR==FNR{p[($1=="Chapter")?$2:$1]=$NF;next} {print $0,p[$1]}' file1 file2
1. The Big Picture 7
1.1 Levels and Layers of Abstraction in a Linux System 8
1.2 Hardware: Understanding Main Memory 9
1.3 The Kernel 10
1.3.1 Process Management
1.3.2 Memory Management
1.3.3 Device Drivers and Management
1.3.4 System Calls and Support
1.4 User Space 12
2. Basic Commands and Directory Hierarchy 14
2.1 The Bourne Shell: /bin/sh 15
2.2 Using the Shell 15
2.2.1 The Shell Window
2.2.2 cat
2.2.3 Standard Input and Standard Output
2.3 Basic Commands 17
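awk reads its input one record (line) at a time and splits each record into fields. FNR is the number of lines read so far from the current file, and NR is the total number of lines read so far from all files. With that in mind, let's look at each awk command in turn: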
NR==FNR{p[($1=="Chapter")?$2:$1]=$NF;next}
When NR==FNR, we are processing the first file, file1, the one with page numbers. We save the page numbers in the array p, keyed by section number.
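To make the NR==FNR test concrete, here is a minimal Python simulation of how awk counts lines across two input files. The helper nr_fnr is hypothetical, and the "files" are plain lists of lines so the sketch needs no disk access; for real files, Python's fileinput module offers lineno() (cumulative, like NR) and filelineno() (per file, like FNR).

```python
def nr_fnr(files):
    """Yield (NR, FNR, line) for every line, counting the way awk does.

    `files` is a list of files, each given as a list of lines (a stand-in
    for real files in this sketch).
    """
    nr = 0  # total lines read from all files so far (awk's NR)
    for lines in files:
        # FNR restarts at 1 for each new file
        for fnr, line in enumerate(lines, start=1):
            nr += 1
            yield nr, fnr, line


paged = ["Chapter 1. The Big Picture 7", "1.3 The Kernel 10"]
detailed = ["1. The Big Picture", "1.3 The Kernel", "1.3.1 Process Management"]

# NR == FNR holds only while the first file is being read.
for nr, fnr, line in nr_fnr([paged, detailed]):
    print(nr, fnr, "file1" if nr == fnr else "file2", line)
```

One well-known caveat follows from this counting: if file1 is empty, NR equals FNR for every line of file2 too, so the first block would swallow file2.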
The page number is always the last field, which awk denotes $NF.
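The one small complication is that the section number is the first field, $1, on most lines, but the second field on chapter lines. So if the line starts with Chapter, i.e. $1=="Chapter", we use $2 as the key; otherwise we use $1. All of this is done with a slightly cryptic ternary expression: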
p[($1=="Chapter")?$2:$1]=$NF
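For readers more at home in Python, the same key-and-page extraction can be written with a conditional expression. This is a minimal sketch; section_key is a hypothetical helper name, and it assumes every line has at least one word.

```python
def section_key(line):
    """Return (section number, page number) for one line of the paged TOC.

    Mirrors p[($1=="Chapter")?$2:$1]=$NF: the key is the second word on
    chapter lines and the first word otherwise; the page is the last word.
    """
    words = line.split()  # split() also discards leading whitespace
    key = words[1] if words[0] == "Chapter" else words[0]
    return key, words[-1]


print(section_key("Chapter 1. The Big Picture 7"))
print(section_key("1.3 The Kernel 10"))
```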
The next command tells awk to skip the remaining commands and start over with the next line.
{print $0,p[$1]}
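If we reach this command, we are processing the second file, file2. In that case we simply print the whole line, $0, followed by its page number, p[$1], if there is one.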
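The exact output of print $0,p[$1] is worth spelling out: awk joins the print arguments with OFS (a single space by default), and reading an array element that was never set yields the empty string. A minimal Python emulation, using a hypothetical helper awk_print_line, shows the consequence: lines with no page number still end in a trailing space.

```python
def awk_print_line(line, pages):
    """Emulate awk's `print $0, p[$1]` for one line of file2.

    The two values are joined by a single space (awk's default OFS), and a
    missing key yields "", just as an unset awk array element does.
    """
    words = line.split()
    first = words[0] if words else ""  # $1 is empty on a blank line
    return line + " " + pages.get(first, "")


pages = {"1.3": "10"}
print(repr(awk_print_line("1.3 The Kernel", pages)))
print(repr(awk_print_line("1.3.1 Process Management", pages)))
```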
Python
The logic here is almost identical to the awk version:
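#!/usr/bin/env python3
# Map section number -> page number, read from file1 (the TOC with page numbers).
p = {}
with open('file1') as f:
    for line in f:
        words = line.split()
        if not words:
            continue  # skip blank lines
        # On chapter lines ("Chapter 1. ...") the section number is the second word.
        p[words[1] if words[0] == 'Chapter' else words[0]] = words[-1]

# Print each line of file2, followed by its page number if it has one.
with open('file2') as f:
    for line in f:
        line = line.rstrip()
        if line:
            num = line.split()[0]
            line += " " + p.get(num, '')
        print(line)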
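To test the same two-pass logic without touching the filesystem, it can be wrapped in a function that takes lists of lines. This is a minimal sketch with a hypothetical name, merge_tocs; unlike the script above, it strips leading whitespace (which the question says is insignificant) and adds no trailing space to lines that have no page number.

```python
def merge_tocs(paged_lines, detailed_lines):
    """Merge a paged TOC into a detailed TOC, matching by section number."""
    # Pass 1: section number -> page number.
    pages = {}
    for line in paged_lines:
        words = line.split()
        if words:
            key = words[1] if words[0] == "Chapter" else words[0]
            pages[key] = words[-1]

    # Pass 2: copy the detailed TOC, appending page numbers where known.
    merged = []
    for line in detailed_lines:
        line = line.strip()
        page = pages.get(line.split()[0]) if line else None
        merged.append(f"{line} {page}" if page else line)
    return merged


paged = ["Chapter 1. The Big Picture 7", "1.3 The Kernel 10", "1.4 User Space 12"]
detailed = ["1. The Big Picture", "1.3 The Kernel", "1.3.1 Process Management", "1.4 User Space"]
print("\n".join(merge_tocs(paged, detailed)))
```

With real files, each argument could be built with f.read().splitlines() inside a with open(...) block.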
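Answer 1 (score: 0)
Assuming both files list exactly the same headings in the same order, so that no matching by section number is needed (note that this does not hold for the example above, where file2 has extra subsections):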
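# Open both files, plus an output file ('output' is a placeholder name).
with open('file1') as file1, open('file2') as file2, open('output', 'w') as out:
    # Collect the page numbers (last word of each non-blank line) in order.
    page_nums = [line.split()[-1] for line in file1 if line.strip()]
    output = ''
    i = 0  # index of the next page number to use
    for line in file2:
        if line.strip():
            # Non-blank heading: append the next page number.
            output += '{} {}\n'.format(line.rstrip('\n'), page_nums[i])
            i += 1
        else:
            output += line
    # Write the output back down.
    out.write(output)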
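A positional pairing like this misaligns as soon as file2 contains headings that file1 lacks. One way to keep the order-based idea while tolerating the extra subsections is to advance through file1 only when the next file2 line has the same heading text. This is a sketch of that variant (text matching, not the answer's pure positional pairing), with a hypothetical function name, and it assumes every heading of file1 appears in file2 in the same order.

```python
def merge_in_order(paged_lines, detailed_lines):
    """Walk both TOCs in order, giving a page number to matching headings.

    Heading text is compared after dropping a leading "Chapter" and the
    trailing page number from the paged TOC; every other detailed line is
    copied through with its whitespace normalised.
    """
    entries = []  # (heading text, page number) in file1 order
    for line in paged_lines:
        words = line.split()
        if not words:
            continue
        if words[0] == "Chapter":
            words = words[1:]
        entries.append((" ".join(words[:-1]), words[-1]))

    merged, i = [], 0
    for line in detailed_lines:
        text = " ".join(line.split())  # normalise whitespace
        if i < len(entries) and text == entries[i][0]:
            merged.append(f"{text} {entries[i][1]}")
            i += 1
        else:
            merged.append(text)
    return merged


paged = ["Chapter 1. The Big Picture 7", "1.3 The Kernel 10", "1.4 User Space 12"]
detailed = ["1. The Big Picture", "1.3 The Kernel", "  1.3.1 Process Management", "1.4 User Space"]
print("\n".join(merge_in_order(paged, detailed)))
```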