Merge two tables of contents?

Date: 2015-01-10 19:52:22

Tags: python bash

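I want to merge two tables of contents into one. In the first table, each chapter/section heading ends with a page number. In the second table, the headings have no page numbers, but there are additional, lower-level section headings. The desired output is a single table formed by adding the lower-level headings from the second table to the first, so that the output contains all the information from both tables.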

How can I do this in bash or Python? Thanks.

For example, given file1:

Chapter 1. The  Big  Picture 7
1.1  Levels  and  Layers  of  Abstraction  in  a  Linux  System 8
1.2  Hardware:  Understanding  Main  Memory 9
1.3  The  Kernel 10
1.4  User  Space 12
Chapter 2. Basic  Commands  and  Directory  Hierarchy 14
2.1  The  Bourne  Shell:  /bin/sh 15
2.2  Using  the  Shell 15
2.3  Basic  Commands 17

and file2:

1. The Big Picture

    1.1 Levels and Layers of Abstraction in a Linux System
    1.2 Hardware: Understanding Main Memory
    1.3 The Kernel
        1.3.1 Process Management
        1.3.2 Memory Management
        1.3.3 Device Drivers and Management
        1.3.4 System Calls and Support
    1.4 User Space

2. Basic Commands and Directory Hierarchy

    2.1 The Bourne Shell: /bin/sh
    2.2 Using the Shell
        2.2.1 The Shell Window
        2.2.2 cat
        2.2.3 Standard Input and Standard Output
    2.3 Basic Commands

The desired output is:

1. The Big Picture 7

    1.1 Levels and Layers of Abstraction in a Linux System 8
    1.2 Hardware: Understanding Main Memory 9
    1.3 The Kernel 10
        1.3.1 Process Management
        1.3.2 Memory Management
        1.3.3 Device Drivers and Management
        1.3.4 System Calls and Support
    1.4 User Space 12

2. Basic Commands and Directory Hierarchy 14

    2.1 The Bourne Shell: /bin/sh 15
    2.2 Using the Shell 15
        2.2.1 The Shell Window
        2.2.2 cat
        2.2.3 Standard Input and Standard Output
    2.3 Basic Commands 17

Note that the leading whitespace on each line does not matter.

2 Answers:

Answer 0 (score: 2)

For comparison, here are both awk and Python solutions.

Using awk

$ awk 'NR==FNR{p[($1=="Chapter")?$2:$1]=$NF;next} {print $0,p[$1]}' file1 file2
1. The Big Picture 7

    1.1 Levels and Layers of Abstraction in a Linux System 8
    1.2 Hardware: Understanding Main Memory 9
    1.3 The Kernel 10
        1.3.1 Process Management 
        1.3.2 Memory Management 
        1.3.3 Device Drivers and Management 
        1.3.4 System Calls and Support 
    1.4 User Space 12

2. Basic Commands and Directory Hierarchy 14

    2.1 The Bourne Shell: /bin/sh 15
    2.2 Using the Shell 15
        2.2.1 The Shell Window 
        2.2.2 cat 
        2.2.3 Standard Input and Standard Output 
    2.3 Basic Commands 17

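How it works

awk reads its input one record (line) at a time and splits each record into fields. FNR is the number of lines read so far from the current file, and NR is the total number of lines read so far from all files. With that in mind, let's examine each awk command in turn: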

  • NR==FNR{p[($1=="Chapter")?$2:$1]=$NF;next}

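    When NR==FNR, we are processing the first file, file1, the one with page numbers. We save the page numbers in the array p, keyed by section number.

    The page number is always the last field, which awk denotes as $NF.

    The slight complication is that the section number is the first field, $1, on most lines, but the second field on chapter lines. So if the line starts with Chapter, i.e. $1=="Chapter", we use $2 as the key; otherwise we use $1. All of this is done with a slightly cryptic ternary expression: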

    p[($1=="Chapter")?$2:$1]=$NF

    The next command tells awk to skip the remaining commands and start over with the next line.

  • {print $0,p[$1]}

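    If we reach this command, we are processing the second file, file2. In that case we simply print the whole line, $0, followed by the page number, p[$1], if there is one. For lines without a page number, p[$1] is empty, so the line is printed with just a trailing space.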
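To make the two-file NR==FNR idiom concrete, here is a rough Python sketch that tracks NR and FNR by hand while walking over the first input and then the second, the way awk walks over file1 and then file2. The function name `awk_merge` and the choice to pass each file as a list of lines without trailing newlines are assumptions for illustration; awk does this bookkeeping for you.

```python
def awk_merge(file1_lines, file2_lines):
    """Rough emulation of:
    awk 'NR==FNR{p[($1=="Chapter")?$2:$1]=$NF;next} {print $0,p[$1]}' file1 file2
    Inputs are lists of lines without trailing newlines (an assumption)."""
    p = {}
    out = []
    nr = 0  # total records read from all inputs (awk's NR)
    for lines in (file1_lines, file2_lines):
        fnr = 0  # records read from the current input (awk's FNR)
        for record in lines:
            nr += 1
            fnr += 1
            fields = record.split()  # awk's default whitespace field splitting
            first = fields[0] if fields else ''
            if nr == fnr:
                # Still in the first input: store the page number ($NF).
                if fields:
                    key = fields[1] if first == 'Chapter' else first
                    p[key] = fields[-1]
                continue  # like awk's "next"
            # Second input: print $0, a space (OFS), then p[$1] or "".
            out.append(record + ' ' + p.get(first, ''))
    return out
```

The sketch also reproduces a known pitfall of NR==FNR: if file1 is empty, NR equals FNR throughout file2 as well, so every line of file2 is treated as page data and nothing is printed.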

Using Python

The logic here is nearly identical to the awk version:

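#!/usr/bin/env python3
# Map section numbers (e.g. "1." or "1.3") to page numbers from file1.
p = {}
with open('file1') as f:
    for line in f:
        words = line.split()
        if not words:  # skip blank lines
            continue
        # Chapter lines start with "Chapter", so the key is the second word.
        key = words[1] if words[0] == 'Chapter' else words[0]
        p[key] = words[-1]  # the page number is the last word

# Print file2, appending a page number wherever one is known.
with open('file2') as f:
    for line in f:
        line = line.rstrip()
        if line:
            num = line.split()[0]
            if num in p:
                line += ' ' + p[num]
        print(line)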
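The `split()`-based key extraction relies on the section number being the first word of a line (or the second, after `Chapter`). A slightly stricter variant, assuming section numbers always look like `1.`, `1.3` or `1.3.1`, pulls the number out with a regular expression and only looks up lines that actually start with one. The function name `merge_tocs` and the `SECTION` pattern are hypothetical, not part of the answer above.

```python
import re

# Optional "Chapter" prefix, then a dotted section number such as "1.", "1.3" or "1.3.1".
SECTION = re.compile(r'^\s*(?:Chapter\s+)?(\d+(?:\.\d+)*\.?)(?=\s|$)')

def merge_tocs(paged_lines, detailed_lines):
    """Append page numbers from paged_lines to matching entries in detailed_lines."""
    pages = {}
    for line in paged_lines:
        m = SECTION.match(line)
        if m:
            pages[m.group(1)] = line.split()[-1]  # page number is the last word
    merged = []
    for line in detailed_lines:
        line = line.rstrip()
        m = SECTION.match(line)
        if m and m.group(1) in pages:
            line += ' ' + pages[m.group(1)]
        merged.append(line)
    return merged
```

Because the `Chapter` prefix is optional in the pattern, the same function works whether or not either file spells it out.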

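Answer 1 (score: 0)

Assuming every non-blank line of the second file corresponds, in the same order, to a line of the first (so no matching by section number is needed):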

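# Assumes file2's non-blank lines match file1's lines one-to-one, in order.
with open('file1') as file1:
    # Collect the page numbers (last word of each non-blank line) in order.
    page_nums = [line.split()[-1] for line in file1 if line.strip()]

pages = iter(page_nums)
output = ''
with open('file2') as file2:
    for line in file2:
        if line.strip():
            # Append the next page number to this heading.
            output += '{} {}\n'.format(line.rstrip(), next(pages))
        else:
            output += line

# Write the merged result back out.
with open('output', 'w') as out:
    out.write(output)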
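Because this approach pairs lines purely by position, it is worth checking the assumption before trusting the output. The sketch below, assuming the section number is the first word of each line (or the second, after `Chapter`), compares the sequences of section numbers in the two files; the helper names `section_keys` and `same_order` are hypothetical.

```python
def section_keys(lines):
    """Return the section number of each non-blank line, in order."""
    keys = []
    for line in lines:
        words = line.split()
        if words:
            is_chapter = words[0] == 'Chapter' and len(words) > 1
            keys.append(words[1] if is_chapter else words[0])
    return keys

def same_order(paged_lines, detailed_lines):
    """True only if both files list exactly the same sections in the same order."""
    return section_keys(paged_lines) == section_keys(detailed_lines)
```

On the example in the question, file2 contains subsections such as 1.3.1 that file1 lacks, so `same_order` returns False and matching by section number, as in the first answer, is needed.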