How to merge two files using a BASH or Python script?

Time: 2016-04-03 23:37:05

Tags: python linux bash python-3.x

Let's assume I have two files like these:

File 1:

F: user1 password1 
F: user2 password2 
F: user3 password3

File 2:

server1 24000
server2 24000
server3 24000
server4 24000

I want to combine them into a single file with this output:

OutputFile:

C: server1 24000  user1 password1
C: server2 24000  user1 password1
C: server3 24000  user1 password1
C: server4 24000  user1 password1
C: server1 24000  user2 password2
C: server2 24000  user2 password2
C: server3 24000  user2 password2
C: server4 24000  user2 password2
C: server1 24000  user3 password3
C: server2 24000  user3 password3
C: server3 24000  user3 password3
C: server4 24000  user3 password3

So, on Windows, I wrote this batch file that does what I want, but I don't know how to do the same in a BASH (Bourne-Again shell) or Python script.

Batch file:

@echo off
set file1=file1.txt
set file2=file2.txt
Set Output=Output_CCCam.cfg
If Exist %Output% Del %Output%
for /f "tokens=2 delims=:" %%a in ('Type "%file1%"') do (
    for /f "delims=" %%b in ('Type "%file2%"') do (
        >>%Output% echo C: %%b %%a
    )
)
Start "" Notepad %Output% 

4 answers:

Answer 0 (score: 3)

Bash solution

#!/bin/bash

while IFS= read -r line1; do
    while IFS= read -r line2; do
        printf "C: %s  %s\n" "$line2" "${line1/#F: }"
    done < file2
done < file1

This loops over file1 and, for each line of file1, loops over every line of file2. The printf line combines the two lines for output, and the parameter expansion ${line1/#F: } removes the leading "F: " from each file1 line.

Result:

C: server1 24000  user1 password1
C: server2 24000  user1 password1
C: server3 24000  user1 password1
C: server4 24000  user1 password1
C: server1 24000  user2 password2
C: server2 24000  user2 password2
C: server3 24000  user2 password2
C: server4 24000  user2 password2
C: server1 24000  user3 password3
C: server2 24000  user3 password3
C: server3 24000  user3 password3
C: server4 24000  user3 password3
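For comparison, here is a minimal Python sketch of the same nested loop, assuming the file contents are already available as lists of lines. The `combine` function name and the inline sample data are illustrative, not part of the answer; the `startswith` check plays the role of `${line1/#F: }`, stripping "F: " only when it appears at the start of the line.

```python
def combine(user_lines, server_lines):
    """Pair every user line with every server line, users in the outer loop."""
    out = []
    for line1 in user_lines:                 # outer loop: file1
        user = line1.rstrip("\n")
        if user.startswith("F: "):           # like ${line1/#F: }
            user = user[len("F: "):]
        for line2 in server_lines:           # inner loop: file2
            # Same layout as printf "C: %s  %s\n"
            out.append("C: %s  %s" % (line2.rstrip("\n"), user))
    return out


users = ["F: user1 password1\n", "F: user2 password2\n"]
servers = ["server1 24000\n", "server2 24000\n"]
for row in combine(users, servers):
    print(row)
```

Because the user file drives the outer loop, all servers are listed for user1 before moving on to user2, exactly as in the Bash output.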

Solution using join and sed

This works as well:

join -j 50 -o 2.1,1.1 -t '~' file1 file2 | sed 's/~F:/ /;s/^/C: /'

This is a slight abuse of join: -j 50 says to join on matching field number 50, which does not exist and is therefore considered equal on every line, so the result is the Cartesian product of the two files:

$ join -j 50 file1 file2
F: user1 password1 server1 24000
F: user1 password1 server2 24000
F: user1 password1 server3 24000
F: user1 password1 server4 24000
F: user2 password2 server1 24000
F: user2 password2 server2 24000
F: user2 password2 server3 24000
F: user2 password2 server4 24000
F: user3 password3 server1 24000
F: user3 password3 server2 24000
F: user3 password3 server3 24000
F: user3 password3 server4 24000

To get the parts of each line in the right order, we specify the output format with -o 2.1,1.1 (first field of file2, then first field of file1). Because the default field separator is blanks, we also pick a character that does not occur in the input, ~, as the new separator with -t '~', so that each whole line counts as a single field:

$ join -j 50 -o 2.1,1.1 -t '~' file1 file2
server1 24000~F: user1 password1
server2 24000~F: user1 password1
server3 24000~F: user1 password1
server4 24000~F: user1 password1
server1 24000~F: user2 password2
server2 24000~F: user2 password2
server3 24000~F: user2 password2
server4 24000~F: user2 password2
server1 24000~F: user3 password3
server2 24000~F: user3 password3
server3 24000~F: user3 password3
server4 24000~F: user3 password3

Finally, we use sed to replace ~F: with a space on each line and to add C: at the start of each line:

$ join -j 50 -o 2.1,1.1 -t '~' file1 file2 | sed 's/~F:/ /;s/^/C: /'
C: server1 24000  user1 password1
C: server2 24000  user1 password1
C: server3 24000  user1 password1
C: server4 24000  user1 password1
C: server1 24000  user2 password2
C: server2 24000  user2 password2
C: server3 24000  user2 password2
C: server4 24000  user2 password2
C: server1 24000  user3 password3
C: server2 24000  user3 password3
C: server3 24000  user3 password3
C: server4 24000  user3 password3

If the order of the lines doesn't matter, this can be shortened slightly to

$ join -j 50 file2 file1 | sed 's/F://;s/^/C:/'
C: server1 24000  user1 password1
C: server1 24000  user2 password2
C: server1 24000  user3 password3
C: server2 24000  user1 password1
C: server2 24000  user2 password2
C: server2 24000  user3 password3
C: server3 24000  user1 password1
C: server3 24000  user2 password2
C: server3 24000  user3 password3
C: server4 24000  user1 password1
C: server4 24000  user2 password2
C: server4 24000  user3 password3
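The Cartesian product that `join -j 50` produces here can be sketched in Python with `itertools.product`. As with swapping `file1` and `file2` on the `join` command line, swapping the arguments only changes how the lines are grouped, not which lines appear. The two short sample lists are illustrative stand-ins for the files.

```python
from itertools import product

users = ["user1 password1", "user2 password2"]
servers = ["server1 24000", "server2 24000"]

# Like `join -j 50 file1 file2`: the first argument (users) varies slowest,
# so output is grouped by user.
by_user = ["C: %s  %s" % (s, u) for u, s in product(users, servers)]

# Like `join -j 50 file2 file1`: servers vary slowest, grouped by server.
by_server = ["C: %s  %s" % (s, u) for s, u in product(servers, users)]

print("\n".join(by_user))
print()
print("\n".join(by_server))
```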


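Answer 1 (score: 1)

I originally suggested using the paste tool. However, as Benjamin W. pointed out, this problem requires pairing every line of one file with every line of the other, despite the use of the word "combine".

paste cannot produce these pairings on its own, let alone strip the unwanted tokens (such as the leading F:), which only become apparent from the sample files the asker provided. The following Python 3 script does what was asked.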

#!/usr/bin/env python3


def merge_lines(line_list_a, line_list_b):
    # List comprehension could be shorter if smaller identifiers were used. However, I consider readability more important than small column limits.
    return [' '.join(['C:'] + line_of_b.split() + [''] + line_of_a.split()[1:]) for line_of_a in line_list_a for line_of_b in line_list_b]


def main():
    with open('file1.txt') as file_1:
        with open('file2.txt') as file_2:
            with open('output.txt', 'w') as output_file:
                output_file.write('\n'.join((merge_lines(file_1.readlines(), file_2.readlines()))))
                output_file.write('\n')  # Python converts '\n' to the system's default line separator.

if __name__ == '__main__':
    main()
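The difference this answer draws between paste and what the question needs can be seen in a small Python sketch: `zip` pairs lines by position, roughly what `paste` does, while `itertools.product` pairs every line with every other line. The inline lists stand in for the two files and are illustrative only.

```python
from itertools import product

users = ["F: user1 password1", "F: user2 password2", "F: user3 password3"]
servers = ["server1 24000", "server2 24000", "server3 24000", "server4 24000"]

# Positional pairing, roughly what `paste file2 file1` does: line i of one
# file next to line i of the other. (Unlike paste, zip stops at the shorter
# list instead of padding it with empty columns.)
side_by_side = list(zip(servers, users))

# Every server paired with every user: what the question actually asks for.
all_pairs = [(s, u) for u, s in product(users, servers)]

print(len(side_by_side), len(all_pairs))  # 3 12
```

With 3 users and 4 servers, positional pairing gives at most 4 lines, while the desired output needs 3 × 4 = 12.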

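Answer 2 (score: 1)

This solution may not win on fewest characters typed, but I think it is very easy to understand. I am assuming your files are small enough to fit into memory at the same time.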

#! /bin/bash

## File names of the files we want to join.

file_1st='file_1st.txt'
file_2nd='file_2nd.txt'

## Declare array variables to hold the lines of the data contained in the files.

declare -a file_data_1st
declare -a file_data_2nd

## Read both files into memory.  The `-t` option trims trailing newline
## characters.  The arrays will now contain the trimmed lines of each file.

mapfile -t file_data_1st < "${file_1st}"
mapfile -t file_data_2nd < "${file_2nd}"

## Now iterate over the lines of the first file and inside that loop over the
## lines of the second file.  Split both lines into white-space separated words
## and then re-assemble the output line as desired.  This is a little more
## general than actually needed here (you don't really have to split the lines
## from the second file).

for line_1st in "${file_data_1st[@]}"
do
    words_1st=(${line_1st})
    for line_2nd in "${file_data_2nd[@]}"
    do
        words_2nd=(${line_2nd})
        echo "C: ${words_2nd[0]} ${words_2nd[1]} ${words_1st[1]} ${words_1st[2]}"
    done
done
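For readers who prefer Python, here is a rough analogue of this answer's approach, assuming the same field layout: both files are read fully into memory (`str.splitlines()` trims newlines much as `mapfile -t` does), each line is split on whitespace, and the output line is rebuilt by index. The function name `merge_by_fields` and the temporary sample files are illustrative. Unlike the Bash version, this sketch skips blank lines, and it would raise `IndexError` on a non-blank line with too few fields.

```python
import os
import tempfile


def merge_by_fields(path_1st, path_2nd):
    """Read both files fully into memory, then rebuild each output line
    from whitespace-separated fields by index."""
    with open(path_1st) as f:
        data_1st = f.read().splitlines()   # newline-trimmed lines
    with open(path_2nd) as f:
        data_2nd = f.read().splitlines()

    out = []
    for line_1st in data_1st:
        words_1st = line_1st.split()       # e.g. ['F:', 'user1', 'password1']
        if not words_1st:                  # skip blank lines (sketch's choice)
            continue
        for line_2nd in data_2nd:
            words_2nd = line_2nd.split()   # e.g. ['server1', '24000']
            if not words_2nd:
                continue
            out.append(f"C: {words_2nd[0]} {words_2nd[1]} "
                       f"{words_1st[1]} {words_1st[2]}")
    return out


# Usage with throwaway sample files.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file_1st.txt")
    p2 = os.path.join(tmp, "file_2nd.txt")
    with open(p1, "w") as f:
        f.write("F: user1 password1 \nF: user2 password2\n")
    with open(p2, "w") as f:
        f.write("server1 24000\nserver2 24000\n")
    result = merge_by_fields(p1, p2)

print("\n".join(result))
```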

Answer 3 (score: 1)

The Python version I mentioned in the comments:

filename1 = 'file1.txt'
filename2 = 'file2.txt'

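# In Python 3, map() returns a one-shot iterator, so build real lists here;
# otherwise server_data would be used up by the first user row.
with open(filename1, 'r') as fp:
    # Drop the leading "F:" token from every user line.
    user_data = [line.split()[1:] for line in fp]

with open(filename2, 'r') as fp:
    server_data = [line.split() for line in fp]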

output_filename = 'file3.txt'

with open(output_filename, 'w') as fp:
    for user_row in user_data:
        for server_row in server_data:
            fp.write("C: %s %s\n" % (" ".join(server_row), " ".join(user_row)))
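A caveat for Python 3 (the question is tagged python-3.x): `map()` returns a one-shot iterator rather than a list, so if the rows walked by the inner loop are built with `map()`, they are used up during the first outer pass and every later user row produces no output. A minimal demonstration with inline sample data, alongside the usual remedy of materialising the rows as a list:

```python
servers_text = ["server1 24000\n", "server2 24000\n"]
users = [["user1", "password1"], ["user2", "password2"]]

# One-shot iterator: the inner loop exhausts it during the first outer pass.
server_iter = map(lambda x: x.split(), servers_text)
lazy = [(u, s) for u in users for s in server_iter]

# A list can be iterated any number of times.
server_list = [line.split() for line in servers_text]
eager = [(u, s) for u in users for s in server_list]

print(len(lazy), len(eager))  # 2 4
```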