Question

Sorry for the ridiculous title; that's probably why I couldn't find an answer on Google.

I have 5 text files that I'd like to merge into 1, in this format:

line1 of file1
line1 of file2
line1 of file3
line1 of file4
line1 of file5
line2 of file1
line2 of file2
line2 of file3
line2 of file4
line2 of file5

and so on.

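I tried the bash commands below, but it seems to be too much for sed or something: it just inserts the text at the first line instead of at the line number held in the variable I'm passing.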

for ((num=1; num<=66; num++)) ; do
    queryline=$(sed -n "${num}p" "file2.txt")
    sed -i "${num}i ${queryline}" "file1.txt"
done

(I also tried this:)

for ((num=1; num<=66; num++)) ; do
    numa=$((num + 1))
    queryline=$(sed -n "${num}p" "file2.txt")
    sed -i "${numa}i ${queryline}" "file1.txt"
done

I thought it might be easier in Python (3.4), but I don't know how to do it. Any hints?

Answer 1

Use contextlib.ExitStack() to manage the input files as a group, and use zip to read one line from each file at a time:

import contextlib
import os

filenames = ['a','b','c','d','e']
output_file = 'fred'

# setup files for test
for filename in filenames:
    with open(filename, 'w') as fp:
        for i in range(10):
            fp.write('%s %d\n' % (filename, i))
if os.path.exists(output_file):
    os.remove(output_file)

# open all the files and use zip to interleave the lines    
with open(output_file, 'w') as out_file, contextlib.ExitStack() as in_files:
    files = [in_files.enter_context(open(fname)) for fname in filenames]
    for lines in zip(*files):
        # if you're not sure last line has a \n
        for line in lines:
            out_file.write(line)
            if not line.endswith('\n'):
                out_file.write('\n')
        # if you are sure last line has a \n
        # out_file.write(''.join(lines))

with open(output_file) as fp:
    print(fp.read())
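One caveat: zip() stops as soon as the shortest file runs out, so trailing lines of longer files are silently dropped. If the files may differ in length, a minimal sketch is to swap in itertools.zip_longest (standard library), which yields None for files that are already exhausted. The function name interleave_files and the demo file contents are hypothetical.

```python
import contextlib
import itertools
import os
import tempfile


def interleave_files(filenames, output_file):
    """Write line 1 of every input, then line 2 of every input, and so on.

    itertools.zip_longest() keeps going until the longest input is
    exhausted; it yields None for files that have already run out.
    """
    with open(output_file, 'w') as out_file, contextlib.ExitStack() as stack:
        files = [stack.enter_context(open(name)) for name in filenames]
        for lines in itertools.zip_longest(*files):
            for line in lines:
                if line is None:  # this file has no more lines; skip it
                    continue
                # add a newline if the file's last line lacks one
                out_file.write(line if line.endswith('\n') else line + '\n')


# Demo with hypothetical files of unequal length in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    samples = [('a', 'a1\na2\na3\n'), ('b', 'b1\n'), ('c', 'c1\nc2')]
    names = []
    for name, text in samples:
        path = os.path.join(tmpdir, name)
        with open(path, 'w') as fp:
            fp.write(text)
        names.append(path)

    out_path = os.path.join(tmpdir, 'merged.txt')
    interleave_files(names, out_path)
    with open(out_path) as fp:
        result = fp.read()

print(result, end='')
```

Passing fillvalue='\n' to zip_longest instead would emit a blank line for each missing line, so every group keeps the same size.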

Answer 2

This works if you're sure you have exactly 5 files. If you need to do this for a different number of files, it gets a bit more complicated.

with open("file1.txt") as f:
    file1 = f.readlines()
with open("file2.txt") as f:
    file2 = f.readlines()
with open("file3.txt") as f:
    file3 = f.readlines()
with open("file4.txt") as f:
    file4 = f.readlines()
with open("file5.txt") as f:
    file5 = f.readlines()
with open("outfile.txt", "w") as outfile:
    for aline in [line for foo in zip(file1, file2, file3, file4, file5) for line in foo]:
        outfile.write(aline)
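A minimal sketch of the variable-count case, assuming the filenames arrive as a list: keep each file's lines in a list and unpack that list into zip(). The function name interleave and the demo filenames are hypothetical. As with the five-file version, zip() stops at the shortest file.

```python
import itertools
import os
import tempfile


def interleave(filenames, output_file):
    """Interleave the lines of any number of files into output_file."""
    # Read every file into memory, as the five-file version does.
    contents = []
    for name in filenames:
        with open(name) as f:
            contents.append(f.readlines())
    with open(output_file, 'w') as outfile:
        # zip(*contents) groups line k of every file (stopping at the
        # shortest file); chain.from_iterable() flattens the groups.
        outfile.writelines(itertools.chain.from_iterable(zip(*contents)))


# Demo with three hypothetical two-line files.
with tempfile.TemporaryDirectory() as tmpdir:
    names = []
    for n in range(1, 4):
        path = os.path.join(tmpdir, 'file%d.txt' % n)
        with open(path, 'w') as f:
            f.write('line1 of file%d\nline2 of file%d\n' % (n, n))
        names.append(path)

    out_path = os.path.join(tmpdir, 'outfile.txt')
    interleave(names, out_path)
    with open(out_path) as f:
        result = f.read()

print(result, end='')
```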

Answer 3

Your bash didn't work because you were trying to insert at a line that didn't exist yet.

: > file_to_insert.txt              # start from an empty output file
for ((num=1; num<=66; num++)); do   # every line number
  for i in {1..5}; do               # every file, in order
    # append line $num of file$i; no insertion at a missing line is needed
    sed -n "${num}p" "file${i}.txt" >> file_to_insert.txt
  done
done
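To see what the question's two sed loops actually do, here is a small model of sed's `Ni` command on a Python list, using 3-line files instead of 66-line ones. The helper sed_insert is hypothetical; like sed, it does nothing when line N does not exist. The third loop shows the point made above: even with the position corrected to 2*num, the last target line does not exist yet, so the final line is lost, while appending in output order never needs a missing line.

```python
def sed_insert(lines, n, text):
    """Model `sed -i "${n}i text"` on a list: insert before line n (1-based).

    Like sed, do nothing when line n does not exist.
    """
    if 1 <= n <= len(lines):
        lines.insert(n - 1, text)


file1 = ['a1', 'a2', 'a3']
file2 = ['b1', 'b2', 'b3']

# First loop in the question: insert line num of file2 before line num.
first = list(file1)
for num in range(1, 4):
    sed_insert(first, num, file2[num - 1])

# Second loop: insert before line num + 1.
second = list(file1)
for num in range(1, 4):
    sed_insert(second, num + 1, file2[num - 1])

# Counting the lines already inserted, line num of file2 belongs at 2*num,
# but the last target (line 6 of a 5-line list) does not exist yet.
shifted = list(file1)
for num in range(1, 4):
    sed_insert(shifted, 2 * num, file2[num - 1])

# Appending in output order never needs a line that isn't there.
appended = []
for num in range(1, 4):
    for source in (file1, file2):
        appended.append(source[num - 1])

for name, result in [('first', first), ('second', second),
                     ('shifted', shifted), ('appended', appended)]:
    print(name, result)
```

Both of the question's loops pile file2's lines together (at the top, or right after the first line), which matches the symptom described.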

Answer 4

Here is a GNU awk solution (GNU awk is required for ARGIND, the index of the file currently being read):

awk -v t=5 '{c=c<FNR?FNR:c; for (i=1;i<=t;i++) if (ARGIND==i) a[i FS FNR]=$0} END {for (i=1;i<=c;i++) for (j=1;j<=t;j++) print a[j FS i]}' file1 file2 file3 file4 file5

Set t to the number of files.

Example:

cat f1
file1 one
file1 two
file1 three
file1 four

cat f2
file2 one
file2 two
file2 three
file2 four

cat f3
file3 one
file3 two
file3 three
file3 four

awk -v t=3 '{c=c<FNR?FNR:c; for (i=1;i<=t;i++) if (ARGIND==i) a[i FS FNR]=$0} END {for (i=1;i<=c;i++) for (j=1;j<=t;j++) print a[j FS i]}' f1 f2 f3
file1 one
file2 one
file3 one
file1 two
file2 two
file3 two
file1 three
file2 three
file3 three
file1 four
file2 four
file3 four

How does it work?

awk -v t=3 '                    # Set t to the number of files
    {c=c<FNR?FNR:c              # Keep the largest line count seen so far (the longest file) in c
    for (i=1;i<=t;i++)          # Loop through the file numbers
        if (ARGIND==i)          # Test which file we are reading
            a[i FS FNR]=$0}     # Store the line in array a, keyed by file and line number
END {
    for (i=1;i<=c;i++)          # Loop through line numbers
        for (j=1;j<=t;j++)      # Loop through file numbers
            print a[j FS i]}    # Print the stored line
' f1 f2 f3                      # Read the files
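For comparison, a rough Python translation of the same idea (not part of the original answer): lines go into a dict keyed by (file number, line number), c tracks the longest file, and missing entries come out as empty strings, just as awk prints an empty string for an unset array element. The function name interleave_like_awk and the sample files are hypothetical, and t is taken from len(filenames) rather than -v t=....

```python
import os
import tempfile


def interleave_like_awk(filenames):
    """Return the interleaved lines, mirroring the awk program's logic."""
    a = {}  # (file number, line number) -> text, like a[i FS FNR]
    c = 0   # largest line number seen, like c
    for i, name in enumerate(filenames, start=1):
        with open(name) as f:
            for fnr, line in enumerate(f, start=1):
                a[(i, fnr)] = line.rstrip('\n')
                c = max(c, fnr)
    t = len(filenames)  # plays the role of -v t=...
    # awk prints "" for an unset element, so a shorter file yields empty lines.
    return [a.get((j, i), '') for i in range(1, c + 1) for j in range(1, t + 1)]


with tempfile.TemporaryDirectory() as tmpdir:
    names = []
    for name, text in [('f1', 'file1 one\nfile1 two\n'), ('f2', 'file2 one\n')]:
        path = os.path.join(tmpdir, name)
        with open(path, 'w') as f:
            f.write(text)
        names.append(path)
    result = interleave_like_awk(names)

print('\n'.join(result))
```

Like the awk version, this keeps every file in memory until the end, which is what allows files of different lengths to line up.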

Answer 5

A good way to achieve what you want is to stick with standard utilities. I suggest paste (specified by POSIX):

paste -d '\n' file1 file2 file3 file4 file5

Or, if you like Bashisms:

paste -d '\n' file{1..5}

This generalizes trivially to any number of files.

Copy each line of file1 into every other line of file2 (Python)
