Question

I have a list of text files file1.txt, file2.txt, file3.txt .. filen.txt that I need to shuffle, creating one single big file as the result*.

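Requirements:
1. The records of a given file must be reversed before shuffling.
2. The records of a given file must keep that reversed order in the destination file.
3. I don't know how many files I will need to shuffle, so the code should be as generic as possible (for example, allowing the file names to be declared in a list).
4. The files can have different sizes.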

Example:

File1.txt
---------
File1Record1
File1Record2
File1Record3
File1Record4

File2.txt
---------
File2Record1
File2Record2


File3.txt
---------
File3Record1
File3Record2
File3Record3
File3Record4
File3Record5

The output should be something like this:

ResultFile.txt
--------------
File3Record5   -|
File2Record2    |
File1Record4    |
File3Record4   -|
File2Record1    |
File1Record3    |-->File3 records are shuffled with the other records and 
File3Record3   -|   are correctly "reversed" and they kept the correct 
File1Record2    |   ordering
File3Record2   -|
File1Record1    |
File3Record1   -|

*I'm not crazy; I have to import these files (blog posts) using resultfile.txt as input.

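EDIT
The result can have any kind of ordering you want: completely or partially shuffled, evenly interleaved, it doesn't matter. What matters is that points 1 and 2 are both honoured.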
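Since the EDIT says only points 1 and 2 matter, those two requirements can be checked mechanically when testing any of the answers. A minimal sketch, assuming each file has already been read into a list of records and that every record is unique across all files; `honors_requirements` is a hypothetical helper name:

```python
from collections import Counter


def honors_requirements(files, result):
    """Return True if `result` holds exactly the records of `files` and the
    records of each file appear in reversed order (requirements 1 and 2).

    Assumption: every record is unique across all files."""
    all_records = [record for records in files for record in records]
    if Counter(all_records) != Counter(result):
        return False  # a record is missing, extra, or duplicated
    position = {record: i for i, record in enumerate(result)}
    for records in files:
        indices = [position[record] for record in records]
        # Reversed order: positions must strictly decrease along the file.
        if any(a <= b for a, b in zip(indices, indices[1:])):
            return False
    return True
```

Fed the three example files and the ResultFile.txt listing above, it returns True; a plain forward concatenation of the files returns False.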

Answer 1

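You could try the following: in a first step, zip() the reversed() items of the lists. The built-in zip() stops at the shortest list, so to keep every record use itertools.zip_longest() (izip_longest() in Python 2), which pads the shorter lists with None:

from itertools import zip_longest

zipped = zip_longest(reversed(lines1), reversed(lines2), reversed(lines3))

Then you can flatten the items in zipped again:

lst = []
for triple in zipped:
    lst.extend(triple)

Finally, you have to remove all the None values added by zip_longest():

lst = [line for line in lst if line is not None]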
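For any number of files, the same zip_longest() idea can be wrapped in a function. A sketch, where `interleave_reversed` is a hypothetical name. It pads with a private sentinel object instead of None, so no genuine record can be mistaken for padding:

```python
from itertools import zip_longest

_MISSING = object()  # padding sentinel; cannot collide with a real record


def interleave_reversed(*record_lists):
    """Round-robin interleave any number of lists, each read back to front."""
    rows = zip_longest(*(reversed(records) for records in record_lists),
                       fillvalue=_MISSING)
    # Flatten the rows and drop the padding added for the shorter lists.
    return [record for row in rows for record in row if record is not _MISSING]
```

Passing the lists in the order file3, file2, file1 reproduces the example output from the question exactly.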

Answer 2

How about this:

>>> import random
>>> l = [["1a","1b","1c","1d"], ["2a","2b"], ["3a","3b","3c","3d","3e"]]
>>> while l:
...     x = random.choice(l)
...     print(x.pop())
...     if not x:
...         l.remove(x)

1d
1c
2b
3e
2a
3d
1b
3c
3b
3a
1a

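You can optimize this in various ways, but that's the general idea. It also works if you can't read the files all at once and need to iterate over them because of memory constraints. In that case, read a line from the file instead of popping from the list, and check for EOF instead of an empty list.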
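The random-choice loop does not need lists at all. It works on any iterators, which keeps memory use low. A minimal sketch (the generator name `random_merge` is hypothetical). Keep in mind that iterating over a plain file object yields lines from first to last, so on its own it does not satisfy requirement 1. Each input must already produce its records from last to first:

```python
import random


def random_merge(iterators, rng=random):
    """Lazily yield items by repeatedly picking a random non-exhausted iterator.

    Each iterator's own order is preserved, so to honor the reversal
    requirement every input must already yield its records last to first."""
    active = [iter(it) for it in iterators]
    while active:
        i = rng.randrange(len(active))
        try:
            yield next(active[i])
        except StopIteration:
            del active[i]  # this input is exhausted (the "EOF" case)
```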

Answer 3

A simple solution could be to create a list of lists and then pop a line from a randomly chosen list until they are all exhausted:

>>> import random
>>> filerecords = [['File{0}Record{1}'.format(i, j) for j in range(5)] for i in range(5)]
>>> concatenation = []
>>> while any(filerecords):
...     selection = random.choice(filerecords)
...     if selection:
...         concatenation.append(selection.pop())
...     else:
...         filerecords.remove(selection)
... 
>>> concatenation
['File1Record4', 'File3Record4', 'File0Record4', 'File0Record3', 'File0Record2',
 'File4Record4', 'File0Record1', 'File3Record3', 'File4Record3', 'File0Record0',
 'File4Record2', 'File2Record4', 'File4Record1', 'File3Record2', 'File4Record0',
 'File2Record3', 'File1Record3', 'File2Record2', 'File2Record1', 'File3Record1',
 'File3Record0', 'File1Record2', 'File2Record0', 'File1Record1', 'File1Record0']

Answer 4

import random

filenames = [ 'filename0', ... , 'filenameN' ]
lines = []
for fn in filenames:
    with open(fn, 'r') as f:
        lines.append(f.readlines())

with open('output', 'w') as output:
    while lines:
        l = random.choice(lines)
        if not l:
            lines.remove(l)  # this file is exhausted
        else:
            output.write(l.pop())  # take from the end: last line first

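This may look odd at first: the lines read from the files don't need to be reversed, because when writing them to the output file we use list.pop(), which takes items from the end of the list (here, the contents of one file).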
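Because pop() writes the last line of each file first, a file that does not end with a newline causes trouble. readlines() returns that final line without '\n', so it would be glued to whatever record is written next. Here is a small sketch of a hypothetical `read_records` helper that normalizes this before merging, assuming UTF-8 text files:

```python
def read_records(path, encoding="utf-8"):
    """Return the lines of `path`, making sure every record ends with '\\n'.

    readlines() keeps the final line without '\\n' when the file does not end
    with one; popped first, it would otherwise run into the next record."""
    with open(path, "r", encoding=encoding) as f:
        records = f.readlines()
    if records and not records[-1].endswith("\n"):
        records[-1] += "\n"
    return records
```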

Answer 5

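I strongly recommend taking some time to read Generator Tricks for Systems Programmers (PDF). It comes from a presentation at PyCon '08 and deals specifically with processing arbitrarily large log files. The reversal aspect is an interesting wrinkle, but the rest of the presentation should speak directly to your problem.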
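The reversal wrinkle can itself be handled with a generator that reads a file backwards in fixed-size blocks, so even very large files are never loaded whole. A sketch under these assumptions: lines are separated by '\n' (with '\r\n' files, each yielded line would keep its trailing '\r'), the text is UTF-8, and `reverse_lines` and its `block_size` parameter are hypothetical names:

```python
import os


def reverse_lines(path, block_size=8192, encoding="utf-8"):
    """Yield the lines of `path` from last to first (without newlines),
    reading fixed-size blocks backwards from the end of the file."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        if pos == 0:
            return  # empty file: no records
        tail = b""  # partial line carried over from the block after this one
        first_block = True
        while pos > 0:
            size = min(block_size, pos)
            pos -= size
            f.seek(pos)
            chunk = f.read(size) + tail
            if first_block:
                first_block = False
                if chunk.endswith(b"\n"):
                    chunk = chunk[:-1]  # a final newline does not start a record
            parts = chunk.split(b"\n")
            tail = parts.pop(0)  # may be incomplete; finish it with the next block
            for line in reversed(parts):
                yield line.decode(encoding)
        yield tail.decode(encoding)  # the very first line of the file
```

Splitting on b'\n' is safe for UTF-8 because that byte never occurs inside a multi-byte character, and each line is decoded only once it is complete.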

Answer 6

filelist = (
    'file1.txt',
    'file2.txt',
    'file3.txt',
)

all_records = []
max_records = 0
for f in filelist:
    with open(f, 'r') as fp:
        records = fp.readlines()
    max_records = max(max_records, len(records))
    records.reverse()  # requirement 1: each file's records back to front
    all_records.append(records)

# Start with the last file, as in the example output.
all_records.reverse()

with open('result.txt', 'w') as res_fp:
    for i in range(max_records):
        # Round-robin: the i-th reversed record of every file that still has one.
        for records in all_records:
            if i < len(records):
                res_fp.write(records[i])

Answer 7

I'm not a Python Zen master, but here's my take.

import random

# You have to read at least one of the files completely into a list.
with open("filename1", "r") as f:
    fin = f.readlines()
# Tuple of all the other files.
fls = ( open("filename2", "r"),
        open("filename3", "r"), )

for fl in fls:  # iterate through the tuple
    curr = 0
    for line in fl:  # iterate through a file
        # Randomly skip over existing lines; at the end of the list, stop skipping.
        while curr < len(fin) and round(random.random()):
            curr += 1
        # Insert at the current position so this file's lines keep their order.
        fin.insert(curr, line)
        curr += 1  # the next line from this file must go after this one
    fl.close()

# When you're *done*, reverse. It's easier.
fin.reverse()

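Unfortunately, this clearly produces a weighted distribution. That could be fixed by computing the length of each file and scaling the random call by a probability based on it. I'll see whether I can provide that later.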
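One way to do the weighting is sketched here with the hypothetical name `weighted_interleave`. It draws the next source with probability proportional to how many records that source still holds, using random.choices() (Python 3.6+). Under that rule, any particular order-preserving interleaving of files with n1, n2, ..., nk records (N in total) comes out with probability n1!·n2!·...·nk!/N!, which is the same for all of them:

```python
import random


def weighted_interleave(record_lists, rng=random):
    """Interleave lists, each consumed from its end (so its records come out
    reversed), picking the next source in proportion to its remaining records."""
    stacks = [list(records) for records in record_lists]  # copies; inputs untouched
    remaining = [len(stack) for stack in stacks]
    result = []
    while any(remaining):
        # random.choices() accepts per-item weights; empty sources weigh 0.
        (i,) = rng.choices(range(len(stacks)), weights=remaining)
        result.append(stacks[i].pop())
        remaining[i] -= 1
    return result
```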

Answer 8

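There's a possible merge function in the standard library. It's intended to merge sorted lists into a sorted combined list; garbage in, garbage out, but it does have the desired property of preserving the order within each sublist no matter what.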

import heapq

def merge_files(output, *inputs):
    # All parameters are open files with appropriate modes.
    # Each input is read completely into memory so that it can be reversed.
    for line in heapq.merge(*(reversed(tuple(infile)) for infile in inputs)):
        output.write(line)
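heapq.merge() also accepts a key= argument (Python 3.5+), which allows a randomized variant of the same idea. This is a sketch with the hypothetical name `keyed_random_merge`. Each record receives a uniform random key, and the keys of one file are sorted before being paired with its reversed records, so every input is already ordered by key, as heapq.merge() expects. Because all keys are independent, their overall order is a uniformly random permutation, so the resulting interleaving should be uniform as well. Like the answer above, it keeps all records in memory:

```python
import heapq
import random


def keyed_random_merge(record_lists, rng=random):
    """Randomly interleave lists, each taken in reversed order, via heapq.merge."""
    streams = []
    for records in record_lists:
        keys = sorted(rng.random() for _ in records)
        # The i-th smallest key goes with the i-th record counted from the end.
        streams.append(list(zip(keys, reversed(records))))
    merged = heapq.merge(*streams, key=lambda pair: pair[0])
    return [record for _, record in merged]
```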
