Question

我想以最有效的方式访问存储在目录（~20）中的.txt文件（~1000）中的每个值（~10000）。抓取数据时，我想将它们放在HTML字符串中。我这样做是为了显示一个HTML页面，其中包含每个文件的表格。伪：

    fh=open('MyHtmlFile.html','w')
    fh.write('''<head>Lots of tables</head><body>''')
    for eachDirectory in rootFolder:

        for eachFile in eachDirectory:
            concat=''

            for eachData in eachFile:
               concat=concat+<tr><td>eachData</tr></td>
            table='''
                  <table>%s</table>
                  '''%(concat)
        fh.write(table)
    fh.write('''</body>''')
    fh.close()

必须有更好的方法（我想这将需要永远）！我已经检查了set（）并阅读了一些关于哈希表的内容，而是在挖洞之前询问专家。

感谢您的时间！ /卡尔

Answer 1

import os, os.path
# If you're on Python 2.5 or newer, use 'with'
# needs 'from __future__ import with_statement' on 2.5
fh=open('MyHtmlFile.html','w')
fh.write('<html>\r\n<head><title>Lots of tables</title></head>\r\n<body>\r\n')
# this will recursively descend the tree
for dirpath, dirname, filenames in os.walk(rootFolder):
    for filename in filenames:
        # again, use 'with' on Python 2.5 or newer
        infile = open(os.path.join(dirpath, filename))
        # this will format the lines and join them, then format them into the table
        # If you're on Python 2.6 or newer you could use 'str.format' instead
        fh.write('<table>\r\n%s\r\n</table>' % 
                     '\r\n'.join('<tr><td>%s</tr></td>' % line for line in infile))
        infile.close()
fh.write('\r\n</body></html>')
fh.close()

Answer 2

为什么你“想象它会永远”？您正在阅读该文件然后将其打印出来 - 这几乎是您唯一需要的东西 - 这就是您所做的一切。您可以通过几种方式调整脚本（读取块不是行，调整缓冲区，打印输出而不是连接等），但是如果你不知道现在花多少时间，你怎么知道什么是更好/更坏？

首先配置文件，然后查找脚本是否太慢，然后找到一个缓慢的位置，然后再进行优化（或询问优化）。

pythonic方式访问文件结构中的数据

2 个答案: