Question

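I have 100 files in a folder, named 1.htm through 100.htm. I run this code to extract some information from one file and append it to another file, final.txt. At the moment I have to run the program by hand for each of the 100 files. I need to build a loop that runs the program 100 times, reading each file once. (Please explain carefully the exact edits I need to make to my code.)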

Here is the code, as used for the file 6.htm:

import glob
import BeautifulSoup
from BeautifulSoup import BeautifulSoup


fo = open("6.htm", "r")
bo = open("output.txt" ,"w")
f = open("final.txt","a+")

htmltext = fo.read()
soup = BeautifulSoup(htmltext)
#print len(urls)
table = soup.findAll('table')
rows = table[0].findAll('tr');
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = str(td.find(text=True)) + ';;;'
        if(text!="&nbsp;;;;"):
            bo.write(text);
            bo.write('\n');
fo.close()
bo.close()

b= open("output.txt", "r")

for j in range (1,5):
    str=b.readline();
for j in range(1, 15):
    str=b.readline();
    c=str.split(";;;")
    #print c[1]
    if(c[0]=="APD ID:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Name/Class:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Source:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Sequence:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Length:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Net charge:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Hydrophobic residue%:"):
        f.write(c[1])
        f.write("#")
    if(c[0]=="Boman Index:"):
        f.write(c[1])
        f.write("#")
f.write('\n');
b.close();
f.close();



f.close();
print "End"

Answer 1

import os
f = open("final.txt","a+")
for root, folders, files in os.walk('./path/to/html_files/'):
    for fileName in files:
        fo = open(os.path.abspath(root + '/' + fileName), "r")
        ...

The rest of your code then goes there.

Also consider (best practice):

with open(os.path.abspath(root + '/' + fileName), "r") as fo:
    ...

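That way you cannot forget to close the file handles. The operating system only allows a limited number of open file handles, and this makes sure you don't exhaust that limit by mistake.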
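As a minimal sketch of what with guarantees (read_htm is a hypothetical helper, not part of the original code): the handle is closed as soon as the block is left, even if the read raises an exception.

```python
def read_htm(path):
    """Read a file and report whether its handle was closed afterwards."""
    with open(path, "r") as fo:
        htmltext = fo.read()
    # Leaving the with block has already closed fo; no fo.close() is needed.
    return htmltext, fo.closed
```

Calling read_htm on any readable file returns its text together with True for the closed flag.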

So your code ends up looking like this:

import os
with open("final.txt","a+") as f:
    for root, folders, files in os.walk('./path/to/html_files/'):
        for fileName in files:
            with open(os.path.abspath(root + '/' + fileName), "r") as fo:
                ...
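One thing to watch with os.walk: it reports every file under the directory, subfolders included, so output.txt or final.txt would be opened as input if they live in the same folder. A possible filter, assuming a hypothetical helper named iter_htm_files that is not in the original answer:

```python
import os


def iter_htm_files(top):
    """Yield the path of every .htm file found under top."""
    for root, folders, files in os.walk(top):
        for fileName in sorted(files):
            # os.walk reports every file, so skip anything that is not .htm
            if fileName.lower().endswith(".htm"):
                # os.path.join inserts the right separator for the platform
                yield os.path.join(root, fileName)
```

The inner loop of the answer could then become a single loop over iter_htm_files('./path/to/html_files/').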

Also, never rebind the name of a built-in such as str:

str=b.readline();

There is also no need for a ; at the end of a line. This is Python .. we code in a comfortable manner!
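Rebinding str matters more once the code runs in a loop. The script in the question calls str(...) while parsing the table and later rebinds str to a line of text, so a second pass through the same code would find a string where the built-in used to be. A small demonstration, with hypothetical function names:

```python
def call_str_after_rebinding(line):
    """Rebind str the way the original code does, then try to call it."""
    str = line  # hides the built-in str for the rest of this scope
    try:
        return str(6) + ".htm"
    except TypeError:
        # str now names a line of text, which is not callable
        return None


def call_str_with_other_name(line):
    """Same steps, but the line is stored under a different name."""
    current_line = line
    return str(6) + ".htm"
```

The first function returns None because the call fails, while the second returns "6.htm". Renaming the variable is all it takes.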

Last but not least..

if(c[0]=="APD ID:"):
if(c[0]=="Name/Class:"):
if(c[0]=="Source:"):
if(c[0]=="Sequence:"):
if(c[0]=="Length:"):
if(c[0]=="Net charge:"):
if(c[0]=="Hydrophobic residue%:"):
if(c[0]=="Boman Index:"):

Should be:

if(c[0]=="APD ID:"):
elif(c[0]=="Name/Class:"):
elif(c[0]=="Source:"):
elif(c[0]=="Sequence:"):
elif(c[0]=="Length:"):
elif(c[0]=="Net charge:"):
elif(c[0]=="Hydrophobic residue%:"):
elif(c[0]=="Boman Index:"):

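Unless you modify c along the way, which you don't, at most one of these comparisons can match, so switch to elif and skip the remaining checks!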

I just keep finding more horrible things about this code (you have clearly copy-pasted from examples all over the galaxy...):

You can squash all of the if / elif / else above into a single if block:

if(c[0] in ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:", "Net charge:", "Hydrophobic residue%:", "Boman Index:")):
    f.write(c[1])
    f.write("#")

Also, again, skip the ( ... ) around your if conditions. This is Python .. we program in a comfortable manner:

if c[0] in ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:", "Net charge:", "Hydrophobic residue%:", "Boman Index:"):
    f.write(c[1])
    f.write("#")
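The same membership test can be pulled into a small function that is easy to try on its own. This is only a sketch: the names WANTED_LABELS and field_to_write are hypothetical, and it assumes a line holds a label and its value separated by ";;;".

```python
# Labels copied from the if-block above
WANTED_LABELS = (
    "APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:",
    "Net charge:", "Hydrophobic residue%:", "Boman Index:",
)


def field_to_write(line):
    """Return the value followed by '#' for a wanted label, else ''."""
    c = line.split(";;;")
    # The length check avoids an IndexError on lines without a separator
    if len(c) > 1 and c[0] in WANTED_LABELS:
        return c[1] + "#"
    return ""
```

Inside the loop this would be used as f.write(field_to_write(line)), which writes nothing for lines whose label is not in the tuple.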

Answer 2

Maybe some structure that looks like this:

# declare main files
bo = open("output.txt" ,"w")
f = open("final.txt","a+")

#loop over range ii = [1,100]
for ii in range(1,101):
    # "%d.htm" % ii avoids calling str(), which the original script rebinds
    fo = open("%d.htm" % ii, "r")
    # Run program like normal
    ...
    ...
    ...
    fo.close()
f.close()
bo.close()
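One way to fill in the "run program like normal" part is to move the per-file work into a function and call it from the loop. The sketch below assumes a hypothetical callable named extract that takes the HTML text of one file and returns the "#"-separated record for it. The name process_all and the existence check are also assumptions, not part of the answer.

```python
import os


def process_all(folder, final_path, extract, first=1, last=100):
    """Run extract on first.htm .. last.htm and append one line per file."""
    processed = 0
    with open(final_path, "a+") as f:
        for ii in range(first, last + 1):
            path = os.path.join(folder, "%d.htm" % ii)
            if not os.path.exists(path):
                continue  # skip gaps in the numbering
            with open(path, "r") as fo:
                f.write(extract(fo.read()))
            f.write("\n")
            processed += 1
    return processed
```

The return value is the number of files that were actually found and processed, which makes a missing file easy to notice.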

Answer 3

os.listdir lists all files in a given directory.

As @Torxed pointed out, best practice is to use a with statement (so that the file handles get closed).

You can look for the .htm files like this:

import os

# Build the list of expected file names: 1.htm through 100.htm
filenames = [str(x) + ".htm" for x in range(1, 101)]

# "name" rather than "file", so no built-in name is rebound
for name in os.listdir("/mydir"):
    if name in filenames:
        # Do your logic here.
        pass
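os.listdir returns names in arbitrary order, so if the records in final.txt should follow the file numbers, the matches need sorting by their numeric part. A sketch under that assumption, with a hypothetical helper named numbered_htm_files:

```python
import os


def numbered_htm_files(directory, first=1, last=100):
    """Return the names first.htm .. last.htm present in directory, in numeric order."""
    # A set makes each membership test cheap
    wanted = set("%d.htm" % n for n in range(first, last + 1))
    found = [name for name in os.listdir(directory) if name in wanted]
    # Sort by the numeric part so 2.htm comes before 10.htm
    return sorted(found, key=lambda name: int(name[:-len(".htm")]))
```

A plain sorted(found) would order the names as text and put 10.htm before 2.htm, which is why the key converts the stem to an integer.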
