Reading and editing lines from an HTML file in Python

Time: 2018-12-13 05:30:41

Tags: python readlines

I have this text file:

test.html

<html>
<body>
<table>
  <tr>
      <td id="A">A</td>
      <td id="B">B</td>
 </tr>
 <tr>
    <td id="C">C</td>
    <td id="D">D</td>
 </tr>
</table>
</html>
</body>

Python file

f = open('test.html')
ans = "A"
line = f.readline()
while line:
    print(line)
    if ans in line:
        # change the row A to a dash: <td>-</td>
        pass
    line = f.readline()
f.close()

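What I want to do is scan the HTML file and, when I find cell A, change it to a dash and save the file. I am a Python beginner and don't know much about file input and output. Please note: no libraries.

4 Answers:

Answer 0: (score: 1)

Try using BeautifulSoup: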

from bs4 import BeautifulSoup

# Open test.html for reading
with open('test.html') as html_file:
    soup = BeautifulSoup(html_file.read(), features='html.parser')

    # Go through each 'A' tag and replace text with '-'
    for tag in soup.find_all(id='A'):
        tag.string.replace_with('-')

    # Store prettified version of modified html
    new_text = soup.prettify()

# Write new contents to test.html
with open('test.html', mode='w') as new_html_file:
    new_html_file.write(new_text)

This produces the following test.html:

<html>
 <body>
  <table>
   <tr>
    <td id="A">
     -
    </td>
    <td id="B">
     B
    </td>
   </tr>
   <tr>
    <td id="C">
     C
    </td>
    <td id="D">
     D
    </td>
   </tr>
  </table>
 </body>
</html>

Answer 1: (score: 0)

You can use the beautifulsoup or HTMLParser library. beautifulsoup is the easier of the two to use, though. You can read how to use it here: https://www.pythonforbeginners.com/beautifulsoup/python-beautifulsoup-basic
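For the HTMLParser route, here is a minimal sketch using `html.parser.HTMLParser` from the standard library. The class name `CellReplacer` and the helper `replace_cell_text` are made up for this illustration and are not part of any library. It assumes the target element contains only text (no nested tags) and re-emits everything else as it was read.

```python
from html.parser import HTMLParser


class CellReplacer(HTMLParser):
    """Re-emit the HTML that is fed in, replacing the text of one element by id."""

    def __init__(self, target_id, new_text):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.new_text = new_text
        self.out = []          # pieces of the rewritten document
        self.inside = False    # True while inside the target element

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if dict(attrs).get('id') == self.target_id:
            self.inside = True
            self.out.append(self.new_text)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied unchanged
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.inside = False
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        if not self.inside:    # drop the old text of the target element
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.inside:
            self.out.append('&%s;' % name)

    def handle_charref(self, name):
        if not self.inside:
            self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)


def replace_cell_text(html, target_id, new_text):
    """Hypothetical helper: return html with the target element's text replaced."""
    parser = CellReplacer(target_id, new_text)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)
```

To apply it to the file from the question, read test.html into a string, call `replace_cell_text(content, 'A', '-')`, and write the result back.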

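Answer 2: (score: 0)

As others have suggested, BeautifulSoup is certainly a very good choice, but since you are a beginner, I would like to suggest this regular expression approach.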

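import re

# Read the whole file into memory
with open('test.html') as fh:
    content = fh.read()

# Replace the text of the cell with id="A" by a dash
content = re.sub(r'<td id="A">A</td>', '<td id="A">-</td>', content)

# Write the modified content back to the same file
with open('test.html', 'w') as fh:
    fh.write(content)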

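Alternatively, if you want code that is more space-efficient and you are comfortable with how file handling works in Python, you can also look at this approach: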

import re

pattern = re.compile(r'<td id="A">A</td>')

with open('test.html', 'r+') as fh:
    while True:
        currpos = fh.tell()       # remember where this line starts
        line = fh.readline()
        if line == '':            # end of file
            break
        if pattern.search(line):
            # The new line must be exactly as long as the old one;
            # a longer line would overwrite the start of the next line
            fh.seek(currpos)
            fh.write(pattern.sub('<td id="A">-</td>', line))
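Both snippets above hard-code the cell's current text ("A") in the pattern. If the text may vary, a sketch along the following lines matches the cell by its id only. The function name `replace_td_text` is made up for this example, and the pattern assumes the cell is written exactly as `<td id="...">` with no other attributes and no nested tags.

```python
import re


def replace_td_text(content, cell_id, new_text):
    """Replace the text of every <td id="cell_id">...</td> with new_text."""
    pattern = re.compile(
        r'(<td id="' + re.escape(cell_id) + r'">)[^<]*(</td>)'
    )
    # A function replacement avoids special handling of backslashes in new_text
    return pattern.sub(lambda m: m.group(1) + new_text + m.group(2), content)
```

The result can be written back with `open('test.html', 'w')` in the same way as in the first snippet.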

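Answer 3: (score: 0)

Using plain Python without any libraries, the following code replaces the line containing A with the desired one. I just use the built-in replace() function so that the line ends up as this string:

<td id="A">-</td>\n

Code: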

ans = "A"
lines = []

# Open the file and go through it line by line
with open('test.html', mode='r') as f:
    for line in f:
        # Match the cell by its id so that other lines containing "A" are left alone
        if f'id="{ans}"' in line:
            # Replace only the cell text; the indentation of the line is kept
            line = line.replace('>' + ans + '<', '>-<')
        lines.append(line)

# Write the result to a new file
with open('myfile.html', mode='w') as new_f:
    new_f.writelines(lines)

Contents of myfile.html:

 <html>
     <body>
         <table>
             <tr>
                 <td id="A">-</td>
                 <td id="B">B</td>
             </tr>
             <tr>
                 <td id="C">C</td>
                 <td id="D">D</td>
             </tr>
         </table>
    </html>
</body>
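The question asks for the change to be saved to the file itself, whereas the code above writes to a second file. One possible sketch, using only the standard library, writes the modified lines to a temporary file and then swaps it in with `os.replace`. The function name `rewrite_file` is made up for this example, and it assumes the file uses the platform's default text encoding.

```python
import os
import tempfile


def rewrite_file(path, old, new):
    """Replace old with new on every line of the file at path, in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory as the original,
    # so the final swap does not have to cross filesystems
    with open(path) as src, tempfile.NamedTemporaryFile(
            mode='w', dir=directory, delete=False) as tmp:
        for line in src:
            tmp.write(line.replace(old, new))
    # Both files are closed here; swap the new version in
    os.replace(tmp.name, path)
```

For the file from the question, the call would be `rewrite_file('test.html', '<td id="A">A</td>', '<td id="A">-</td>')`.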