I have this text file, test.html:
<html>
<body>
<table>
<tr>
<td id="A">A</td>
<td id="B">B</td>
</tr>
<tr>
<td id="C">C</td>
<td id="D">D</td>
</tr>
</table>
</html>
</body>
And this Python file:
f = open('test.html')
ans = "A"
line = f.readline()
while line:
    print(line)
    if ans in line:
        # change the cell A to a dash: <td>-</td>
        pass
    line = f.readline()
f.close()
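So what I want to do is scan the HTML file and, when I find the A cell, change it to a dash and save the file. I'm a beginner in Python and don't know much about file input and output. Note: no libraries.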
Answer 0 (score: 1)
Try BeautifulSoup:
from bs4 import BeautifulSoup

# Open test.html for reading
with open('test.html') as html_file:
    soup = BeautifulSoup(html_file.read(), features='html.parser')

# Go through each tag with id="A" and replace its text with '-'
for tag in soup.find_all(id='A'):
    tag.string.replace_with('-')

# Store prettified version of modified html
new_text = soup.prettify()

# Write new contents to test.html
with open('test.html', mode='w') as new_html_file:
    new_html_file.write(new_text)
which produces the following test.html (note that prettify() also re-indents the whole document):
<html>
<body>
<table>
<tr>
<td id="A">
-
</td>
<td id="B">
B
</td>
</tr>
<tr>
<td id="C">
C
</td>
<td id="D">
D
</td>
</tr>
</table>
</body>
</html>
Answer 1 (score: 0)
You can use BeautifulSoup or the standard library's HTMLParser (html.parser). BeautifulSoup is easier to use, though. You can read how to use it here: https://www.pythonforbeginners.com/beautifulsoup/python-beautifulsoup-basic
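Since the question rules out third-party libraries, here is a minimal sketch of the HTMLParser route using only the standard library. The parser echoes the document back unchanged, except that the text inside the element with the given id is swapped for new text. Assumptions: the names ReplaceTextById and replace_text_by_id are hypothetical, the target element contains plain text only (no nested tags), and end tags are written back in lowercase because HTMLParser normalizes tag names.

```python
from html.parser import HTMLParser


class ReplaceTextById(HTMLParser):
    """Echo an HTML document, swapping the text of the element with a given id.

    Hypothetical helper: assumes the target element holds plain text only.
    """

    def __init__(self, target_id, new_text):
        # Keep entities such as &amp; as-is instead of decoding them
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.new_text = new_text
        self.parts = []
        self.target_tag = None  # tag name while we are inside the target element

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # original tag text, attributes included
        if dict(attrs).get('id') == self.target_id:
            self.target_tag = tag
            self.parts.append(self.new_text)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied unchanged
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.target_tag:
            self.target_tag = None
        self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        if self.target_tag is None:  # drop the old text inside the target
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.target_tag is None:
            self.parts.append(f'&{name};')

    def handle_charref(self, name):
        if self.target_tag is None:
            self.parts.append(f'&#{name};')

    def handle_comment(self, data):
        self.parts.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.parts.append(f'<!{decl}>')

    def handle_pi(self, data):
        self.parts.append(f'<?{data}>')

    def unknown_decl(self, data):
        self.parts.append(f'<![{data}]>')


def replace_text_by_id(html, target_id, new_text):
    """Return html with the text of the element whose id is target_id replaced."""
    parser = ReplaceTextById(target_id, new_text)
    parser.feed(html)
    parser.close()  # flush anything still buffered
    return ''.join(parser.parts)
```

To apply it to the file, read test.html into a string, pass it through replace_text_by_id(html, 'A', '-'), and write the result back. Unlike prettify(), this keeps the original layout of the file.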
Answer 2 (score: 0)
As others have suggested, BeautifulSoup is certainly a very good choice, but since you're a beginner, I'd like to suggest this regular-expression approach:
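import re

# Read the whole file into memory
with open('test.html') as fh:
    content = fh.read()

# Replace whatever text is inside the cell with id="A" by a dash
content = re.sub(r'(<td id="A">)[^<]*(</td>)', r'\1-\2', content)

# Write the modified content back to the file
with open('test.html', 'w') as fh:
    fh.write(content)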
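The same pattern can be wrapped in a small reusable helper, sketched below. Assumptions: the function name replace_cell_text is hypothetical, and each cell is written as `<td id="...">text</td>` with no nested tags. re.escape() keeps an id containing regex metacharacters from being read as a pattern, and a function replacement keeps new_text literal, so a value like '1' is not mistaken for a group reference.

```python
import re


def replace_cell_text(html, cell_id, new_text):
    """Replace the text of every <td id="cell_id">...</td> with new_text.

    Hypothetical helper: assumes the cell contains plain text only.
    """
    pattern = r'(<td id="' + re.escape(cell_id) + r'">)[^<]*(</td>)'
    # A function replacement inserts new_text literally (no \1-style escapes)
    return re.sub(pattern, lambda m: m.group(1) + new_text + m.group(2), html)


print(replace_cell_text('<td id="A">A</td><td id="B">B</td>', 'A', '-'))
# prints <td id="A">-</td><td id="B">B</td>
```

Regular expressions are fine for a small, predictable file like this one; for arbitrary HTML a real parser is the safer choice.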
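Alternatively, if you'd rather not read the whole file into memory and you're comfortable with Python file handling, you can edit the file in place, line by line. Note that overwriting in place only works when the replacement is exactly as long as the text it overwrites: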
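import re

# In-place edit: this only works because the replacement ('-')
# has exactly the same length as the original text ('A')
with open('test.html', 'r+') as fh:
    while True:
        currpos = fh.tell()      # remember where this line starts
        line = fh.readline()
        if line == '':           # end of file
            break
        new_line = re.sub(r'<td id="A">A</td>', '<td id="A">-</td>', line)
        if new_line != line:
            fh.seek(currpos)     # go back to the start of the line
            fh.write(new_line)   # overwrite it in place
            fh.seek(fh.tell())   # flush the write and resync before reading on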
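If the replacement can change the length of the line (for example '--' instead of '-'), a common alternative that still reads only one line at a time is to write every line to a temporary file in the same directory and then swap it in with os.replace(). This is a sketch, not part of the answer above: the name rewrite_lines is hypothetical, and it assumes the text being replaced never spans two lines.

```python
import os
import shutil
import tempfile


def rewrite_lines(path, old, new):
    """Stream path line by line, replacing old with new, then swap the result in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as dst, open(path) as src:
            for line in src:  # only one line is held in memory at a time
                dst.write(line.replace(old, new))
        shutil.copymode(path, tmp_path)  # mkstemp files are owner-only by default
        os.replace(tmp_path, path)       # swap the new file in with a single rename
    except BaseException:
        os.remove(tmp_path)
        raise
```

For example, rewrite_lines('test.html', '<td id="A">A</td>', '<td id="A">--</td>') works even though the new text is longer than the old one.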
Answer 3 (score: 0)
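Using plain Python without any libraries, the code below replaces the line containing cell A with the line you want. It uses the built-in str.replace() to turn that line into the string: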
<td id="A">-</td>\n
Code:
ans = "A"
old_cell = f'<td id="{ans}">{ans}</td>'  # the cell to look for
new_cell = f'<td id="{ans}">-</td>'      # what to put in its place; change this to what you want
lines = []

# Open the file and go through it line by line
with open('test.html', mode='r') as f:
    for line in f:
        if old_cell in line:  # only touch the line that contains cell A
            line = line.replace(old_cell, new_cell)
        lines.append(line)

# Write the result to a new file
with open('myfile.html', mode='w') as new_f:
    new_f.writelines(lines)
Contents of myfile.html:
<html>
<body>
<table>
<tr>
<td id="A">-</td>
<td id="B">B</td>
</tr>
<tr>
<td id="C">C</td>
<td id="D">D</td>
</tr>
</table>
</html>
</body>