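I'm having trouble working with files. The function first searches every file for two strings, then replaces them with new values. What I can't figure out is how to write the new content back into the same file. I think the problem is the file mode, but I'm not sure how to handle it, because when I change the mode I get a different error.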
def replace_urls(self):
    find_string_1 = '/blog/'
    find_string_2 = '/contakt/'
    replace_string_1 = 'blog.html'
    replace_string_2 = 'contact.html'
    exclude_dirs = ['media', 'static']
    for (root_path, dirs, files) in os.walk(f'{settings.BASE_DIR}/static/'):
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        for file in files:
            get_file = os.path.join(root_path, file)
            with open(get_file, 'wb', encoding='utf-8') as f:
                soup = BeautifulSoup(f, "lxml", from_encoding="utf-8")
                blog_text = soup.find('a', attrs={'href':find_string_1})
                contact_text = soup.find('a', attrs={'href':find_string_2})
                blog_text.attrs['href'] = replace_string_1
                contact_text.attrs['href'] = replace_string_2
                f.write(soup.prettify('utf-8'))
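Error raised by the code above:
    with open(get_file, 'wb', encoding='utf-8') as f:
ValueError: binary mode doesn't take an encoding argument
Important:
I want to use this function as a Django management command,
so I run it with python manage.py command_name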
from django.core.management.base import BaseCommand
from django.conf import settings
import os
import codecs
from bs4 import BeautifulSoup
from lxml import etree

class Command(BaseCommand):
    help='change urls in each header to static version'

    def replace_urls(self):
        find_string_1 = '/blog/'
        find_string_2 = '/contact/'
        replace_string_1 = 'blog.html'
        replace_string_2 = 'contact.html'
        exclude_dirs = ['media', 'static']
        for (root_path, dirs, files) in os.walk(f'{settings.BASE_DIR}/static/'):
            dirs[:] = [d for d in dirs if d not in exclude_dirs]
            for file in files:
                get_file = os.path.join(root_path, file)
                with open(get_file, 'wb', encoding='utf-8') as f:
                    soup = BeautifulSoup(f, "lxml", from_encoding="utf-8")
                    blog_text = soup.find('a', attrs={'href':find_string_1})
                    contact_text = soup.find('a', attrs={'href':find_string_2})
                    blog_text.attrs['href'] = replace_string_1
                    contact_text.attrs['href'] = replace_string_2
                    f.write(soup.prettify('utf-8'))

    def handle(self, *args, **kwargs):
        try:
            self.replace_urls()
            self.stdout.write(self.style.SUCCESS(f'********** Command has been execute without any error **********'))
        except Exception:
            self.stdout.write(self.style.NOTICE(f'********** Command does not exist ! **********'))
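Answer 0 (score: 0)
Adding "b" to the mode string opens the file in binary mode.
Binary mode does not accept an encoding argument.
You can use the codecs library for this instead.
Here is my suggestion: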
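import codecs
import os
from bs4 import BeautifulSoup
from django.conf import settings

def replace_urls(self):
    find_string_1 = '/blog/'
    find_string_2 = '/contakt/'
    replace_string_1 = 'blog.html'
    replace_string_2 = 'contact.html'
    exclude_dirs = ['media', 'static']
    for (root_path, dirs, files) in os.walk(f'{settings.BASE_DIR}/static/'):
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        for file in files:
            get_file = os.path.join(root_path, file)
            # Read and parse first: opening with "w" truncates the file before it can be read.
            with codecs.open(get_file, "r", "utf-8") as f:
                soup = BeautifulSoup(f.read(), "lxml")
            blog_text = soup.find('a', attrs={'href':find_string_1})
            contact_text = soup.find('a', attrs={'href':find_string_2})
            # find() returns None when a page has no matching link.
            if blog_text is not None:
                blog_text.attrs['href'] = replace_string_1
            if contact_text is not None:
                contact_text.attrs['href'] = replace_string_2
            # prettify() with no encoding returns str, which the text-mode writer expects.
            with codecs.open(get_file, "w", "utf-8") as f:
                f.write(soup.prettify())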
A simple test of codecs.open:
import codecs
file = codecs.open("test.txt", "w", "utf-8")
file.write(u'\ufeff')
file.close()
Another option is to drop the "b" and open the file in text mode, where the encoding argument is allowed:
with open(get_file, 'w', encoding='utf-8') as f:
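To make the difference between the two modes concrete, here is a small standard-library sketch. The file names and the sample string are made up for illustration. It triggers the same ValueError, then writes the same text once in text mode (open does the encoding) and once in binary mode (the caller encodes), and checks that both files contain identical bytes.

```python
import os
import tempfile

# Hypothetical sample content, standing in for a rewritten HTML page.
text = '<a href="blog.html">Blog</a>\n'

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text_mode.html")
    bytes_path = os.path.join(tmp, "binary_mode.html")

    # A binary mode combined with encoding= is rejected before the file is opened.
    try:
        open(text_path, "wb", encoding="utf-8")
    except ValueError as exc:
        error_message = str(exc)

    # Text mode: write str and let open() encode it.
    # newline="" disables newline translation so the bytes are comparable on any OS.
    with open(text_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    # Binary mode: encode the str yourself and write bytes.
    with open(bytes_path, "wb") as f:
        f.write(text.encode("utf-8"))

    with open(text_path, "rb") as a, open(bytes_path, "rb") as b:
        same_bytes = a.read() == b.read()

print(error_message)
print(same_bytes)
```

Either mode works as long as the data type matches it: str for "w", bytes for "wb".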
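Answer 1 (score: 0)
As the error message says, you are writing in binary mode, which means the data is expected to be encoded already, so all you need to do is save the bytes to the file. You can either open the file in text mode and let open encode the text for you, or encode it yourself and write the resulting bytes.
You have already encoded the HTML with soup.prettify('utf-8'), so there is no need to pass the encoding argument to the open function. For example: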
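from bs4 import BeautifulSoup

# Naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning.
soup = BeautifulSoup("<html><header></header></html>", "html.parser")
# prettify('utf-8') returns bytes, so the file is opened with 'wb' and no encoding argument.
with open("test.html", "wb") as f:
    f.write(soup.prettify('utf-8'))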
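This should work for you (it assumes os, settings and BeautifulSoup are imported as in your command):
def replace_urls(self):
    find_string_1 = '/blog/'
    find_string_2 = '/contakt/'
    replace_string_1 = 'blog.html'
    replace_string_2 = 'contact.html'
    exclude_dirs = ['media', 'static']
    for (root_path, dirs, files) in os.walk(f'{settings.BASE_DIR}/static/'):
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        for file in files:
            get_file = os.path.join(root_path, file)
            # Parse in 'rb' mode first: opening with 'wb' would truncate the file before it is read.
            with open(get_file, 'rb') as f:
                soup = BeautifulSoup(f, "lxml", from_encoding="utf-8")
            blog_text = soup.find('a', attrs={'href':find_string_1})
            contact_text = soup.find('a', attrs={'href':find_string_2})
            # find() returns None when the link is absent, so guard each update.
            if blog_text is not None:
                blog_text.attrs['href'] = replace_string_1
            if contact_text is not None:
                contact_text.attrs['href'] = replace_string_2
            # prettify('utf-8') returns bytes, which matches 'wb' with no encoding argument.
            with open(get_file, 'wb') as f:
                f.write(soup.prettify('utf-8'))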
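One more point on writing back to the same file: opening a path with "w" or "wb" truncates it immediately, so the content has to be read (and parsed) before the file is reopened for writing. The sketch below shows that read-then-write pattern with only the standard library. The helper name rewrite_in_place and the sample page are hypothetical, and plain str.replace stands in for the BeautifulSoup edits purely to keep the example free of third-party dependencies.

```python
import os
import tempfile


def rewrite_in_place(path, replacements, encoding="utf-8"):
    """Read a text file, apply str replacements, and write it back.

    Returns True if the file was changed. Hypothetical helper for illustration.
    """
    # Step 1: read everything in its own "r" context, which leaves the file intact.
    with open(path, "r", encoding=encoding, newline="") as f:
        original = f.read()

    updated = original
    for old, new in replacements.items():
        updated = updated.replace(old, new)

    # Step 2: reopen with "w" (which truncates) only when there is something to write.
    if updated != original:
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(updated)
    return updated != original


with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "index.html")
    with open(page, "w", encoding="utf-8") as f:
        f.write('<a href="/blog/">Blog</a> <a href="/contact/">Contact</a>')

    changed = rewrite_in_place(
        page,
        {'href="/blog/"': 'href="blog.html"', 'href="/contact/"': 'href="contact.html"'},
    )
    with open(page, encoding="utf-8") as f:
        result = f.read()

print(changed)
print(result)
```

In the real command, the replacement step would be the soup.find calls and the write step would use soup.prettify, but the order of operations stays the same.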