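I'm trying to read a Hebrew srt file. The encoding should be cp1255, but the file can't be read with it. I can read it as utf-8, but then the text doesn't follow Hebrew rules. I'd like to read the file with the 'pysubs2' library in Python and then store it in cp1255. Is there a way to do that?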
Answer 0 (score: 0)
Old question, but I figured I'd post in case anyone else tries to do this. I did something similar, shown below.
import chardet

# Sniff the encoding from the first few lines of the file
# (subtitle_input_path is the path to your subtitle file)
with open(subtitle_input_path, 'rb') as f:
    rawdata = b''.join([f.readline() for _ in range(10)])

# Detected encoding, plus the encodings that need no conversion.
# chardet reports names like 'utf-8' or 'ascii', or None if it can't tell.
encoding_method = chardet.detect(rawdata)['encoding']
encoding_method_whitelist = ['utf-8', 'ascii']

# If the file is in some other encoding, convert it to utf-8
if encoding_method and encoding_method.lower() not in encoding_method_whitelist:
    # Read the old file's content
    with open(subtitle_input_path, encoding=encoding_method) as subtitle_file:
        subtitle_text = subtitle_file.read()
    # Convert to utf-8 and write back to the same file
    with open(subtitle_input_path, 'w', encoding='utf8') as subtitle_file:
        subtitle_file.write(subtitle_text)
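The snippet above converts to utf-8, while the question asks for cp1255 output. Going the other way can be sketched with the standard library alone. This is a minimal sketch, not a tested recipe for every srt file: the function name `reencode_file` and its parameters are made up for illustration, and it assumes the source encoding is already known (for example from the detection step above).

```python
def reencode_file(src_path, dst_path, src_encoding="utf-8", dst_encoding="cp1255"):
    """Read src_path as src_encoding and write the same text to dst_path as dst_encoding.

    Hypothetical helper for illustration; the default encodings are assumptions.
    """
    # newline="" disables newline translation, so CRLF line endings are kept as-is
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()

    # The default error handling is strict: a character with no cp1255
    # representation raises UnicodeEncodeError instead of being silently dropped
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)

    return text
```

Writing to a separate output path keeps the original file intact if the encode step fails partway. If you'd rather stay inside pysubs2, its `load()` and `save()` should both accept an `encoding` argument, so loading with the detected encoding and saving with `encoding="cp1255"` may be enough, but check that against the pysubs2 docs for your version.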