我正在创建一个Python脚本,基本上这部分我遇到了问题,它只需要一个网页帖子的标题。 Python不理解重音,我已经尝试了我所知道的一切 1 - 将此代码放在第一行# - * - coding:utf-8 - * - 2 - 把.encode(“utf-8”)
代码:
# -*- coding: utf-8 -*-
import re
import requests
def opena(url):
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
lexdan1 = requests.get(url,headers=headers)
lexdan2 = lexdan1.text
lexdan1.close
return lexdan2
dan = []
a = opena('http://www.megafilmesonlinehd.com/filmes-lancamentos')
d = re.compile('<strong class="tt-filme">(.+?)</strong>').findall(a)
for name in d:
name = name.encode("utf-8")
dan.append(name)
print dan
这就是我得到的:
['Porta dos Fundos: Contrato Vital\xc3\xadcio HD 720p', 'Os 28 Homens de Panfilov Legendado HD', 'Estrelas Al\xc3\xa9m do Tempo Dublado', 'A Volta do Ju\xc3\xadzo Final Dublado Full HD 1080p', 'The Love Witch Legendado HD', 'Manchester \xc3\x80 Beira-Mar Legendado', 'Semana do P\xc3\xa2nico Dublado HD 720p', 'At\xc3\xa9 o \xc3\x9altimo Homem Legendado HD 720p', 'Arbor Demon Legendado HD 720p', 'Esquadr\xc3\xa3o de Elite Dublado Full HD 1080p', 'Ouija Origem do Mal Dublado Full HD 1080p', 'As Muitas Mulheres da Minha Vida Dublado HD 720p', 'Um Novo Desafio para Callan e sua Equipe Dublado Full HD 1080p', 'Terror Herdado Dublado DVDrip', 'Officer Downe Legendado HD', 'N\xc3\xa3o Bata Duas Vezes Legendado HD', 'Eu, Daniel Blake Legendado HD', 'Sangue Pela Gl\xc3\xb3ria Legendado', 'Quase 18 Legendado HD 720p', 'As Aventuras de Robinson Cruso\xc3\xa9 Dublado Full HD 1080p', 'Indigna\xc3\xa7\xc3\xa3o Dublado HD 720p']
答案 0 :(得分:1)
由于您要告诉口译员打印列表,口译员会调用list
班级__str__
方法。当您调用容器的__str__
方法时,它会使用__repr__
方法为每个包含的对象(在本例中为str
类型)。 str
类型__repr__
方法不会转换unicode字符,但会转换__str__
方法(当您打印单个str
对象时会调用该方法确实。
这是一个很好的问题,可以帮助解释这些差异: Difference between __str__ and __repr__ in Python
如果您单独打印每个字符串,您应该得到您想要的结果。
import re
import requests
def opena(url):
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
lexdan1 = requests.get(url,headers=headers)
lexdan2 = lexdan1.text
lexdan1.close
return lexdan2
dan = []
a = opena('http://www.megafilmesonlinehd.com/filmes-lancamentos')
d = re.compile('<strong class="tt-filme">(.+?)</strong>').findall(a)
for name in d:
dan.append(name)
for item in dan:
print item
答案 1 :(得分:0)
当打印list
时,表示内部的任何内容(调用__repr__
方法),而不是打印(调用__str__
方法):
class test():
def __repr__(self):
print '__repr__'
return ''
def __str__(self):
print '__str__'
return ''
会得到你:
>>> a = [test()]
>>> a
[__repr__
]
>>> print a
[__repr__
]
>>> print a[0]
__str__
字符串的__repr__
方法不会转换特殊字符(甚至不是\t
或\n
)。