Question

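I'm practicing some web scraping on a free game database website. After retrieving the value for a given field (the name of a character in the game), the final get_text() value I end up with is '\n\n\t\t\t\t\t\tNami\n\t\t\t\t\n'. To get just the character name (in this case "Nami"), I tried using find_all and replace_with, but I received the following error: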

AttributeError: 'unicode' object has no attribute 'find_all'

Any suggestions?

import requests
from bs4 import BeautifulSoup

page = requests.get("http://na.op.gg/summoner/userName=Andrus%20Greysong")

pageclean = BeautifulSoup(page.content, 'html.parser')

box = pageclean.find(class_="ChampionBox.Ranked")

ChampInfo = [ci.get_text() for ci in box.select(".ChampionInfo .ChampionName")]

for tag in ChampInfo:
    tag.find_all('\n').replace_with('')

ChampInfo

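To be clear, I'm not trying to remove whitespace from the markup. The "\n" and "\t" characters are now part of the string itself, and that string is what I want to clean up.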
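For context on the error: get_text() returns a plain string (a 'unicode' object on Python 2), not a BeautifulSoup Tag, so Tag methods such as find_all and replace_with are not available on it. The "\n" and "\t" in that string are ordinary whitespace characters, so one possible approach is to call the string's own strip() method. The sketch below is only an illustration under that assumption: it uses the value quoted above in place of a live request, and the names raw, raw_names and clean_names, as well as the second sample value, are hypothetical.

```python
# The value below is copied from the question; in the real script it would
# come from ci.get_text().
raw = '\n\n\t\t\t\t\t\tNami\n\t\t\t\t\n'

# get_text() returns a plain string, not a Tag, so string methods apply.
# strip() drops leading and trailing whitespace, including \n and \t.
name = raw.strip()

# The same idea applied to a whole list of scraped strings
# (these sample values are hypothetical).
raw_names = ['\n\t\tNami\n\t', '\n\t\tMiss Fortune\n\t']
clean_names = [s.strip() for s in raw_names]

print(name)
print(clean_names)
```

Because strip() only touches the ends of the string, whitespace inside a name is kept. Beautiful Soup's get_text() also accepts strip=True, which might make the separate cleanup step unnecessary.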

How to remove unwanted string characters when web scraping

0 answers: