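I have a .txt file with a different link on each line. I want to iterate over the lines, request each link, and parse the response with BeautifulSoup(response.text, "html.parser"). I'm running into a few problems.
I can see the lines from the text file being iterated, but once I pass them to requests.get(websitelink), the code that worked before (without the loop) no longer prints any of the data I'm scraping.
All I get in the output is blank lines.
I'm new to Python and BeautifulSoup, so I'm not sure what I'm doing wrong. I tried converting the lines to strings, but that didn't seem to help.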
import requests
from bs4 import BeautifulSoup, CData

filename = 'item_ids.txt'
with open(filename, "r") as fp:
    lines = fp.readlines()
    for line in lines:
        # Test to see if iterating line by line works
        print(line)
        # Assign a single line to websitelink
        websitelink = line
        # Pass websitelink to requests
        response = requests.get(websitelink)
        soup = BeautifulSoup(response.text, "html.parser")
        # Initialize and reset the variables for the cd loop
        count = 0
        weapon = ''
        stats = ''
        # Iterate through the CDATA sections on the page and keep the ones we want
        for cd in soup.find_all(string=True):
            if isinstance(cd, CData):
                # print(cd)
                count += 1
                if count == 1:
                    weapon = cd
                if count == 6:
                    stats = cd
        # Concatenate the CDATA info
        both = weapon + " " + stats
        print(both)
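One way to check what each iteration actually hands to requests.get() is to print repr(line) instead of line, because repr() makes invisible characters visible. A minimal sketch, using an in-memory file with two made-up URLs as a stand-in for item_ids.txt:

```python
import io

# Hypothetical stand-in for item_ids.txt (the URLs are made up)
fake_file = io.StringIO("https://example.com/item/1\nhttps://example.com/item/2\n")

lines = fake_file.readlines()
for line in lines:
    # repr() shows the trailing '\n' that readlines() keeps on every line
    print(repr(line))
```

Each printed value ends in '\n', which is exactly what ends up in websitelink in the loop above.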
Any help would be greatly appreciated,
Thanks
Answer 0 (score: 0)
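I don't know whether this helps you, but I added strip() where you assign the line to your websitelink variable:
websitelink = line.strip()
and that got your code working for me. You can try it.
The reason: readlines() keeps the trailing newline on each line, so without strip() every URL passed to requests.get() ends in "\n" and presumably fetches a different page that contains none of the CDATA you are looking for, which would explain the blank output.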
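The same fix can be packaged as a small helper that also skips blank lines (for example an empty last line in the file). This is a sketch, not the answerer's code: the function name read_links and the demo URLs are hypothetical, and a temporary file stands in for item_ids.txt.

```python
import tempfile
from pathlib import Path


def read_links(path):
    """Return the non-empty lines of a text file with surrounding whitespace removed."""
    links = []
    with open(path, "r", encoding="utf-8") as fp:
        for line in fp:  # iterating the file object directly; readlines() is not needed
            link = line.strip()  # drop the trailing '\n' (and any stray spaces)
            if link:  # skip blank lines, e.g. an empty line at the end of the file
                links.append(link)
    return links


# Demo: a temporary file stands in for item_ids.txt (made-up URLs, plus a blank line)
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "item_ids.txt"
    demo.write_text(
        "https://example.com/item/1\nhttps://example.com/item/2\n\n",
        encoding="utf-8",
    )
    links = read_links(demo)

print(links)
```

Each entry in links can then be passed straight to requests.get(link) in the scraping loop.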