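When I pass the account name as a manually typed string, the request works (Response [200]), but when I read the input from a file instead, I get Response [400].
Here is the code: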
import requests
from bs4 import BeautifulSoup

def people_spider():
    file = "D:\OneDrive\Documents\GPIP\Files\scraping\idtwitter.csv"
    dataset = open(file, "r")
    for account in dataset:
        href = 'https://twitter.com/' + account
        get_single_item_data(href)

def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    print(source_code)
    print(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, features='html.parser')
    for item_name in soup.findAll('p', {'dir': 'ltr'}):
        print(item_name.string)

people_spider()
The result is:
<Response [400]>
https://twitter.com/mr_adhani
<Response [400]>
https://twitter.com/RahayuNarti
<Response [400]>
https://twitter.com/AllMicroJobs
<Response [400]>
https://twitter.com/adibambang05
<Response [400]>
https://twitter.com/NatasyaRD1
<Response [400]>
https://twitter.com/arumyuniadis
<Response [400]>
https://twitter.com/harusan_osk
<Response [400]>
https://twitter.com/LailyFauziana
<Response [400]>
https://twitter.com/Dovia_Liata707
<Response [400]>
https://twitter.com/hapzah_putry
I also tried changing the file extension, but it made no difference.
Answer 0 (score: 0)
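The problem is that you are not stripping the account variable. Each line read from the file still ends with a newline character, and that newline ends up in the URL.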
def people_spider():
    file = "D:\OneDrive\Documents\GPIP\Files\scraping\idtwitter.csv"
    dataset = open(file, "r")
    print(dataset)
    for account in dataset:
        # strip() removes the trailing newline (and any surrounding whitespace)
        href = 'https://twitter.com/' + account.strip()
        get_single_item_data(href)
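
To see why strip() matters, here is a minimal sketch that builds the URLs with and without it. It assumes the file holds one account name per line; io.StringIO stands in for the real CSV file, and build_urls and BASE_URL are hypothetical names used only for this illustration. Printing with repr() makes the trailing newline visible.

```python
import io

BASE_URL = "https://twitter.com/"  # same prefix as in the question

def build_urls(lines):
    """Return (raw, cleaned) URL lists built from an iterable of lines."""
    # Without strip(): each URL keeps the line's trailing newline.
    raw = [BASE_URL + line for line in lines]
    # With strip(): whitespace is removed and blank lines are skipped.
    cleaned = [BASE_URL + line.strip() for line in lines if line.strip()]
    return raw, cleaned

# Stand-in for the CSV file (assumption: one account name per line).
sample = io.StringIO("mr_adhani\nRahayuNarti\n")
raw, cleaned = build_urls(list(sample))

print(repr(raw[0]))      # 'https://twitter.com/mr_adhani\n'
print(repr(cleaned[0]))  # 'https://twitter.com/mr_adhani'
```

Skipping blank lines is an extra precaution here, not part of the original answer; it avoids requesting the bare base URL if the file ends with an empty line.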