Response [400] when parsing from a file in Python

Date: 2018-12-26 05:04:24

Tags: python web-scraping

When I parse manually entered text it works (Response [200]), but when I change the input to come from a file, it becomes Response [400].

The code:

import requests
from bs4 import BeautifulSoup

def people_spider():
    file = "D:\OneDrive\Documents\GPIP\Files\scraping\idtwitter.csv"
    dataset = open(file, "r")
    for account in dataset:
        href = 'https://twitter.com/' + account
        get_single_item_data(href)

def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    print(source_code)
    print(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, features='html.parser')
    for item_name in soup.findAll('p', {'dir': 'ltr'}):
        print(item_name.string)


people_spider()

The result is:

<Response [400]>
https://twitter.com/mr_adhani

<Response [400]>
https://twitter.com/RahayuNarti

<Response [400]>
https://twitter.com/AllMicroJobs

<Response [400]>
https://twitter.com/adibambang05

<Response [400]>
https://twitter.com/NatasyaRD1

<Response [400]>
https://twitter.com/arumyuniadis

<Response [400]>
https://twitter.com/harusan_osk

<Response [400]>
https://twitter.com/LailyFauziana

<Response [400]>
https://twitter.com/Dovia_Liata707

<Response [400]>
https://twitter.com/hapzah_putry

I also changed the file extension, but that did not change anything.

1 answer:

Answer 0: (score: 0)

The problem is that you are not stripping the account variable. Iterating over a file yields each line with its trailing newline still attached, so the URL you request is actually "https://twitter.com/mr_adhani\n", which Twitter rejects with 400 Bad Request.

def people_spider():
    # Use a raw string so the backslashes in the Windows path are not
    # interpreted as escape sequences.
    file = r"D:\OneDrive\Documents\GPIP\Files\scraping\idtwitter.csv"
    with open(file, "r") as dataset:
        for account in dataset:
            # strip() removes the trailing newline (and any surrounding
            # whitespace) from each line read from the file.
            href = 'https://twitter.com/' + account.strip()
            get_single_item_data(href)
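A minimal sketch of what strip() changes, using a hard-coded sample line instead of the actual CSV, to show why the un-stripped URL fails:

```python
# Each line produced by iterating over a file keeps its trailing
# newline; strip() removes it before the URL is built.
line = "mr_adhani\n"  # what `for account in dataset` would yield

bad_url = "https://twitter.com/" + line
good_url = "https://twitter.com/" + line.strip()

print(repr(bad_url))   # 'https://twitter.com/mr_adhani\n'
print(repr(good_url))  # 'https://twitter.com/mr_adhani'
```

The embedded newline is invisible when the URL is printed normally, which is why printing item_url in the question looked correct; printing with repr() exposes it.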