Looping over rows to scrape web data using input from an Excel file

时间:2019-10-14 14:24:51

标签: python web-scraping

I want to scrape web data using input values from an Excel file: for each row value read in, scrape the web and save the output back into the same Excel file.

from bs4 import BeautifulSoup
import requests 
from urllib import request
import os
import pandas as pd


ciks = pd.read_csv("ciks.csv")
ciks.head()

Output:

    CIK
0   1557822
1   1598429
2   1544670
3   1574448
4   1592290

Then:

for x in ciks:
    url="https://www.sec.gov/cgi-bin/browse-edgar?CIK=" + x +"&owner=exclude&action=getcompany"
    r = request.urlopen(url)
    bytecode = r.read()
    htmlstr = bytecode.decode()
    soup = BeautifulSoup(bytecode)
    t = soup.find('span',{'class':'companyName'})
    print(t.text)

I get an error:

----> 9     print(t.text)

AttributeError: 'NoneType' object has no attribute 'text'

Here, I want to use each row value from the CSV file as input to scrape the web data.

1 Answer:

Answer 0 (score: 0)

It is easier to convert the column values to a list and then use that list in the for loop. (Iterating directly over the DataFrame, as in the question's for loop, yields the column names rather than the row values, which is why soup.find() returned None.) See the solution below:

from bs4 import BeautifulSoup
from urllib import request
import pandas as pd

df = pd.read_csv("ciks.csv")
mylist = df['CIK'].tolist()  # CIK is the column name

company = []
for item in mylist:
    print(item)
    # CIK values are integers, so cast to str before building the URL
    url = "https://www.sec.gov/cgi-bin/browse-edgar?CIK=" + str(item) + "&owner=exclude&action=getcompany"
    r = request.urlopen(url)
    bytecode = r.read()
    soup = BeautifulSoup(bytecode, features="lxml")
    t = soup.find('span', {'class': 'companyName'})
    company.append(t.text)
    print(t.text)

# assign() returns a new DataFrame, so keep the result
df = df.assign(company=company)
print(df)

df.to_csv("ciks.csv", index=False)
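If one of the CIKs returns a page without a companyName span (for example, an invalid or delisted CIK), t.text would raise the same AttributeError seen in the question. A minimal defensive variant of the loop, assuming the same ciks.csv layout, skips those rows instead of crashing:

from bs4 import BeautifulSoup
from urllib import request
import pandas as pd

df = pd.read_csv("ciks.csv")

company = []
for item in df['CIK'].tolist():
    url = ("https://www.sec.gov/cgi-bin/browse-edgar?CIK=" + str(item)
           + "&owner=exclude&action=getcompany")
    bytecode = request.urlopen(url).read()
    soup = BeautifulSoup(bytecode, features="lxml")
    t = soup.find('span', {'class': 'companyName'})
    # Guard against pages with no companyName span so the loop keeps going
    company.append(t.text if t is not None else "NOT FOUND")

df = df.assign(company=company)
df.to_csv("ciks.csv", index=False)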