I am trying to read URLs from the first column of a CSV file. The file contains 6051 URLs in total. To do this, I tried the following code:
import csv

urls = []
with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
    blogurl = csv.reader(csvfile)
    for row in blogurl:
        row = row[0]
        print(row)
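As a minimal, self-contained sketch of the loop above (using a small temporary file in place of the real urls.csv, and made-up sample URLs), the pattern the csv docs recommend is to open the file with newline='' and collect the first column into a list:

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for urls.csv.
sample = "https://example.com/1,meta\nhttps://example.com/2,meta\n"
path = os.path.join(tempfile.mkdtemp(), "urls.csv")
with open(path, "w", newline='', encoding="utf-8") as f:
    f.write(sample)

urls = []
# newline='' is what the csv docs recommend when opening files for csv.reader.
with open(path, newline='', encoding="utf-8") as csvfile:
    for row in csv.reader(csvfile):
        if row:                  # skip any blank lines
            urls.append(row[0])  # URLs are in the first column

print(len(urls))  # 2
```

Collecting into a list (rather than only printing) also makes it easy to check the count with len(urls) afterwards.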
However, only 65 URLs are printed, and I don't know why the total differs from what the CSV file contains.
Can someone help me figure out how to read all of the URLs (6051 in total) from the CSV file?
To read every URL in the file, I also tried several other pieces of code, which either produced the same number of URLs (65) or failed, for example:
1)
openfile = open("C:/Users/hyoungm/Downloads/urls.csv")
r = csv.reader(openfile)
for i in r:
    # the URLs are in the first column ... 0 refers to the first column
    blogurls = i[0]
    print(blogurls)
len(blogurls)
2)
urls = pd.read_csv("C:/Users/hyoungm/Downloads/urls.csv")
with closing(requests.get(urls, stream=True)) as r:
    reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='""')
    for row in reader:
        print(row)
len(row)
3)
with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
    lines = csv.reader(csvfile)
    for i, line in enumerate(lines):
        if i == 0:
            for line in csvfile:
                print(line[1:])
len(line)
4) and
blogurls = []
with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
    r = csv.reader(csvfile)
    for i in r:
        blogurl = i[0]
        r = requests.get(blogurl)
        blogurls.append(blogurl)
for url in blogurls:
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")
len(blogurls)
I expect the output to be the 6051 URLs originally collected in the CSV file, not 65.
After reading all of the URLs, I will scrape text data from each one; I should obtain that text data from all 6051 URLs.
Answer 0 (score: 0)
Both of the following approaches work for me:
import requests
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
urls = r.text.splitlines()
print(len(urls)) # Returns 6051
and
import csv
import requests
from io import StringIO
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
reader = csv.reader(StringIO(r.text))
urls = [line[0] for line in reader]
print(len(urls)) # Returns 6051
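One reason to prefer the csv.reader version over plain splitlines is quoting: if a URL field is quoted because it contains a comma (which is legal CSV), splitlines keeps the quote characters as literal text, while csv.reader decodes the field. A small sketch with made-up URLs:

```python
import csv
from io import StringIO

# Hypothetical two-line CSV; the second URL is quoted because it
# contains a comma.
text = 'https://example.com/a\n"https://example.com/b?x=1,y=2"\n'

naive = text.splitlines()                                 # raw lines, quotes intact
parsed = [row[0] for row in csv.reader(StringIO(text))]   # CSV-decoded fields

print(naive[1])   # "https://example.com/b?x=1,y=2"  (quotes kept)
print(parsed[1])  # https://example.com/b?x=1,y=2    (quotes stripped)
```

For a file of plain, unquoted URLs the two approaches agree, which is why both snippets above return 6051.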