I wrote a script that scrapes several URLs, uses BeautifulSoup to pull the useful information into two arrays (ids and names), and then writes the values of those arrays into a MySQL table, so that ids[0] and names[0] become row 0 of the table, and so on...
However, my code is very ugly and I'm sure there is a better way to do this.
Can anyone give me a hint? In particular I need some input on how to iterate over the two arrays...
Thanks in advance!
#!/usr/bin/env python
from bs4 import BeautifulSoup
from urllib import urlopen
import MySQLdb

#MySQL Connection
mysql_opts = {
    'host': "localhost",
    'user': "********",
    'pass': "********",
    'db': "somedb"
}
mysql = MySQLdb.connect(mysql_opts['host'], mysql_opts['user'], mysql_opts['pass'], mysql_opts['db'])
cursor = mysql.cursor()  # cursor used for the INSERTs below

#Add Data SQL Query
data_query = ("INSERT INTO tablename "
              "(id, name) "
              "VALUES (%s, %s)")

#Urls to scrape
url1 = 'http://somepage.com'
url2 = 'http://someotherpage.com'
url3 = 'http://athirdpage.com'

#URL Array
urls = (url1, url2, url3)

#Url loop
for url in urls:
    soupPage = urlopen(url)
    soup = BeautifulSoup(soupPage)
    ids = soup.find_all('a', style="display:block")
    names = soup.find_all('a', style="display:block")
    i = 0
    print len(ids)
    while (i < len(ids)):
        try:
            id = ids[i]
            vid = id['href'].split('=')
            vid = vid[1]
        except IndexError:
            vid = "leer"
        try:
            name = names[i]
            name = name.contents[0]
            name = name.encode('iso-8859-1')
        except IndexError:
            name = ""
        data_content = (vid, name)
        cursor.execute(data_query, data_content)
        emp_no = cursor.lastrowid
        i = i + 1
Answer 0 (score: 0)
My comment seems to be the answer after all. I just tried it:
for vid, name in zip(ids, names):
    vid = vid['href'].split('=')
    vid = vid[1]
    name = name.contents[0]
    name = name.encode('iso-8859-1')
    data_content = (vid, name)
    cursor.execute(data_query, data_content)
    emp_no = cursor.lastrowid
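As a side note: in Python 2, zip() builds the full list of pairs up front; for long result lists, itertools.izip pairs the items lazily instead (in Python 3, zip() already behaves that way). A tiny illustration with made-up lists:

from itertools import izip  # Python 2 only; zip() is already lazy in Python 3

ids = ['id1', 'id2', 'id3']          # placeholder data for illustration
names = ['name1', 'name2', 'name3']
for vid, name in izip(ids, names):   # yields one pair at a time, no intermediate list
    print vid, name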
For more general forms, see: How can I iterate through two lists in parallel?
Sorry for the duplicate. If anyone can add something to the answer, feel free to do so.
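To show how the zip() loop could slot into the whole script, here is a rough sketch; it keeps the Python 2 setup (urllib, MySQLdb), the table and selector names from the question, and the assumption that both find_all() calls really should return matching lists. This is just one possible shape, not the only way to do it:

#!/usr/bin/env python
from bs4 import BeautifulSoup
from urllib import urlopen
import MySQLdb

# Connection and cursor (placeholder credentials, as in the question)
mysql = MySQLdb.connect("localhost", "********", "********", "somedb")
cursor = mysql.cursor()

data_query = "INSERT INTO tablename (id, name) VALUES (%s, %s)"

urls = ('http://somepage.com', 'http://someotherpage.com', 'http://athirdpage.com')

for url in urls:
    soup = BeautifulSoup(urlopen(url))
    # Same selector for both lists, as in the question
    ids = soup.find_all('a', style="display:block")
    names = soup.find_all('a', style="display:block")

    rows = []
    for vid, name in zip(ids, names):
        vid = vid['href'].split('=')[1]               # the value after '=' in the href
        name = name.contents[0].encode('iso-8859-1')
        rows.append((vid, name))

    cursor.executemany(data_query, rows)              # one INSERT round-trip per page

mysql.commit()    # MySQLdb does not autocommit by default
cursor.close()
mysql.close()

executemany() sends all rows for a page in a single call, and the explicit commit() matters because MySQLdb leaves autocommit off by default.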