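I need to remove the [u' prefix and the '] suffix surrounding data that is important to me. It is going into a database, and from what I can see these extra characters have to be removed first. How do I remove them? I have already tried .replace on the variable, but it returns an error.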
import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time
db = MySQLdb.connect(
host=" ",
user=" ",
passwd=" ",
db=" ")
inc = 0
# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent',user_agent)]
term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = search.findAll(text = True)
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = search2.findAll(text = True)
print term
print cur
print diff
c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()
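The extra characters are not part of the data. findAll(text=True) returns a list-like ResultSet, and printing or str()-ing a list shows its repr, brackets and quotes included (Python 2 also adds the u prefix for unicode strings). A list also has no .replace method, which is likely the error the question refers to. A minimal stdlib-only sketch, written for Python 3, using a plain list as a hypothetical stand-in for the ResultSet:

```python
# A plain list stands in (hypothetically) for the ResultSet from findAll(text=True).
texts = ["123.45"]

# str() of a list is its repr: brackets and quotes are added around the data.
# (Python 2 would show [u'123.45'] for a unicode string.)
flattened = str(texts)
print(flattened)  # ['123.45']

# Lists have no .replace method, so calling it raises AttributeError.
try:
    texts.replace("[", "")
except AttributeError as exc:
    print("AttributeError:", exc)

# Indexing the list gives the bare string, with nothing to strip.
value = texts[0]
print(value)  # 123.45
```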
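Never mind, thanks @jonrsharpe, I found the answer. In the original code, .findAll was returning a ResultSet. All I had to do was convert it to a str, which let me call strip on it. The revised code is below: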
import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time
db = MySQLdb.connect(
host=" ",
user=" ",
passwd=" ",
db=" ")
inc = 0
# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent',user_agent)]
term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = str(search.findAll(text = True))
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = str(search2.findAll(text = True))
cur = cur.strip("'[]u")
diff = diff.strip("'[]u")
print term
print cur
print diff
c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()
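One caveat about strip("'[]u"): its argument is a set of characters, not a literal prefix or suffix, so it removes any of ', [, ] and u from both ends until it reaches a character outside that set. That happens to work for numeric values such as prices, but it silently eats real data that starts or ends with one of those characters. A short Python 3 sketch (the sample strings are made up for illustration):

```python
# str.strip(chars) treats its argument as a SET of characters to remove
# from both ends, not as a literal prefix or suffix.
price = "[u'123.45']".strip("'[]u")
word = "[u'menu']".strip("'[]u")

print(price)  # 123.45  (digits are not in the strip set, so this looks fine)
print(word)   # men     (the trailing 'u' of the real data was stripped too)
```

Indexing the list, as the answer below recommends, avoids this class of bug entirely.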
Answer (score: 0):
result = str(result)
...
cur = str(search.findAll(text = True))
Stop doing that! There are data types other than strings!
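result is a sequence of rows (MySQLdb's fetchall() returns a tuple of tuples); search.findAll gives you a list of text nodes. For example, you can get the symbol value of the first row with result[0][0], and you can get the text of an element simply with search.getText().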
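A minimal sketch of the row-indexing part, assuming sqlite3's in-memory database as a stand-in for MySQL so it runs without a server. sqlite3 uses ? placeholders where MySQLdb uses %s, and its fetchall() returns a list of tuples rather than a tuple of tuples; indexing works the same way. The table layout and the sample values are hypothetical.

```python
import sqlite3

# In-memory stand-in for the MySQL `stocks` table (hypothetical columns/values).
db = sqlite3.connect(":memory:")
c = db.cursor()
c.execute("CREATE TABLE stocks (symbol TEXT, cur TEXT, diff TEXT)")
c.executemany("INSERT INTO stocks (symbol) VALUES (?)", [("AAPL",), ("MSFT",)])

inc = 0
# ORDER BY makes the row offset deterministic; the original query has none.
c.execute("SELECT symbol FROM stocks ORDER BY symbol LIMIT 1 OFFSET ?", (inc,))
rows = c.fetchall()   # a sequence of rows, each row a tuple of columns
term = rows[0][0]     # first column of the first row: a plain string
print(term)           # AAPL

# Pass plain strings (not lists or their repr) as query parameters.
cur, diff = "123.45", "+1.23"  # hypothetical scraped values
c.execute("UPDATE stocks SET cur = ?, diff = ? WHERE symbol = ?", (cur, diff, term))
db.commit()

c.execute("SELECT cur, diff FROM stocks WHERE symbol = ?", (term,))
print(c.fetchone())   # ('123.45', '+1.23')
```

Because the driver receives real string values as parameters, it handles quoting itself and nothing needs to be stripped afterwards.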
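Serializing a structured object such as a list into a flat string and then trying to pick the pieces back out of it is not a sensible approach.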
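For the scraping side, a sketch of getText() on a static HTML snippet, so it runs offline. The markup is made up to mimic the classes the question searches for; the real MarketWatch page may differ and has likely changed since. It requires the beautifulsoup4 package, as the original code does, and uses the built-in "html.parser". find_all(string=True) is the newer spelling of findAll(text=True).

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the classes the question searches for.
html = """
<p class="data bgLast">123.45</p>
<span class="bgChange">+1.23</span>
"""
soup = BeautifulSoup(html, "html.parser")

search = soup.find("p", attrs={"class": "data bgLast"})
search2 = soup.find("span", attrs={"class": "bgChange"})

# find_all(string=True) returns a list of text nodes (a ResultSet) ...
nodes = search.find_all(string=True)
print(nodes)

# ... whereas getText() returns the element's text as one plain string.
cur = search.getText()
diff = search2.getText()
print(cur, diff)  # 123.45 +1.23
```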