The content I want to parse looks like this:
<tr>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">1470-160X</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;"><a href="http://www.letpub.com.cn/index.php?journalid=2408&page=journalapp&view=detail" style="color:#0099FF; font-size:12px; font-weight:bold; text-decoration:underline;" target="_blank">ECOLOGICAL INDICATORS</a><br><br><font color="grey">ECOL INDIC</font></br></br></td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">3.190</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">2区</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">环境科学与生态学</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">环境科学</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">SCIE</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">No</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">容易</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">约3.0个月</td>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;"><a href="http://www.letpub.com.cn/index.php?page=journalapp&view=detail&journalid=2408&xuanxiangk_id=2#xuanxk_3" style="color:#0099FF; text-decoration:underline;" target="_blank">文章</a>
<td style="border:1px #DDD solid; border-collapse:collapse; text-align:left; padding:8px 8px 8px 8px;">33977</td></td>
</tr>
My code is as follows:
journal_ISSN = []
journal_name = []
journal_affecting_factors = []
journal_JCR_zone = []
journal_parent_class = []
journal_sub_class = []
journal_SCI = []
journal_acception = []
journal_period = []
for i in range(2, 3):
    url = "http://www.letpub.com.cn/index.php?page=journalapp&view=search&searchname=&searchissn=&searchfield=&searchimpactlow=&searchimpacthigh=&searchimpacttrend=&searchscitype=&searchcategory1=%E7%8E%AF%E5%A2%83%E7%A7%91%E5%AD%A6%E4%B8%8E%E7%94%9F%E6%80%81%E5%AD%A6&searchcategory2=%E7%8E%AF%E5%A2%83%E7%A7%91%E5%AD%A6&searchjcrkind=&searchopenaccess=&searchsort=relevance&searchsortorder=desc&currentsearchpage="
    resp = urlopen('%s%d%s' % (url, i,
    soup = BeautifulSoup(resp, "html.parser")
    journal_table = soup.findAll("table", {"class": "table_yjfx"})
    # rows = journal_table.find_All("tr")[1:]
    print(journal_table)
    for line in journal_table:
        rows = line.findAll('tr')
        for single_line in rows[1:10]:
            col = single_line.findAll('td')
            journal_ISSN.append(col[0].string.strip())
            journal_name.append(col[1].string.strip())
            journal_affecting_factors.append(col[2].string.strip())
            journal_JCR_zone.append(col[3].string.strip())
            journal_parent_class.append(col[4].string.strip())
            journal_sub_class.append(col[5].string.strip())
            journal_SCI.append(col[6].string.strip())
            journal_acception.append(col[7].string.strip())
            journal_period.append(col[8].string.strip())
It does not work. Can anyone help?
I get this error message:
AttributeError: 'ResultSet' object has no attribute 'string'
Answer 0 (score: 0)
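You can use text instead of string and change the for loop. string returns None for a cell that has more than one child (such as the journal-name cell, which holds a link, line breaks and a font tag), whereas text joins all the text inside the cell. Also get the table with find rather than findAll, because findAll returns a ResultSet (a list of tags), not a single tag. Here is the complete code for your reference: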
from bs4 import BeautifulSoup
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
journal_ISSN = []
journal_name = []
journal_affecting_factors = []
journal_JCR_zone = []
journal_parent_class = []
journal_sub_class = []
journal_SCI = []
journal_acception = []
journal_period = []
url = "http://www.letpub.com.cn/index.php?page=journalapp&view=search&searchname=&searchissn=&searchfield=&searchimpactlow=&searchimpacthigh=&searchimpacttrend=&searchscitype=&searchcategory1=%E7%8E%AF%E5%A2%83%E7%A7%91%E5%AD%A6%E4%B8%8E%E7%94%9F%E6%80%81%E5%AD%A6&searchcategory2=%E7%8E%AF%E5%A2%83%E7%A7%91%E5%AD%A6&searchjcrkind=&searchopenaccess=&searchsort=relevance&searchsortorder=desc&currentsearchpage=2"
resp = urlopen(url)
soup = BeautifulSoup(resp.read().decode('utf-8'), "html.parser")  # decode the response as UTF-8
journal_table = soup.find("table", {"class": "table_yjfx"})
rows = journal_table.find_all('tr')[2:-1]  # keep only the rows that hold table data
for row in rows:
    col = row.find_all('td')
    journal_ISSN.append(col[0].text.strip())
    journal_name.append(col[1].text.strip())
    journal_affecting_factors.append(col[2].text.strip())
    journal_JCR_zone.append(col[3].text.strip())
    journal_parent_class.append(col[4].text.strip())
    journal_sub_class.append(col[5].text.strip())
    journal_SCI.append(col[6].text.strip())
    journal_acception.append(col[7].text.strip())
    journal_period.append(col[8].text.strip())
You can then print journal_JCR_zone[0] and journal_parent_class[0]:
print(journal_JCR_zone[0])
print(journal_parent_class[0])
Output:
4区
环境科学与生态学
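The difference between string and text can be checked offline. The following is a minimal sketch, assuming Python 3 with bs4 installed. The HTML is a trimmed copy of the row from the question: the inline styles are removed, the link target is a placeholder, and the wrapping table tag is added here for illustration, so it is not the live page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question; the href is a placeholder
# and the surrounding <table> is an assumption added for this sketch.
HTML = """
<table class="table_yjfx">
<tr>
<td>1470-160X</td>
<td><a href="#">ECOLOGICAL INDICATORS</a><br><br><font color="grey">ECOL INDIC</font></td>
<td>3.190</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all returns a ResultSet (a list of tags); find returns the first match.
tables = soup.find_all("table", {"class": "table_yjfx"})
table = soup.find("table", {"class": "table_yjfx"})

cells = table.find_all("td")

issn = cells[0].string          # exactly one text child, so .string works
name_string = cells[1].string   # several child tags, so .string is None
name_text = cells[1].get_text(" ", strip=True)  # joins every text fragment

print(len(tables), issn, name_string, name_text)
```

Here name_string is None, which is why calling strip() on it fails, while name_text holds the combined text of the cell. Asking the ResultSet in tables for .string raises the AttributeError quoted in the question.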
Alternatively, you can write the results to a file like this:
with open('chinesechar.txt', 'wb') as outf:
    outf.write(journal_sub_class[0].encode("utf-8"))
This writes 环境科学 to the file chinesechar.txt.
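If every column is wanted rather than a single value, the parallel lists can be written out together as CSV. This is only a sketch, assuming Python 3. The helper write_journals, the file name journals.csv and the column headers are hypothetical, and the sample values are copied from the row shown in the question instead of being scraped.

```python
import csv
import os
import tempfile


def write_journals(path, header, columns):
    """Write parallel column lists to a UTF-8 CSV file, one journal per row."""
    with open(path, "w", newline="", encoding="utf-8") as outf:
        writer = csv.writer(outf)
        writer.writerow(header)
        # zip(*columns) turns the column lists into one tuple per journal.
        writer.writerows(zip(*columns))


# Sample values copied from the table row in the question.
journal_ISSN = ["1470-160X"]
journal_JCR_zone = ["2区"]
journal_sub_class = ["环境科学"]

# Hypothetical output location, placed in the temp directory for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "journals.csv")
write_journals(
    out_path,
    ["ISSN", "JCR zone", "Subclass"],
    [journal_ISSN, journal_JCR_zone, journal_sub_class],
)

# Read the file back to confirm the Chinese text survived the round trip.
with open(out_path, newline="", encoding="utf-8") as inf:
    rows = list(csv.reader(inf))

print(rows)
```

Opening the file in text mode with encoding="utf-8" avoids the manual encode() call, and newline="" is what the csv module expects so that it can control line endings itself.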