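I am trying to scrape several websites that share the same HTML structure and write the content to a JSON file. The results for every URL are printed in the terminal, but only the content from the last URL in the list ends up in the JSON file. I have not been able to find a solution. Here is my code: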
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
import json

urls = ['https://scholarworks.gvsu.edu/books/', 'https://pdxscholar.library.pdx.edu/pdxopen/', 'https://oer.galileo.usg.edu/all-textbooks/index.html', 'https://oer.galileo.usg.edu/all-textbooks/index.2.html', 'https://digitalcommons.trinity.edu/textbooks/']

#scrape elements
for url in urls:
    uClient = urlopen(url)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")
    containers = page_soup.findAll("div",{"class":"content_block"})
    source = page_soup.find("div",{"id":"series-header"})
    data = []
    for container in containers:
        item = {}
        item['type'] = "Textbook"
        item['title'] = container.h2.text
        item['author'] = container.p.text
        item['link'] = container.a["href"]
        item['source'] = source.h2.text
        data.append(item) # add the item to the list
        print(container.h2.text)

with open("./json/multiple.json", "w") as writeJSON:
    json.dump(data, writeJSON, ensure_ascii=False)
Answer 0 (score: 0)
You are reinitializing data with each URL, so the items collected from earlier URLs are thrown away and only the last URL's items are still in the list when it is written to the file. I think you want to put the initialization above your outer loop:
data = []
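A minimal sketch of how the question's script could be restructured so the list is created once, before the loop over the URLs. The function names parse_page, scrape_all, and write_json are hypothetical and chosen only for illustration; the selectors and field names are copied from the question, and the pages are assumed to keep the structure described there.

```python
import json
from urllib.request import urlopen


def parse_page(url):
    """Fetch one page and return its textbook entries as a list of dicts."""
    # Imported here so the accumulation logic below can be tried without bs4.
    from bs4 import BeautifulSoup

    with urlopen(url) as response:
        page_soup = BeautifulSoup(response.read(), "html.parser")
    source = page_soup.find("div", {"id": "series-header"})
    items = []
    for container in page_soup.find_all("div", {"class": "content_block"}):
        items.append({
            "type": "Textbook",
            "title": container.h2.text,
            "author": container.p.text,
            "link": container.a["href"],
            "source": source.h2.text,
        })
    return items


def scrape_all(urls, parse=parse_page):
    """Collect the entries from every URL into a single list."""
    data = []  # created once, before the loop, so earlier results are kept
    for url in urls:
        data.extend(parse(url))
    return data


def write_json(data, path):
    """Write the combined list to one JSON file, after all pages are done."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)
```

With the urls list from the question, calling write_json(scrape_all(urls), "./json/multiple.json") should then write the entries from all five pages rather than only the last one. The file is opened a single time, after the loop has finished, so nothing is overwritten along the way.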