The script I use to extract the reviews for one of the books is:
URL: www.goodreads.com/book/show/2657.To_Kill_a_Mockingbird
from selenium import webdriver
import time
driver = webdriver.Chrome()
time.sleep(3)
driver.get('https://www.goodreads.com/book/show/2657.To_Kill_a_Mockingbird')
time.sleep(5)
reviews = driver.find_elements_by_css_selector("div.reviewText")
for r in reviews:
    spanText = r.find_element_by_css_selector("span.readable:nth-child(2)").text
    print("Span text:", spanText)
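The problem is that I can't extract the whole text from div.reviewText > span. That span contains two nested spans: the first holds only a truncated portion of the review (you have to click the ...more link to see the rest), while the second holds the full text. I want to get the text from the second span. Can anyone help me?
HTML (you can also view it on the site via the link above):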
<div class="reviewText stacked">
<span id="reviewTextContainer35272288" class="readable">
<span id="freeTextContainer13558188749606170457">If I could give this no stars, I would. This is possibly one of my least favorite books in the world, one that I would happily take off of shelves and stow in dark corners where no one would ever have to read it again.
<br>
<br>I think that To Kill A Mockingbird has such a prominent place in (American) culture because it is a naive, idealistic piece of writing in which naivete and idealism are ultimately rewarded. It's a saccharine, rose-tinted eulogy for the nineteen thirties from an orator who comes not
</span>
<span id="freeText13558188749606170457" style="display:none">If I could give this no stars, I would. This is possibly one of my least favorite books in the world, one that I would happily take off of shelves and stow in dark corners where no one would ever have to read it again.
<br>
<br>I think that To Kill A Mockingbird has such a prominent place in (American) culture because it is a naive, idealistic piece of writing in which naivete and idealism are ultimately rewarded. It's a saccharine, rose-tinted eulogy for the nineteen thirties from an orator who comes not to bury, but to praise. Written in the late fifties, TKAM is free of the social changes and conventions that people at the time were (and are, to some extent) still grating at. The primary dividing line in TKAM is not one of race, but is rather one of good people versus bad people -- something that, of course, Atticus and the children can discern effortlessly.
<br>
<br>The characters are one dimensional. Calpurnia is the Negro who knows her place and loves the children; Atticus is a good father, wise and patient; Tom Robinson is the innocent wronged; Boo is the kind eccentric; Jem is the little boy who grows up; Scout is the precocious, knowledgable child. They have no identity outside of these roles. The children have no guile, no shrewdness--there is none of the delightfully subversive slyness that real children have, the sneakiness that will ultimately allow them to grow up. Jem and Scout will be children forever, existing in a world of black and white in which lacking knowledge allows people to see the truth in all of its simple, nuanceless glory.
<br>
<br>I think that's why people find it soothing: TKAM privileges, celebrates, even, the child's point of view. Other YA classics--Huckleberry Finn; Catcher in the Rye; A Wrinkle in Time; The Day No Pigs Would Die; Are You There, God? It's Me, Margaret; Bridge to Terabithia--feature protagonists who are, if not actively fighting to become adults, at least fighting to find themselves as people. There is an active struggle throughout each of those books to make sense of the world, to define the world as something larger than oneself, as something that the protagonist can somehow be a part of. To Kill A Mockingbird has no struggle to become part of the world--in it, the children *are* the world, and everything else is just only relevant in as much as it affects them. There's no struggle to make sense of things, because to them, it already makes sense; there's no struggle to be a part of something, because they're already a part of everything. There's no sense of maturation--their world changes, but it leaves them, in many ways, unchanged, and because of that, it fails as a story for me. The whole point of a coming of age story--which is what TKAM is generally billed as--is that the characters come of age, or at least mature in some fashion, and it just doesn't happen.
<br>
<br>All thematic issues aside, I think that the writing is very, er, uneven, shall we say? Overwhelmingly episodic, not terribly consistent, and largely as dimensionless as the characters.
<br>
</span>
<a data-text-id="13558188749606170457" href="#" onclick="swapContent($(this));; return false;">...more</a>
</span>
</div>
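Answer 0 (score: 0)
The second span is hidden (display:none), so its content can't be read through the text property, which only returns visible text.
Try this instead:
spanText = r.find_elements_by_css_selector("span.readable > span")[-1].get_attribute('textContent')
This reads the content of the hidden element.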
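As a rough sketch of how this suggestion could fit into the loop from the question: the helper below takes the last inner span of each review block and reads its textContent. The names `extract_full_reviews`, `scrape`, and `CSS` are made up for illustration. The sketch assumes every review block follows the HTML shown in the question, and it passes the locator strategy as the plain string "css selector" (the value of Selenium's By.CSS_SELECTOR) so the helper itself does not import Selenium.

```python
CSS = "css selector"  # same string value as selenium.webdriver.common.by.By.CSS_SELECTOR


def extract_full_reviews(driver):
    """Return the full text of every review block on the current page."""
    full_texts = []
    for review in driver.find_elements(CSS, "div.reviewText"):
        spans = review.find_elements(CSS, "span.readable > span")
        if not spans:
            continue  # this block has no text container, skip it
        # The last inner span carries the full text. It may be hidden,
        # so read textContent rather than the .text property.
        raw = spans[-1].get_attribute("textContent") or ""
        if raw.strip():
            full_texts.append(raw.strip())
    return full_texts


def scrape(url):
    """Hypothetical entry point; needs selenium and a Chrome driver installed."""
    from selenium import webdriver  # imported lazily, only when actually scraping

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return extract_full_reviews(driver)
    finally:
        driver.quit()
```

Taking the last span with [-1] also covers a review block that contains only one inner span, in which case that single span is returned.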
Answer 1 (score: 0)
Use get_attribute() to extract the hidden content. You also don't need the unnecessary sleeps.
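from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.goodreads.com/book/show/2657.To_Kill_a_Mockingbird')
# find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector, which was removed in Selenium 4.3
reviews = driver.find_elements(By.CSS_SELECTOR, "span.readable span:nth-child(2)")
for r in reviews:
    # textContent returns the text even when the element is hidden (display:none)
    spanText = r.get_attribute('textContent')
    print("Span text:", spanText)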
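One thing to keep in mind with textContent: unlike the text property, it returns the raw text nodes, so the `<br>` tags contribute nothing and the whitespace of the source markup is kept as-is. The helper below is a small, optional sketch for tidying such a string. The name `tidy_text_content` is made up, and the sketch assumes paragraphs end up separated by blank lines, as they would if the served markup has line breaks around the `<br>` tags like the snippet in the question.

```python
import re


def tidy_text_content(raw):
    """Collapse stray whitespace while keeping blank-line paragraph breaks."""
    # Split wherever two newlines are separated only by whitespace.
    paragraphs = re.split(r"\n\s*\n", raw)
    # Inside each paragraph, squeeze any whitespace run into a single space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    # Drop paragraphs that were empty and rejoin with one blank line between them.
    return "\n\n".join(p for p in cleaned if p)
```

It could be applied to the value returned by get_attribute('textContent') before printing, for example `print("Span text:", tidy_text_content(spanText))`.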