Question

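On this page: https://teslamotorsclub.com/tmc/threads/tesla-tsla-the-investment-world-the-2019-investors-roundtable.139047/page-2619

Post #52365

Before I can get the text, I have to click "more". How can I extract the text behind it? Is there a way to trigger the "more" expansion while the spider runs so that the whole post is shown?

What I have tried so far is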

info.xpath(".//div[@class='messageContent']").extract_first().replace('\n', '')

But I still cannot get the full text.

Answer 1

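You may see the "Click to expand" text at the end, but the full quote is still in the response. You only need to avoid extracting the "Click to expand" text itself.

For example: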

>>> response.xpath('//li[contains(@class, "message")][.//a/text()[.="#52365"]]//*[re:test(@class, "\\bquote\\b")]//text()').getall()
['CCS for model 3 coming', '\nWhile article references Europe, the North American theater will be getting a CCS adapter soon.', '\nSee article for', '\n', '\n', 'Tesla launches $190 CCS adapter for new Model S and Model X, offers retrofits for older vehicles', '\n', '\nMartian High Command', '\n', '\nPS: Text from article.', '\n', '\nUpdate: A Tesla spokesperson told us that they will make sure owners in North America will have access to all “compelling networks”, but they have nothing to announce now.']
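
If a broader selector is used and the expand label does end up among the text nodes, it can also be dropped in Python after extraction. The following is a minimal sketch, not part of the original answer: the helper name `join_quote_text` is hypothetical, and the marker string "Click to expand" is an assumption about how the label appears in the page text.

```python
def join_quote_text(fragments, marker="Click to expand"):
    """Join text nodes (e.g. from .getall()) into one string.

    `marker` is an assumed label for the expand link; adjust it to
    whatever text the page actually uses.
    """
    cleaned = []
    for fragment in fragments:
        text = fragment.strip()
        # Skip whitespace-only nodes and the expand-link label.
        if not text or text.startswith(marker):
            continue
        cleaned.append(text)
    return "\n".join(cleaned)


# Sample input modelled on the output shown above, with an
# illustrative expand label appended at the end.
fragments = [
    "CCS for model 3 coming",
    "\nWhile article references Europe, the North American theater will be getting a CCS adapter soon.",
    "\n",
    "Click to expand...",
]
print(join_quote_text(fragments))
```

Filtering after extraction keeps the XPath simple, at the cost of depending on the exact label text.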

Answer 2

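As someone pointed out in the comments, you don't need to click anything. If you open the document inspector in your browser, you can see that all of the text is already in the page.

You can retrieve all of the messages with a simple CSS selector and a for loop.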
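
The answer's own code is not preserved here, so what follows is only a sketch of the CSS-selector-plus-loop approach. The class names `message` and `messageContent` are taken from the selectors quoted elsewhere in this thread and are assumptions about the page markup; the function name `parse_messages` is hypothetical.

```python
def parse_messages(response):
    """Yield one item per post on the page.

    `response` is expected to behave like a Scrapy response, i.e. to
    offer .css() returning selectors with .css() and .getall().
    """
    # Assumed markup: each post is an <li class="message ..."> that
    # contains a <div class="messageContent"> holding the body.
    for message in response.css("li.message"):
        # "::text" preceded by a space collects all descendant text nodes.
        fragments = message.css("div.messageContent ::text").getall()
        text = " ".join(part.strip() for part in fragments if part.strip())
        yield {"text": text}
```

In a spider, this could serve as the body of the `parse` callback, so every post on the page is yielded with its full text and no click is needed.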
