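I want to extract the paragraph that contains a given piece of text, wherever it appears in a web page, without using a CSS selector, XPath, class name, etc.
I have the following code: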
<script>
export default{
data() {
return {
movies:[],
day:moment()
}
},
mounted(){
axios.get("/fa")
.then(response => this.movies = response.data);
},
methods:{
filteration(movie){
return movie.filter(this.time);
},
time(movie){
return moment(movie.time_session.time).hour() > 20;
// return moment(movie.time_session.time).isSame(this.day,'day');
}
}
}
</script>
Previously I used this code to perform the same text extraction, but with bs4, and it ran successfully.
from selenium import webdriver

keyword = raw_input("Please Enter The Keyword to Search : ")
driver = webdriver.Chrome()  # chromedriver path is already set up
driver.get(url)
driver.implicitly_wait(5)
# Not providing the expected output (note: the keyword is not quoted inside the XPath)
# dataa = driver.find_elements_by_xpath("//*[contains(text(), "+keyword+")]")
dataa = driver.page_source
driver.quit()
Is there any method so that I can extract only the paragraphs or description using the keyword?
Answer 0 (score: 0)
So what if you extract all the text on the page using the 'goose' module, then iterate over it and keep the sentences that contain the keyword, like this:
from goose import Goose
keyword = 'I can only extract paragraphs'
g = Goose(config={'enable_image_fetching':False})
article = g.extract(url='https://stackoverflow.com/questions/44444456/extract-the-provided-text-whole-paragraph-where-ever-the-text-present-in-webpa')
text = article.cleaned_text
_sent = [sent for sent in text.split('\n') if keyword in sent]
print _sent
#[u'is there any method ?? so that ""I can only extract paragraphs or Description using the keyword""']
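The filtering step above does not depend on goose itself, so it can be sketched on its own. The following is a minimal Python 3 sketch using only the standard library; the function name `paragraphs_with_keyword` and the `sample_text` string are hypothetical stand-ins (in the answer the text would come from `article.cleaned_text`), and the case-insensitive match is an assumption rather than something the original code does.

```python
def paragraphs_with_keyword(text, keyword):
    """Return the non-empty lines of `text` that contain `keyword`.

    Matching is case-insensitive (an assumption; the original snippet
    uses a case-sensitive `in` test).
    """
    needle = keyword.lower()
    return [
        line.strip()
        for line in text.split("\n")
        if line.strip() and needle in line.lower()
    ]


# Hypothetical sample text standing in for article.cleaned_text
sample_text = (
    "First paragraph about cats.\n"
    "\n"
    "I can only extract paragraphs here.\n"
    "Last one."
)

print(paragraphs_with_keyword(sample_text, "only extract"))
```

The same function could be applied to any plain-text string, for example text pulled out of `driver.page_source` after stripping the markup.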
Update: an additional module is required: pyteaser. The function returns the top 5 sentences, scored against the provided keyword.
from pyteaser import Summarize
from goose import Goose
def teaser(title,text):
    summaries = Summarize(title,text)
    return summaries
g = Goose(config={'enable_image_fetching':False})
article = g.extract(url='http://en.wikipedia.org/wiki/Rahul_Dravid')
text = article.cleaned_text
print teaser('Dravid',text)
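If installing pyteaser is not an option, the idea of "return the top few sentences for a keyword" can be roughly approximated with the standard library. The sketch below is Python 3 and is not pyteaser's scoring algorithm: it simply ranks sentences by how many times the keyword occurs in them. The function name `top_sentences`, the naive sentence splitter, and the sample text are all hypothetical.

```python
import re


def top_sentences(text, keyword, n=5):
    """Return up to `n` sentences containing `keyword`, most occurrences first.

    This is a simplified stand-in for a real summarizer: the score is just
    the case-insensitive occurrence count of the keyword in each sentence.
    """
    needle = keyword.lower()
    if not needle:
        return []
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [(s.lower().count(needle), s) for s in sentences]
    hits = [pair for pair in scored if pair[0] > 0]
    # sort() is stable, so sentences with equal scores keep document order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in hits[:n]]


# Hypothetical sample text standing in for article.cleaned_text
sample = (
    "Dravid is mentioned here. This sentence is unrelated. "
    "Dravid and Dravid again! Nothing to see."
)

print(top_sentences(sample, "Dravid"))
```

Usage mirrors the `teaser` call above: pass the cleaned article text and the keyword, and optionally a smaller `n`.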