I'm trying to follow the links in posts I've already scraped so I can save their text. I'm partly there; I just need to tweak a few things, which is why I'm here. Instead of different posts I'm getting duplicates, and on top of that they're wrapped in brackets like this:
[[<div class="article-body" id="image-description"><p>Kanye West premiered
the music video for "Famous" off his "The Life of Pablo" album to a
sold out audience in Los Angeles. The video features nude versions of George W. Bush.
Donald Trump. Anna Wintour. Rihanna. Chris Brown. Taylor Swift.
Kanye West. Kim Kardashian. Ray J. Amber Rose. Caitlyn Jenner.
Bill Cosby (in that order).</p></div>],
And here's my code:
def sprinkle():
    url_two = 'http://www.example.com'
    html = requests.get(url_two, headers=headers)
    soup = BeautifulSoup(html.text, 'html5lib')
    titles = soup.find_all('div', {'class': 'entry-pos-1'})

    def make_soup(url):
        the_comments_page = requests.get(url, headers=headers)
        soupdata = BeautifulSoup(the_comments_page.text, 'html5lib')
        comment = soupdata.find_all('div', {'class': 'article-body'})
        return comment

    comment_links = [url_two + link.a.get('href') for link in titles]
    soup = [make_soup(comments) for comments in comment_links]
    # soup = make_soup(comments)
    # print(soup)
    entries = [{'href': url_two + div.a.get('href'),
                'src': url_two + div.a.img.get('data-original'),
                'text': div.find('p', 'entry-title').text,
                'comments': soup
                } for div in titles][:6]
    return entries
I feel like I'm close. This is all new to me. Any help would be great.
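For context on the brackets: `find_all()` returns a `ResultSet` (a list subclass), and printing a list wraps its elements in `[...]`, while `find()` returns a single `Tag`. A minimal sketch illustrating the difference, using a toy HTML snippet (not the real site):

```python
from bs4 import BeautifulSoup

html = '<div class="article-body"><p>Hello</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# find_all() returns a ResultSet (a list), so printing it shows brackets
print(soup.find_all('div'))  # [<div class="article-body"><p>Hello</p></div>]

# find() returns a single Tag (or None), with no brackets
print(soup.find('div'))      # <div class="article-body"><p>Hello</p></div>
```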
Answer 0 (score: 2)
I figured it out:
def sprinkle():
    url_two = 'http://www.vladtv.com'
    html = requests.get(url_two, headers=headers)
    soup = BeautifulSoup(html.text, 'html5lib')
    titles = soup.find_all('div', {'class': 'entry-pos-1'})

    def make_soup(url):
        the_comments_page = requests.get(url, headers=headers)
        soupdata = BeautifulSoup(the_comments_page.text, 'html5lib')
        comment = soupdata.find('div', {'class': 'article-body'})
        para = comment.find_all('p')
        return para

    entries = [{'href': url_two + div.a.get('href'),
                'src': url_two + div.a.img.get('data-original'),
                'text': div.find('p', 'entry-title').text,
                'comments': make_soup(url_two + div.a.get('href'))
                } for div in titles][:6]
    return entries
I had been trying to strip the brackets out of the results, but the real fix was to call make_soup() for each link and use find() instead of find_all() for the article body.