将HTML中的短语转换为Python链接

时间:2015-08-18 04:28:43

标签: python html regex replace

我在网站上有内容,其中某些关键字和关键字应与其他内容相关联。我不想手动将链接插入到内容中。在Python中执行此操作的最佳方法是什么,最好不使用任何DOM库?

例如,我有这样的文字:

Key-phrase: awesome method

...And this can be accomplished with the <a href="/path/to/awesome-method">Awesome Method</a>. Blah blah blah....

这是所需的输出:

{{1}}

我列出了这些关键短语和相应的网址。这些短语可以在内容中出现,但在关键短语定义中将全部小写。

目前我正在使用string-find-replace和case-changed单词的组合。这是非常低效的。

3 个答案:

答案 0 :(得分:1)

这样的东西
for keyphrase, url in links:
    content = re.sub('(%s)' % keyphrase, r'<a href="%s">\1</a>' % url, content, flags=re.IGNORECASE)

例如,在你的例子中,你可以做到

import re

content = "...And this can be accomplished with the Awesome Method. Blah blah blah...."
links = [('awesome method', '/path/to/awesome-method')]

for keyphrase, url in links:
    content = re.sub('(%s)' % keyphrase, r'<a href="%s">\1</a>' % url, content, flags=re.IGNORECASE)

# content:
# '...And this can be accomplished with the <a href="/path/to/awesome-method">Awesome Method</a>. Blah blah blah....'

答案 1 :(得分:1)

您可以迭代文本中的位置,并使用基本字符串操作构建新文本:

import re

text = """And this can be accomplished with the Awesome Method. Blah blah blah"""

keyphrases = [
    ('awesome method', 'http://awesome.com/'),
    ('blah', 'http://blah.com')
  ] 

new_parts = []

pos = 0
while pos < len(text):
  replaced = False
  for phrase, url in keyphrases:
    substring = text[pos:pos+len(phrase)]
    if substring.lower() == phrase.lower():
      new_parts.append('<a href="%s">%s</a>' % (url, substring))
      pos += len(substring)
      replaced = True
      break
  if not replaced:
    new_parts.append(text[pos])
    pos += 1

new_text = ''.join(new_parts)
print(new_text)

答案 2 :(得分:0)

查看anchorman模块 - 将文字转换为超文本。

import anchorman

text = "...And this can be accomplished with the Awesome Method. Blah blah blah...."
links = [{'awesome method': {'value': '/path/to/awesome-method'}}]
markup_format = {
    'tag': 'a',
    'value_key': 'href',
    'attributes': [
        ('class', 'anups')
    ],
    'case_sensitive': False
}

a = anchorman.add(text, links, markup_format=markup_format)
print a
...And this can be accomplished with the <a href="/path/to/awesome-method"
class="anups">Awesome Method</a>. Blah blah blah....