Apply <b> ... </b> formatting to wordlist words that appear in a text

Time: 2015-06-10 23:48:05

Tags: python

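Given a list of words, I want to emphasize these words (using <b> ... </b> tags) wherever they occur in a string, without using regular expressions.

For example, I have: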

list_of_words = ["python", "R", "Julia" ...]
a_Speech = "A paragraph about programming languages  ......R is good for statisticians . Python is good for programmers . ....."

The output should be:

a_Speech = "A paragraph about programming languages  ......<b>R</b> is good for statisticians . <b>Python</b> is good for programmers . ....."

I tried something like this:

def right_shift(astr, index, n):
    # shift by n = 3,n = 4  characters 

def function_name(a_speech): 

    for x in list_of_words: 
        if x in a_speech: 
             loc = a_speech.index(x) 
             right_shift(a_speech, loc, 3)
             a_speech[loc] = "<b>"

             right_shift(a_speech, loc+len(x), 4)          
             a_speech[loc+len(x)] = "</b>"

    return a_speech

2 Answers:

Answer 0 (score: 0)

This works. You need to split a_Speech on spaces and also on periods, so we write a compound predicate is_split_char() and pass it to itertools.groupby(), which is a very neat iterator.

import itertools

# A set is faster than a list for membership tests
bold_words = set(word.lower() for word in ["python", "R", "Julia"])

def bold_specific_words(bold_words, splitchars, text):
    """Generator that splits text on splitchars and bolds words found in bold_words (case-insensitive).

    Contiguous blocks of split characters are kept together, and the split
    characters are not discarded, unlike str.split().
    """
    def is_split_char(char, charset=splitchars):
        # Despite the name, this is True for word characters and False for split characters
        return char not in charset

    # groupby yields alternating runs of word characters and split characters
    for is_splitchar, chars in itertools.groupby(text, is_split_char):
        word = ''.join(chars)  # rebuild the run from the sub-iterator
        if word.lower() in bold_words:
            yield '<b>' + word + '</b>'
        else:
            yield word

>>> ''.join(word for word in bold_specific_words(bold_words, ' .', a_Speech))
'A paragraph about programming languages  ......<b>R</b> is good for statisticians . <b>Python</b> is good for programmers . .....'
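To see why groupby fits here, it can help to look at the runs it produces before any bolding happens. The following is only a small sketch: split_runs is a hypothetical helper introduced for illustration, not part of the answer, and it uses the same "character not in splitchars" key as above.

```python
import itertools

def split_runs(text, splitchars):
    """Return (is_word, run) pairs; hypothetical helper for illustration only."""
    return [(is_word, ''.join(chars))
            for is_word, chars in itertools.groupby(text, lambda c: c not in splitchars)]

runs = split_runs("R is good . Python", " .")
for is_word, run in runs:
    # Word runs have is_word == True; runs of split characters have is_word == False
    print(is_word, repr(run))
```

Consecutive split characters such as ' . ' come out as a single run, and joining all runs in order gives back the original text, which is why nothing is lost when only the word runs get wrapped in tags.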

Answer 1 (score: 0)

Something like this might work: build a list of substrings piece by piece, and join them at the end:

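def function_name(a_speech):

    loc = 0
    substrings = []
    for word in list_of_words:
        if word in a_speech[loc:]:
            currentloc = loc
            # str.index takes the start position positionally, not as a keyword
            loc = a_speech.index(word, currentloc)
            substrings.append(a_speech[currentloc:loc])
            substrings.append("<b>")
            substrings.append(word)
            substrings.append("</b>")
            # Advance past the word in the original string; the tags are not part of it
            loc += len(word)

    # Keep whatever text follows the last match
    substrings.append(a_speech[loc:])
    return "".join(substrings)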

(Note: untested. You may need to work out some of the final details.)
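Some of those details: the loop above looks for each word only once, in list order, and case-sensitively, so "python" would not match "Python". One possible way to handle them, still without regular expressions, is to scan the text position by position. This is a sketch only; bold_all and is_whole_word are hypothetical names, and "whole word" is assumed to mean "not adjacent to a letter or digit".

```python
def is_whole_word(text, start, end):
    """True if text[start:end] is not glued to a letter or digit on either side."""
    before_ok = start == 0 or not text[start - 1].isalnum()
    after_ok = end == len(text) or not text[end].isalnum()
    return before_ok and after_ok

def bold_all(text, words):
    """Wrap every whole-word, case-insensitive occurrence of words in <b> tags."""
    targets = [w.lower() for w in words if w]  # skip empty strings
    pieces = []
    pos = 0  # start of the text not yet copied into pieces
    i = 0    # current scan position
    while i < len(text):
        for w in targets:
            end = i + len(w)
            if text[i:end].lower() == w and is_whole_word(text, i, end):
                pieces.append(text[pos:i])
                pieces.append("<b>" + text[i:end] + "</b>")
                pos = i = end
                break
        else:
            i += 1  # no target word starts here
    pieces.append(text[pos:])  # keep the tail after the last match
    return "".join(pieces)

result = bold_all("R is good . Python is good .", ["python", "R", "Julia"])
print(result)
```

The boundary check keeps the "r" inside words like "programming" from being bolded. The scan costs roughly the text length times the number of words, which should be fine for short lists.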