I have the following sentence:
sentence = "<s> online auto body <s>"
First, I want to get the word 3-grams:
('<s>', 'online', 'auto')
('online', 'auto', 'body')
('auto', 'body', '<s>')
To do this, I used the following code:
from nltk import ngrams

sentence = '<s> online auto body <s>'
n = 3
word_3grams = ngrams(sentence.split(), n)
for grams in word_3grams:
    print(grams)
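If NLTK is not available, a standard-library sketch of the same windowing could look like the following. It assumes you only need NLTK's default (unpadded) behaviour; simple_ngrams is a hypothetical local helper, not NLTK's function.

```python
from itertools import tee

def simple_ngrams(tokens, n):
    """Hypothetical stand-in for nltk's ngrams without padding options."""
    # Create n independent iterators over the same tokens.
    iters = tee(tokens, n)
    # Shift the i-th iterator forward by i positions.
    for i, it in enumerate(iters):
        for _ in range(i):
            next(it, None)
    # zip stops at the shortest iterator, so only complete windows are yielded.
    return zip(*iters)

sentence = '<s> online auto body <s>'
word_3grams = list(simple_ngrams(sentence.split(), 3))
for grams in word_3grams:
    print(grams)
```

Because it is built on zip, it returns a lazy iterator just like the NLTK version, so wrap it in list() if you need to iterate more than once.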
Now I want to add "#" at the beginning and end of each word, like this:
('#<s>#', '#online#', '#auto#')
('#online#', '#auto#', '#body#')
('#auto#', '#body#', '#<s>#')
But I don't know how to get there. Side note: the elements here are tuples, but I don't mind using lists instead.
Answer 0 (score: 1)
You want something that works like a sliding window.
from itertools import islice
sentence = "<s> online auto body <s>"
myList = sentence.split()
myList = ['#' + word + '#' for word in myList]
slidingWindow = [islice(myList, s, None) for s in range(3)]
print(list(zip(*slidingWindow)))
# [('#<s>#', '#online#', '#auto#'), ('#online#', '#auto#', '#body#'), ('#auto#', '#body#', '#<s>#')]
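To see why the islice/zip combination above produces the windows, it can help to materialize the shifted views. This sketch only makes the intermediate step visible; the names shifted and windows are illustrative.

```python
from itertools import islice

words = ['#' + w + '#' for w in "<s> online auto body <s>".split()]

# Each view starts one position later than the previous one.
shifted = [list(islice(words, s, None)) for s in range(3)]
for row in shifted:
    print(row)

# zip takes one element from each view per step and stops when the
# shortest view (the one shifted by 2) is exhausted.
windows = list(zip(*shifted))
print(windows)
```

The views have lengths 5, 4 and 3, which is why exactly three windows come out.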
Answer 1 (score: 0)
If you just want to change the strings, try:
list(map(lambda s: "#" + s + "#", sentence.split()))
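One caveat for Python 3: map returns a lazy, single-use iterator rather than a list. This short sketch shows that materializing it once is enough, and that a second pass over the same iterator yields nothing.

```python
sentence = "<s> online auto body <s>"
wrapped = map(lambda s: "#" + s + "#", sentence.split())

# Materialize the iterator so the result can be inspected and reused.
wrapped_list = list(wrapped)
print(wrapped_list)

# The original map object is now exhausted.
second_pass = list(wrapped)
print(second_pass)
```

Keep wrapped_list (not the map object) if you plan to build n-grams from it afterwards.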
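Answer 2 (score: 0)
In Python, tuples are immutable, which means they cannot be modified in place. As you already hinted, a list is the better choice here; more precisely, use a list comprehension: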
aList = ['auto', 'body', '<s>']
newList = ['#' + item + '#' for item in aList]
print(newList)
# ['#auto#', '#body#', '#<s>#']
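If you would rather keep tuples, immutability only means you build a new tuple for each n-gram instead of editing the old one. A minimal sketch, assuming the 3-grams are already in a list (hard-coded here instead of calling nltk):

```python
word_3grams = [('<s>', 'online', 'auto'),
               ('online', 'auto', 'body'),
               ('auto', 'body', '<s>')]

# Build a fresh tuple per n-gram; the originals are left untouched.
wrapped = [tuple('#' + w + '#' for w in grams) for grams in word_3grams]
for grams in wrapped:
    print(grams)

# Trying to modify a tuple in place raises TypeError.
try:
    word_3grams[0][0] = '#<s>#'
except TypeError as exc:
    print("TypeError:", exc)
```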
Answer 3 (score: 0)
You can do this with a list comprehension and str.format:
word_3grams = [('<s>', 'online', 'auto'),
               ('online', 'auto', 'body'),
               ('auto', 'body', '<s>')]
for grams in word_3grams:
    print(["{pad}{data}{pad}".format(pad='#', data=s) for s in grams])
Output:
['#<s>#', '#online#', '#auto#']
['#online#', '#auto#', '#body#']
['#auto#', '#body#', '#<s>#']
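On Python 3.6+ the same formatting can be written with an f-string, and making the padding a parameter lets you reuse it with other markers. wrap is a hypothetical helper name used only for illustration.

```python
def wrap(grams, pad='#'):
    """Hypothetical helper: surround every token in grams with pad."""
    return [f"{pad}{s}{pad}" for s in grams]

word_3grams = [('<s>', 'online', 'auto'),
               ('online', 'auto', 'body'),
               ('auto', 'body', '<s>')]

results = [wrap(grams) for grams in word_3grams]
for row in results:
    print(row)
```

Calling wrap(grams, pad='$') would produce '$online$' and so on, without touching the loop.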
Answer 4 (score: 0)
A complete solution, starting from the raw sentence:
sentence = "<s> online auto body <s>"
n = 3
# Split the sentence into words and wrap each word in '#'.
words = tuple(map(lambda w: '#' + w + '#', sentence.split()))
# Create a list of tuples, each holding n consecutive words.
splits = [words[i:i+n] for i in range(len(words) - (n - 1))]
# Print the results.
for elem in splits:
    print(elem)
Output:
('#<s>#', '#online#', '#auto#')
('#online#', '#auto#', '#body#')
('#auto#', '#body#', '#<s>#')
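The same slicing approach can be packaged as a reusable function for any window size. This is a sketch assuming n >= 1; wrapped_ngrams is a hypothetical name.

```python
def wrapped_ngrams(sentence, n, pad='#'):
    """Hypothetical helper: n-grams of sentence with each word wrapped in pad (assumes n >= 1)."""
    words = tuple(pad + w + pad for w in sentence.split())
    # There are len(words) - n + 1 windows; when n exceeds the word count,
    # range() gets a non-positive argument and the result is simply empty.
    return [words[i:i + n] for i in range(len(words) - n + 1)]

for elem in wrapped_ngrams("<s> online auto body <s>", 2):
    print(elem)
```

Slicing a tuple returns a tuple, so each n-gram keeps the tuple type from the question.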