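Could someone tell me how to take the words surrounding a given word into account? For example, given the sentence "Today the weather is fine and we love to walk." and a window size of 5, I would like to get the following:
and so on. Getting bigrams is no problem: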
import nltk
bigrams = [p for s in corpus_lemm for p in nltk.bigrams(s)]  # take bigrams inside each sentence
But how can I collect the words that fall within a given window size?
Thank you very much for your help!
Answer 0 (score: 0)
Sorry, I don't have much command of Python, but the following does the job in JS. Hopefully you can port it to Python.
var str = "Today the weather is fine and we love to walk.",
    arr = str.split(/\s+/),
    win = 5,
    result = arr.map((w,i,a) => Array(win).fill()
                                          .map((e,j) => a[i + j + -1 * Math.floor(win/2)])
                                          .reduce((p,c) => p ? c ? p + " " + c
                                                                 : p
                                                             : c));
console.log(result);

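Following your comment: sticking to the same algorithm, I can extend my answer so that it works on any array and returns each window as an array instead of a joined string.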
var arr = [1,2,3,4,5,6,7,8],
    win = 5,
    result = arr.map((_,i,a) => Array(win).fill()
                                          .map((e,j) => a[i + j + -1 * Math.floor(win/2)])
                                          .reduce((p,c) => p ? c ? [].concat(p,c)
                                                                 : p
                                                             : c ? c
                                                                 : undefined));
console.log(JSON.stringify(result));
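
Since the question is about Python, here is a rough sketch of the same idea: for every position, take the tokens from `floor(win/2)` places before it up to the end of the window, and let the window shrink at the edges of the list. The name `context_windows` is made up for illustration. Unlike the JS `reduce`, this version does not drop falsy items such as `0` or empty strings.

```python
def context_windows(tokens, win=5):
    """Return, for each position, the tokens inside a window centred on it.

    Windows are truncated at the edges of the list rather than padded.
    """
    half = win // 2
    result = []
    for i in range(len(tokens)):
        start = max(i - half, 0)   # clamp at the left edge
        end = i - half + win       # exclusive bound; slicing clamps it on the right
        result.append(tokens[start:end])
    return result


sentence = "Today the weather is fine and we love to walk."
for window in context_windows(sentence.split(), win=5):
    print(" ".join(window))
```

Clamping `start` matters because a negative slice start in Python counts from the end of the list instead of being cut off at zero.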

Answer 1 (score: 0)
I'm not sure I understand what you mean by a window, but this seems to give the output you want.
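s = "Today the weather is fine and we love to walk"
words = s.split()
win_len = 5
half_win = win_len // 2
# Leading words that can never be the centre of a full window
print("\n".join(words[:half_win]))
for i in range(len(words) - win_len + 1):
    window = words[i:i + win_len]
    # print(" ".join(window))  # uncomment to see the whole window
    print(window[len(window) // 2])  # centre word of each full window
# Trailing words that can never be the centre of a full window
print("\n".join(words[-half_win:]))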
Output
Today
the
weather
is
fine
and
we
love
to
walk
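
Because only the centre word of each window is printed, the output above is just the sentence again, one word per line. If the full windows are what you need, one possible sketch uses `collections.deque` with `maxlen` as a sliding buffer. The helper name `full_windows` is hypothetical, and this version only yields complete windows, so the first and last `win // 2` words never appear as a centre.

```python
from collections import deque


def full_windows(tokens, win=5):
    """Yield every run of `win` consecutive tokens as a tuple."""
    buf = deque(maxlen=win)  # the oldest token drops out automatically
    for token in tokens:
        buf.append(token)
        if len(buf) == win:
            yield tuple(buf)


words = "Today the weather is fine and we love to walk".split()
for window in full_windows(words, win=5):
    centre = window[len(window) // 2]
    print(centre, "<-", " ".join(window))
```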
Answer 2 (score: 0)
You can use list.index together with list slicing to retrieve the words you want.
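def words(text, search, window):
    tokens = text.split()      # split the text argument, not a global s
    i = tokens.index(search)   # first occurrence; raises ValueError if absent
    low = i - window // 2
    high = low + window
    low = max(low, 0)          # clamp so a negative start does not wrap around
    return tokens[low:high]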
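
Note that list.index only finds the first occurrence and raises ValueError when the word is missing. If the word can appear more than once, a variant along these lines might help. This is only a sketch: `all_contexts` is a made-up name, and it uses the same slicing rule as the function above.

```python
def all_contexts(text, search, window):
    """Return one context slice per occurrence of `search` in `text`."""
    tokens = text.split()
    half = window // 2
    contexts = []
    for i, token in enumerate(tokens):
        if token == search:
            low = max(i - half, 0)  # clamp so the slice does not wrap around
            contexts.append(tokens[low:i - half + window])
    return contexts


text = "the cat sat on the mat"
print(all_contexts(text, "the", 3))
# [['the', 'cat'], ['on', 'the', 'mat']]
```

A word that does not occur simply gives an empty list instead of an exception.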