I am a Python beginner working on an NLP project.
Here is my code:
doc1 = "I am a dog and I like biscut"
doc2 = "I am a cat"
doc3 = "I like to drink milk"
doc4 = "I am a bird and I fly to the sky"
doc5 = "I am an elephant and I want to sleep"
docs = [doc1.split(' '), doc2.split(' '), doc3.split(' ')]
docs2 = [doc4.split(' '), doc5.split(' ')]
docs_all = (doc1.split(' ') + doc2.split(' ') + doc3.split(' ') +
            doc4.split(' ') + doc5.split(' '))
From this I get the set of distinct words in docs_all:
print(list(enumerate(set(docs_all))))
[(0, 'a'), (1, 'the'), (2, 'drink'), (3, 'elephant'), (4, 'dog'), (5,
'biscut'), (6, 'cat'), (7, 'bird'), (8, 'an'), (9, 'milk'), (10, 'want'),
(11, 'am'), (12, 'I'), (13, 'and'), (14, 'to'), (15, 'sky'), (16, 'sleep'),
(17, 'like'), (18, 'fly')]
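A side note on the output above: the order in which a set of strings is iterated can change from one Python run to the next, because string hashing is randomized by default, so the indices printed above may not be reproducible. Below is a minimal sketch of one way to get a stable numbering, assuming alphabetical order is acceptable. The names vocab and word_to_index are made up for illustration, and the resulting indices will differ from the ones shown in this post.

```python
doc1 = "I am a dog and I like biscut"
doc2 = "I am a cat"
doc3 = "I like to drink milk"
doc4 = "I am a bird and I fly to the sky"
doc5 = "I am an elephant and I want to sleep"

docs_all = (doc1.split(' ') + doc2.split(' ') + doc3.split(' ') +
            doc4.split(' ') + doc5.split(' '))

# Sorting the distinct words gives the same order on every run.
vocab = sorted(set(docs_all))

# Lookup table from word to its (stable) index.
word_to_index = {word: i for i, word in enumerate(vocab)}

print(list(enumerate(vocab)))
```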
The reference matrices for docs and docs2 are:
setdocs = [(0, 0.44), (1, 0.14), (2, 0.22), (3, 0.113), (4, 0.44), (5,
0.15), (6, 0.96), (7, 0.77), (8, 0.28), (9, 0.39), (10, 0.111)]
setdocs2 = [(0, 0.55), (1, 0.13), (2, 0.52), (3, 0.33), (4, 0.114),
(5, 0.995), (6, 0.16), (7, 0.97), (8, 0.118), (9, 0.14), (10, 0.88), (11,
0.166), (12, 0.85)]
The first value in each tuple is the word index for docs and docs2, taken from:
refdocs2 = list(enumerate(set(doc4.split(' ') + doc5.split(' '))))
refdocs = list(enumerate(set(doc1.split(' ') + doc2.split(' ') +
doc3.split(' '))))
print(refdocs)
print(refdocs2)
[(0, 'a'), (1, 'drink'), (2, 'dog'), (3, 'biscut'), (4, 'cat'), (5, 'milk'),
(6, 'am'), (7, 'I'), (8, 'and'), (9, 'to'), (10, 'like')]
[(0, 'a'), (1, 'the'), (2, 'elephant'), (3, 'bird'), (4, 'an'), (5, 'want'),
(6, 'sleep'), (7, 'am'), (8, 'I'), (9, 'and'), (10, 'to'), (11, 'sky'), (12,
'fly')]
I want to get matrices like the following:
finaldocs = [[0.44, 0, 0, 0, 0.22, 0.113, 0, 0, 0, 0, 0, 0.96, 0.77, 0.28,
0, 0, 0, 0.111, 0],
[0.44, 0, 0, 0, 0, 0, 0.44, 0, 0, 0, 0, 0.96, 0.77, 0, 0, 0, 0,
0, 0],
[0, 0, 0.14, 0, 0, 0, 0, 0, 0, 0.15, 0, 0, 0.77, 0, 0.39,
0, 0, 0.111, 0]]
finaldocs2 = [[0.55, 0.13, 0, 0, 0, 0, 0, 0.33, 0, 0, 0, 0.97, 0.118, 0.14,
0.88, 0.166, 0, 0, 0.85],
[0, 0, 0, 0.52, 0, 0, 0, 0, 0.114, 0, 0.995, 0.97, 0.118, 0.14, 0.88, 0,
0.16, 0, 0]]
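The second value in each tuple of setdocs and setdocs2 is the value I want to extract.
finaldocs[0] to finaldocs[2] correspond to doc1 to doc3. Each row takes the second value of the matching tuples in setdocs and places it at the word's index in list(enumerate(set(docs_all))).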
For example, the words of doc2 = "I am a cat" appear at positions 0, 6, 11, 12 of list(enumerate(set(docs_all))). "I", "am", "a", "cat" appear at positions 0, 4, 6, 7 of refdocs, so the second values of those tuples in setdocs are used to create finaldocs[1]:
[0.44, 0, 0, 0, 0, 0, 0.44, 0, 0, 0, 0, 0.96, 0.77, 0, 0, 0, 0, 0, 0]
My attempt:
import numpy as np
dll = [np.arange(19), np.arange(19), np.arange(19)]
for i in dll:
    for ii in i:
        for m in list(enumerate(set(docs_all))):
            for mm, nn in m:
                for t in refdocs:
                    for tt, ll in t:
                        for p in setdocs:
                            for pp, oo in p:
                                if nn in ll:
                                    i.replace(i, oo)
It fails.
How can I get these matrices with Python?
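One possible approach, offered as a minimal sketch rather than a definitive answer: turn refdocs and setdocs into a word-to-weight lookup, turn the global vocabulary into a word-to-column lookup, and then fill one row per document. The helper name build_rows and the list vocab_all are assumptions for illustration. vocab_all is copied by hand from the printed output in this post, because calling set(docs_all) again may produce a different order.

```python
doc1 = "I am a dog and I like biscut"
doc2 = "I am a cat"
doc3 = "I like to drink milk"
docs = [doc1.split(' '), doc2.split(' '), doc3.split(' ')]

# Global vocabulary, hard-coded in the order printed in the post
# (assumption: this order is the one the expected matrix is based on).
vocab_all = ['a', 'the', 'drink', 'elephant', 'dog', 'biscut', 'cat', 'bird',
             'an', 'milk', 'want', 'am', 'I', 'and', 'to', 'sky', 'sleep',
             'like', 'fly']

refdocs = [(0, 'a'), (1, 'drink'), (2, 'dog'), (3, 'biscut'), (4, 'cat'),
           (5, 'milk'), (6, 'am'), (7, 'I'), (8, 'and'), (9, 'to'),
           (10, 'like')]
setdocs = [(0, 0.44), (1, 0.14), (2, 0.22), (3, 0.113), (4, 0.44),
           (5, 0.15), (6, 0.96), (7, 0.77), (8, 0.28), (9, 0.39),
           (10, 0.111)]


def build_rows(token_lists, ref, weights, vocab):
    """Hypothetical helper: one row per document, one column per vocab word."""
    weight_by_index = dict(weights)  # local index -> weight
    weight_by_word = {word: weight_by_index[i] for i, word in ref}
    column = {word: j for j, word in enumerate(vocab)}  # word -> global column
    rows = []
    for tokens in token_lists:
        row = [0] * len(vocab)  # words absent from the document stay 0
        for word in set(tokens):
            row[column[word]] = weight_by_word[word]
        rows.append(row)
    return rows


finaldocs = build_rows(docs, refdocs, setdocs, vocab_all)
print(finaldocs[1])
```

The same call should work for the second group by passing docs2, refdocs2 and setdocs2 in place of docs, refdocs and setdocs. Plain dictionaries are used here instead of nested loops so that each word is looked up once per document.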