Question

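I don't know how to solve this problem.

I have 3 lists. Each one contains sublists with a word, a tag, and the number of times the word occurs in a document: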

v1 = [['be', 'VSIS3S0', 1], ['scott', 'NP00000', 2], ['north', 'NCMS000', 1], ['revolution', 'NP00000', 1], ['name', 'VMP00SM', 1]]
v2 = [['mechanic', 'NCMS000', 1], ['be', 'VSIS3S0', 1], ['tool', 'AQ0CS0', 1], ['sam', 'NP00000', 1], ['frida', 'NP00000', 1]]
v3 = [['be', 'VSIP3S0', 1], ['scott', 'NP00000', 1], ['who', 'NP00000', 1]]

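How can I build a function that receives these lists and compares each word? For example, the word be in v1 appears once in each of the three lists, so in this case (1 * log(3/3)) is appended to the result list, where 1 -> the maximum occurrence value (the 3rd element of the sublist), log numerator 3 -> a constant, log denominator 3 -> because the word appears in v1, v2 and v3.

Next we have scott -> in this case (2 * log(3/2)) is appended to the result list, where 2 -> the maximum occurrence value of the word, log numerator 3 -> constant, log denominator 2 -> because the word 'scott' appears in v1 and v3.

Next we have north -> in this case (1 * log(3/1)) is appended to the result list, where 1 -> the maximum occurrence value of the word, log numerator 3 -> constant, log denominator 1 -> because the word 'north' appears only in v1.

Next we have revolution -> in this case (1 * log(3/1)) is appended to the result list, where 1 -> the maximum occurrence value of the word, log numerator 3 -> constant, log denominator 1 -> because the word 'revolution' appears only in v1.

Next we have name -> in this case (1 * log(3/1)) is appended to the result list, where 1 -> the maximum occurrence value of the word, log numerator 3 -> constant, log denominator 1 -> because the word 'name' appears only in v1.

In addition, we have to do the same for v2, comparing mechanic, be, tool, etc. with the words in the other lists, computing the maximum occurrence value and multiplying it by log(3/?) depending on how many of the lists the word appears in.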
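
Written as a formula, the rule I am describing would be roughly the following. The symbols are just my own notation: c_i(w) is the occurrence value (3rd element) of word w in list v_i, and df(w) is the number of lists that contain w. I have not fixed the base of the logarithm.

```latex
\mathrm{score}(w) = \Bigl(\max_{i \in \{1,2,3\}} c_i(w)\Bigr) \cdot \log\frac{3}{\mathrm{df}(w)}
% be:    1 \cdot \log(3/3) = 0
% scott: 2 \cdot \log(3/2)
% north: 1 \cdot \log(3/1)
```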

This is my attempt for v1:

def f1(v1, v2, v3):
    res = []
    for e in v1:
        if e != 0:
            if e in v2 and e in v3:
                res.append(0)
            elif e in v2:
                res.append(e * math.log(3/2))
            else:
                res.append(e * math.log(3))
    return res

What it returns is obviously not the expected result.

It should return something like:

[0, 2.1972245773362196, 0, 0, 0, 0]

Answer 1

Based on your description, I came up with:

import math
v1 = [['be', 'VSIS3S0', 1], ['scott', 'NP00000', 2], ['north', 'NCMS000', 1], ['revolution', 'NP00000', 1], ['name', 'VMP00SM', 1]]
v2 = [['mechanic', 'NCMS000', 1], ['be', 'VSIS3S0', 1], ['tool', 'AQ0CS0', 1], ['sam', 'NP00000', 1], ['frida', 'NP00000', 1]]
v3 = [['be', 'VSIP3S0', 1], ['scott', 'NP00000', 1], ['who', 'NP00000', 1]]

v = [v1,v2,v3]

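# Count the entries for each word across the lists (one per list here),
# i.e. the number of lists the word appears in
countdict = {}
for vi in v:
    for e in vi:
        countdict[e[0]] = countdict.get(e[0],0) + 1

# For each word, sum occurrence * log10(3 / number of lists containing it)
scoredict = {}
for vi in v:
    for e in vi:
        scoredict[e[0]] = scoredict.get(e[0],0) + (e[2] * math.log10(3.0/countdict[e[0]]))

print(scoredict)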

I stored the output as a dict, which is:

{'be': 0.0, 'revolution': 0.47712125471966244, 'north': 0.47712125471966244, 'name': 0.47712125471966244, 'sam': 0.47712125471966244, 'tool': 0.47712125471966244, 'who': 0.47712125471966244, 'scott': 0.5282737771670437, 'mechanic': 0.47712125471966244, 'frida': 0.47712125471966244}
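
Note that the code above sums the scores over all lists and uses log10, so 'scott' ends up as 3 * log10(3/2). If you want the maximum occurrence value instead, as the question describes, a minimal sketch might look like the one below. The function name score_words is my own. I am assuming that words are matched on the first element only (ignoring the tag) and that the natural log is wanted, as in the question's math.log attempt.

```python
import math

def score_words(lists):
    """Return {word: max_occurrence * log(n_lists / lists_containing_word)}."""
    n = len(lists)
    max_count = {}  # word -> highest occurrence value seen in any list
    doc_freq = {}   # word -> number of lists that contain the word
    for lst in lists:
        seen = set()
        for word, _tag, count in lst:
            max_count[word] = max(max_count.get(word, 0), count)
            seen.add(word)
        # Count each list at most once per word
        for word in seen:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w: max_count[w] * math.log(float(n) / doc_freq[w]) for w in max_count}

v1 = [['be', 'VSIS3S0', 1], ['scott', 'NP00000', 2], ['north', 'NCMS000', 1], ['revolution', 'NP00000', 1], ['name', 'VMP00SM', 1]]
v2 = [['mechanic', 'NCMS000', 1], ['be', 'VSIS3S0', 1], ['tool', 'AQ0CS0', 1], ['sam', 'NP00000', 1], ['frida', 'NP00000', 1]]
v3 = [['be', 'VSIP3S0', 1], ['scott', 'NP00000', 1], ['who', 'NP00000', 1]]

scores = score_words([v1, v2, v3])
# Scores in the order of the words in v1
result_v1 = [scores[word] for word, _tag, _count in v1]
print(result_v1)
```

With this, 'be' scores 0, 'scott' scores 2 * log(3/2) (about 0.81), and words that occur in only one list score log(3) (about 1.10). The same list comprehension with v2 or v3 gives the result lists for the other two.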

Compare strings and compute occurrences
