Iterating one list of synsets over another

Time: 2017-08-30 18:15:26

Tags: python for-loop nltk wordnet synonym

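I have two sets of WordNet synsets (held in two separate list objects, s1 and s2). For each synset in s1, I want to find its maximum path similarity score against the synsets in s2, so that the output has the same length as s1. For example, if s1 contains 4 synsets, the output should have length 4.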

Here is the code I have tried so far:

import numpy as np
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd

# two lists of WordNet synsets (s1, s2)

s1 = [wn.synset('be.v.01'),
 wn.synset('angstrom.n.01'),
 wn.synset('trial.n.02'),
 wn.synset('function.n.01')]

s2 = [wn.synset('use.n.01'),
 wn.synset('function.n.01'),
 wn.synset('check.n.01'),
 wn.synset('code.n.01'),
 wn.synset('inch.n.01'),
 wn.synset('be.v.01'),
 wn.synset('correct.v.01')]

# find the highest path similarity score against s2 for each synset in s1;
# the output should have the same length as s1

ps_list = []
def similarity_score(s1, s2):
    for word1 in s1:
        best = max(wn.path_similarity(word1, word2) for word2 in s2)
        ps_list.append(best)
    return ps_list

similarity_score(s1, s2)

But it returns the following error message:

'>' not supported between instances of 'NoneType' and 'float'

I cannot figure out what is going wrong in the code. Would anyone care to look at it and share insights on the for loop? It would be much appreciated.

Thank you.

The full error traceback is here.

[Edit] I came up with a temporary solution. It is not elegant, but it works. Does anyone have a better solution or a handy trick for handling this?

2 answers:

Answer 0: (score: 1)

I have no experience with the nltk module, but from reading the docs I can see that path_similarity is a method of whatever object wn.synset returns. You are instead treating it as a function.

What you should be doing is something like this:

best = max(word1.path_similarity(word2) for word2 in s2)

Answer 1: (score: 1)

I think the error comes from the following line:

best = max(wn.path_similarity(word1, word2) for word2 in s2)

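If wn.path_similarity(word1, word2) returns None for some pair, max() cannot compare that None with the float scores, so you should add a condition to filter those values out. For example, you could rewrite it like this: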

best = max([word1.path_similarity(word2) for word2 in s2 if word1.path_similarity(word2) is not None])
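
The filtering idea above can be sketched as a small helper that computes each pairwise score only once and also covers the case where every score for a synset is None (max() raises ValueError on an empty sequence). This is only a minimal sketch, not the asker's or NLTK's own code: the names max_similarity_scores, similarity, default and toy_scores are hypothetical, and the toy scores stand in for WordNet so the example runs without NLTK data.

```python
def max_similarity_scores(s1, s2, similarity=None, default=None):
    """For each item in s1, return its best non-None score against s2."""
    if similarity is None:
        # Assumption: items are synset-like objects with a path_similarity() method.
        similarity = lambda a, b: a.path_similarity(b)
    results = []
    for word1 in s1:
        # Compute each pairwise score once.
        scores = (similarity(word1, word2) for word2 in s2)
        # Drop None results; fall back to `default` if nothing is left.
        best = max((s for s in scores if s is not None), default=default)
        results.append(best)
    return results


# Hypothetical toy scores (not real WordNet values), used only to exercise the helper.
toy_scores = {("a", "x"): 0.25, ("a", "y"): None,
              ("b", "x"): None, ("b", "y"): None}
toy_similarity = lambda a, b: toy_scores[(a, b)]

toy_result = max_similarity_scores(["a", "b"], ["x", "y"], similarity=toy_similarity)
print(toy_result)  # [0.25, None]
```

With real synsets, calling max_similarity_scores(s1, s2) would fall back to word1.path_similarity(word2) and return a list as long as s1. Whether None or some number such as 0.0 is the right default for a synset with no comparable partner depends on how the scores are used afterwards.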