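I have a .csv file with a single column of 1000 rows. Each row contains one word (bag-of-words model). Now I want to find out whether each word is a noun, verb, adjective, etc. I would like a second column (also 1000 rows) in which each row holds the part of speech (noun, verb, ...) of the word in the first column.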
I have already imported the csv into R. But what do I do now?
[Example: I have these words and want to know whether each one is a noun, a verb, etc.]
Answer 0 (score: 1)
There are several options, but you could use udpipe.
terms <- data.frame(term = c("unit", "determine", "generate", "digital", "mount", "control", "position", "input", "output", "user"),
stringsAsFactors = FALSE)
library(udpipe)
# check if model is already downloaded.
if (file.exists("english-ud-2.0-170801.udpipe")) {
  ud_model <- udpipe_load_model(file = "english-ud-2.0-170801.udpipe")
} else {
  # download the English model once, then load it from the downloaded file
  ud_model <- udpipe_download_model(language = "english")
  ud_model <- udpipe_load_model(ud_model$file_model)
}
# no need for parsing as this data only contains single words.
t <- udpipe_annotate(ud_model, terms$term, parser = "none")
t <- as.data.frame(t)
terms$POSTAG <- t$upos
terms
term POSTAG
1 unit NOUN
2 determine VERB
3 generate VERB
4 digital ADJ
5 mount NOUN
6 control NOUN
7 position NOUN
8 input NOUN
9 output NOUN
10 user NOUN
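The assignment `terms$POSTAG <- t$upos` assumes udpipe returns exactly one token per word. If an entry is split into several tokens (a contraction such as "don't" may be, for example), `t` has more rows than `terms` and R refuses the assignment because the lengths differ. A minimal sketch of a safer alignment, assuming you pass `doc_id = seq_len(nrow(terms))` to `udpipe_annotate()` so that each input word becomes its own document, keeps the tag of the first token of each document. The helper `first_tag_per_doc()` is a hypothetical name, and the `anno` data frame below is hand-made for illustration, not real udpipe output.

```r
# Hypothetical helper: return one POS tag per original row, even if a word
# was split into several tokens. `anno` is assumed to have the columns
# doc_id and upos, as in as.data.frame(udpipe_annotate(...)).
first_tag_per_doc <- function(anno, doc_ids) {
  # rows are in token order, so the first row per doc_id is the first token
  first <- anno[!duplicated(anno$doc_id), c("doc_id", "upos")]
  # look up each original row's doc_id; rows without tokens get NA
  first$upos[match(as.character(doc_ids), as.character(first$doc_id))]
}

# Illustrative, hand-made annotation (NOT real udpipe output):
terms <- data.frame(term = c("unit", "don't", "digital"), stringsAsFactors = FALSE)
anno <- data.frame(doc_id = c("1", "2", "2", "3"),
                   upos   = c("NOUN", "AUX", "PART", "ADJ"),
                   stringsAsFactors = FALSE)
terms$POSTAG <- first_tag_per_doc(anno, seq_len(nrow(terms)))
terms
```

Keeping only the first token is a design choice; you could instead paste all tags of a document together if you need to see every token.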
Answer 1 (score: 0)
You can use spacyr, which is an R wrapper around the Python package spaCy.
Note: you first have to load the package and initialize it with the path to your Python executable:
library(spacyr)
spacy_initialize(python_executable = '/path/to/python')
Then, for your terms:
Terms <- data.frame(Term = c("unit",
"determine",
"generate",
"digital",
"mount",
"control",
"position",
"input",
"output",
"user"), stringsAsFactors = FALSE)
Use the function spacy_parse() to tag your words and add the tags to your data frame:
Terms$POS_TAG <- spacy_parse(Terms$Term)$pos
The result is:
Term POS_TAG
1 unit NOUN
2 determine VERB
3 generate VERB
4 digital ADJ
5 mount VERB
6 control NOUN
7 position NOUN
8 input NOUN
9 output NOUN
10 user NOUN
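Whichever tagger you use, the question also asks for the tags as a second column in the file itself. A minimal sketch of that CSV round trip, assuming the input file has no header row: `add_pos_column()` and its `tag_words` argument are hypothetical names, where `tag_words` stands for any function (for example a wrapper around `udpipe_annotate()` or `spacy_parse()`) that returns exactly one tag per input word. The toy tagger below only exists to show the round trip and is not a real POS tagger.

```r
# Hypothetical helper: read a one-column CSV of words, add a POS column,
# and write the two-column result to a new CSV file.
add_pos_column <- function(in_file, out_file, tag_words) {
  # header = FALSE because the file holds bare words with no column name
  words <- read.csv(in_file, header = FALSE, col.names = "term",
                    stringsAsFactors = FALSE)
  # tag_words must return one tag per word, in the same order
  words$POS_TAG <- tag_words(words$term)
  write.csv(words, out_file, row.names = FALSE)
  invisible(words)
}

# Toy tagger, for demonstration only
toy_tagger <- function(x) ifelse(x %in% c("determine", "generate"), "VERB", "NOUN")

in_file  <- tempfile(fileext = ".csv")
out_file <- tempfile(fileext = ".csv")
writeLines(c("unit", "determine", "user"), in_file)
add_pos_column(in_file, out_file, toy_tagger)
read.csv(out_file, stringsAsFactors = FALSE)
```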