Question

First of all, I must admit that I am new to both Python and R.

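I am trying to create a file containing a list of bi-grams (2-grams) together with their POS tags (NN, VB, etc.). The goal is to make it easy to identify meaningful bigrams and their POS tag combinations.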
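A minimal sketch of writing such a file, assuming the words have already been POS-tagged into (word, tag) pairs, which is the shape `nltk.pos_tag` returns. The function name `write_bigram_pos_csv` and the two-column CSV layout (`bigram`, `pos_tags`) are hypothetical choices made for illustration.

```python
import csv
from typing import List, TextIO, Tuple


def write_bigram_pos_csv(tagged_tokens: List[Tuple[str, str]], out: TextIO) -> None:
    """Write one CSV row per bigram: the two words and their POS tag pair.

    Hypothetical helper: tagged_tokens is a list of (word, tag) tuples.
    """
    writer = csv.writer(out)
    writer.writerow(["bigram", "pos_tags"])
    # Pair each tagged token with the next one to form consecutive bigrams.
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        writer.writerow([f"{w1} {w2}", f"{t1} {t2}"])
```

To write a real file, open it with `newline=""` as the `csv` module documentation recommends, for example `with open("bigrams_pos.csv", "w", newline="", encoding="utf-8") as f: write_bigram_pos_csv(nltk.pos_tag(word_tokenize(text)), f)`, where `bigrams_pos.csv` is a placeholder name.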

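For example, the bigram 'Gross' 'Profit' has the POS tag combination JJ & NN, whereas a bigram made of 'quarter' followed by a preposition has the combination NN & IN. With this I can find meaningful POS combinations. It may not be accurate, and that is fine; I just want to use it for research.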
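Once every bigram carries its tag pair, finding "meaningful" combinations reduces to counting tag pairs and filtering on the ones of interest. The sketch below assumes (word, tag) input as produced by `nltk.pos_tag`; the set `KEEP_PATTERNS` is a hypothetical example (adjective + noun, noun + noun), not a standard list.

```python
from collections import Counter
from typing import List, Set, Tuple

Tagged = Tuple[str, str]  # (word, POS tag)

# Hypothetical choice of "meaningful" tag patterns; adjust for your study.
KEEP_PATTERNS: Set[Tuple[str, str]] = {("JJ", "NN"), ("NN", "NN")}


def pattern_counts(tagged_tokens: List[Tagged]) -> Counter:
    """Count how often each (tag1, tag2) combination occurs in consecutive words."""
    return Counter(
        (t1, t2) for (_, t1), (_, t2) in zip(tagged_tokens, tagged_tokens[1:])
    )


def filter_bigrams(
    tagged_tokens: List[Tagged], keep: Set[Tuple[str, str]] = KEEP_PATTERNS
) -> List[Tuple[str, str]]:
    """Keep only the word bigrams whose tag pair is in `keep`."""
    return [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:])
        if (t1, t2) in keep
    ]
```

Running `pattern_counts(...).most_common(10)` over a larger corpus gives a frequency table of tag combinations, which can then guide which patterns to put in `KEEP_PATTERNS`.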

For reference, please check the section "2-gram Results" on this page. That is exactly what I need, but it was done in R, so it is of no use to me.

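From what I have found, POS tagging and bigram creation in Python can both be done with the NLTK or TextBlob packages. However, I cannot figure out the logic for assigning POS tags to the bigrams generated in Python. Please see my code and its output below.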

import nltk
from textblob import TextBlob
from nltk import word_tokenize
from nltk import bigrams

################# Code snippet using TextBlob Package #######################
text1 = """This is an example for using TextBlob Package"""
blobs = TextBlob(text1)             ### Converting str to textblob object
blob_tags = blobs.tags              ### Assigning POS tags to the word blobs
print(blob_tags)
blob_bigrams = blobs.ngrams(n=2)    ### Creating bi-grams from word blobs
print(blob_bigrams)

################# Code snippet using NLTK Package #######################
text2 = """This is an example for using NLTK Package"""
tokens = word_tokenize(text2)       ### Splitting the string into a list of word tokens
nltk_tags = nltk.pos_tag(tokens)    ### Assigning POS tags to the word tokens
print(nltk_tags)
nltk_bigrams = bigrams(tokens)      ### Creating bi-grams from word tokens
print(list(nltk_bigrams))
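One way to connect the two outputs is to tag the whole sentence first and then build bigrams from the tagged (word, tag) tuples instead of from the bare tokens. The sketch below shows that idea in plain Python; the helper names `tagged_bigrams` and `bigrams_with_pos` are hypothetical, and the sample tags are hand-written for illustration, not actual tagger output.

```python
from typing import List, Tuple

# A tagged token is a (word, POS tag) pair, the shape that both
# nltk.pos_tag and TextBlob's .tags return.
Tagged = Tuple[str, str]


def tagged_bigrams(tagged_tokens: List[Tagged]) -> List[Tuple[Tagged, Tagged]]:
    """Pair every tagged token with the one that follows it."""
    # Zipping the list with itself shifted by one position yields the
    # consecutive pairs, the same result nltk.bigrams would give.
    return list(zip(tagged_tokens, tagged_tokens[1:]))


def bigrams_with_pos(tagged_tokens: List[Tagged]) -> List[Tuple[str, str]]:
    """Flatten each tagged bigram into ("word1 word2", "TAG1 TAG2")."""
    rows = []
    for (w1, t1), (w2, t2) in tagged_bigrams(tagged_tokens):
        rows.append((f"{w1} {w2}", f"{t1} {t2}"))
    return rows


if __name__ == "__main__":
    # Hand-written tags for illustration only; in practice this list would
    # come from nltk.pos_tag(word_tokenize(text)) or TextBlob(text).tags.
    sample = [("Gross", "JJ"), ("Profit", "NN"), ("rose", "VBD")]
    for bigram, tags in bigrams_with_pos(sample):
        print(bigram, "->", tags)
```

With the NLTK code above, `list(bigrams(nltk.pos_tag(tokens)))` should give the same pairing directly, since `nltk.bigrams` accepts any sequence, including a list of (word, tag) tuples; with TextBlob, `bigrams_with_pos(blobs.tags)` applies the same way. Tagging the full sentence before pairing matters because the tagger uses surrounding context, so tagging each two-word bigram in isolation could produce different tags.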

Any help is greatly appreciated. Thanks in advance.

How to POS-tag bigrams in Python
