Python - Generating the plural of a singular noun

Time: 2015-09-04 18:43:29

Tags: python nlp

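How can I use the NLTK module to write out both the singular and plural forms of a noun, or tell it not to distinguish between singular and plural when searching a txt file for a word? Can I also use NLTK to make the program case-insensitive?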

4 answers:

Answer 0 (score: 8)

You can do this with pattern.en, though I'm not too sure about NLTK:

>>> from pattern.en import pluralize, singularize
>>> print(pluralize('child'))     # children
>>> print(singularize('wolves'))  # wolf

See the pattern.en documentation for more.
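The question also asks how to search a text file for both forms while ignoring case. Once you have the two forms (for example, the plural from pattern.en's `pluralize`), one option is a whole-word regular expression compiled with `re.IGNORECASE`. This is a minimal sketch, assuming the plural is passed in explicitly so it runs without pattern installed; `find_either_form` is a hypothetical helper name, not part of any library:

```python
import re
from typing import List


def find_either_form(text: str, singular: str, plural: str) -> List[str]:
    """Return every whole-word occurrence of the singular or plural form, ignoring case."""
    # re.escape protects against regex metacharacters in the words;
    # \b on both sides restricts matches to whole words ('child' will not match inside 'childish').
    pattern = r"\b(?:%s|%s)\b" % (re.escape(singular), re.escape(plural))
    return re.findall(pattern, text, flags=re.IGNORECASE)


# Usage: in practice `text` would come from open("file.txt").read()
print(find_either_form("A Child met two children. CHILDREN laugh.", "child", "children"))
```

Because the regex engine backtracks when `\b` fails after `child`, the alternation still matches `children` in full, so the order of the two alternatives does not matter here.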

Answer 1 (score: 4)

Pattern, as currently written, does not support Python 3 (although this is being discussed here: https://github.com/clips/pattern/issues/62).

TextBlob (https://textblob.readthedocs.io) is built on top of pattern and NLTK and also includes pluralization functionality. It seems to do a good job, although it is not perfect.

Answer 2 (score: 3)

Here is one possible way to do this with NLTK. Imagine you are searching for the word "feature":

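from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Needs the NLTK data packages used below, e.g. nltk.download('wordnet')
# and nltk.download('punkt') ('punkt_tab' in newer NLTK releases).
wnl = WordNetLemmatizer()
text = "This is a small text, a very small text with no interesting features."
# Lowercase every token so the search is case-insensitive
tokens = [token.lower() for token in word_tokenize(text)]
# Reduce each token to its lemma (nouns by default), e.g. 'features' -> 'feature'
lemmatized_words = [wnl.lemmatize(token) for token in tokens]
print('feature' in lemmatized_words)  # True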

Case sensitivity is handled by applying str.lower() to every word. Of course, if necessary, you must also lowercase and lemmatize the search term itself.
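The key idea is to normalize the text and the query in exactly the same way, then compare. A minimal sketch of that pipeline that runs without any NLTK data, assuming a regex tokenizer in place of `word_tokenize`: `naive_singular` and `contains_word` are hypothetical names, and `naive_singular` is a deliberately crude suffix stripper standing in for `WordNetLemmatizer().lemmatize`. It does not handle irregular nouns such as "children", so for real use you would pass the lemmatizer in as the `normalize` argument.

```python
import re
from typing import Callable


def naive_singular(word: str) -> str:
    """Toy singularizer (hypothetical): strips a few regular plural endings only."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # 'stories' -> 'story'
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]                # 'boxes' -> 'box'
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # 'features' -> 'feature'
    return word


def contains_word(text: str, query: str,
                  normalize: Callable[[str], str] = naive_singular) -> bool:
    """Case- and number-insensitive search: lowercase and normalize both sides."""
    tokens = re.findall(r"[a-z]+", text.lower())
    normalized = {normalize(token) for token in tokens}
    return normalize(query.lower()) in normalized


sentence = "This is a small text, a very small text with no interesting features."
print(contains_word(sentence, "FEATURES"))  # True
```

Passing the normalizer as a parameter keeps the search logic independent of how words are reduced, so swapping the toy function for a real lemmatizer does not change `contains_word`.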

Answer 3 (score: 0)

This answer may be a bit late, but in case anyone is still looking for something similar:

inflect (also available on GitHub) supports both Python 2.x and 3.x. You can use it to find the singular or plural form of a given word:

import inflect
p = inflect.engine()

words = "cat dog child goose pants"
print([p.plural(word) for word in words.split(' ')])
# ['cats', 'dogs', 'children', 'geese', 'pant']

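It's worth noting that p.plural applied to a word that is already plural gives you the singular form (hence 'pant' above). In addition, you can specify the POS (part of speech) or supply a count, and the library determines whether it needs the plural or the singular: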

print(p.plural('cat', 4))   # cats
print(p.plural('cat', 1))   # cat
# but also...
print(p.plural('cat', 0))   # cats
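Libraries such as inflect and pattern.en broadly work by consulting tables of irregular nouns before falling back on suffix rules, and, as the examples above show for inflect, a count of exactly 1 selects the singular while 0 selects the plural. A rough, hypothetical sketch of that idea in plain Python: `toy_plural` and `IRREGULAR_PLURALS` are illustrative names that belong to neither library, the input is assumed to be a lowercase singular noun, and the rules cover only a handful of cases.

```python
# Hypothetical irregular-noun table (a tiny sample; real libraries ship far larger ones)
IRREGULAR_PLURALS = {
    "child": "children",
    "goose": "geese",
    "man": "men",
    "mouse": "mice",
    "wolf": "wolves",
}


def toy_plural(noun: str, count: int = 2) -> str:
    """Return the form of a lowercase singular `noun` to use with `count` (toy version)."""
    if count == 1:
        return noun                        # exactly one: keep the singular
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]     # the irregular table wins over the rules
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                 # 'box' -> 'boxes'
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"           # 'city' -> 'cities'
    return noun + "s"                      # default: 'cat' -> 'cats'


print([toy_plural(w) for w in "cat dog child goose box city day".split()])
print(toy_plural("cat", 4), toy_plural("cat", 1), toy_plural("cat", 0))
```

Checking the table first is what lets irregular words like "child" and "goose" escape the generic suffix rules, which would otherwise produce "childs" and "gooses".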