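How can I use the NLTK module to produce both the singular and plural forms of a noun, or tell it not to distinguish between singular and plural when searching a txt file for a word? Can I also use NLTK to make the program case-insensitive?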
Answer 0 (score: 8)
You can do this with pattern.en, though I don't know much about NLTK.
>>> from pattern.en import pluralize, singularize
>>>
>>> print(pluralize('child'))  # children
>>> print(singularize('wolves'))  # wolf
See the pattern.en documentation for more.
Answer 1 (score: 4)
As currently written, Pattern does not support Python 3 (although this is being discussed at https://github.com/clips/pattern/issues/62).
TextBlob (https://textblob.readthedocs.io) is built on top of Pattern and NLTK, and also includes pluralization. It seems to do a pretty good job, although it is not perfect.
Answer 2 (score: 3)
Here is one possible way to do this with NLTK. Suppose you are searching for the word "feature":
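from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Requires the NLTK tokenizer and WordNet data (see nltk.download())
wnl = WordNetLemmatizer()
text = "This is a small text, a very small text with no interesting features."
# Lowercase every token so the comparison is case-insensitive
tokens = [token.lower() for token in word_tokenize(text)]
# Reduce each token to its lemma, so "features" becomes "feature"
lemmatized_words = [wnl.lemmatize(token) for token in tokens]
print('feature' in lemmatized_words)  # True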
Calling str.lower() on every word takes care of case sensitivity. Of course, you also have to lemmatize the search term if necessary.
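The same idea can be wrapped in a small search helper. This is only a sketch: `contains_word` and its `normalize` parameter are hypothetical names, and a simple regular expression stands in for `word_tokenize` so the snippet runs without NLTK data. The point is that the search term and every token go through the same normalization step.

```python
import re

def contains_word(text, search_word, normalize=str.lower):
    """Return True if search_word occurs in text once both sides are normalized."""
    # Assumption: a plain regex tokenizer replaces nltk's word_tokenize here
    tokens = re.findall(r"[A-Za-z']+", text)
    target = normalize(search_word)
    return any(normalize(token) == target for token in tokens)

text = "This is a small text, a very small text with no interesting Features."
print(contains_word(text, "FEATURES"))  # lowercasing alone handles case
print(contains_word(text, "feature"))   # plural is not folded without a lemmatizer
```

To ignore number as well as case, one could pass a function such as `lambda t: wnl.lemmatize(t.lower())` as `normalize`, reusing the `wnl` lemmatizer from the answer. For a txt file, read its contents into `text` first.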
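Answer 3 (score: 0)
This answer may be a bit late, but in case someone is still looking for something similar:
There is inflect (also available on GitHub), which supports Python 2.x and 3.x. You can use it to find the singular or plural form of a given word: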
import inflect
p = inflect.engine()
words = "cat dog child goose pants"
print([p.plural(word) for word in words.split(' ')])
# ['cats', 'dogs', 'children', 'geese', 'pant']
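It is worth noting that calling p.plural on a word that is already plural gives you the singular form.
In addition, you can provide a POS (part-of-speech) tag, or provide a number and let the library determine whether the plural or the singular is needed: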
p.plural('cat', 4) # cats
p.plural('cat', 1) # cat
# but also...
p.plural('cat', 0) # cats