I'm wondering whether there is a way to extract the root word from its derived forms, for example:
recruitment -> recruit
recruiter -> recruit
recruited -> recruit
I got the last one with the WordNet lemmatizer, like this:
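from nltk.stem.wordnet import WordNetLemmatizer
lmtzr = WordNetLemmatizer()
# 'v' tells the lemmatizer to treat the word as a verb; this returns 'recruit'
# (requires the WordNet data, e.g. nltk.download('wordnet'))
lmtzr.lemmatize('recruited', 'v')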
I can't find a way to handle the other two. Is there a library for this, or should I write my own function?
Answer 0 (score: 2)
I think you are talking about stemming:
http://www.nltk.org/api/nltk.stem.html
Interfaces used to remove morphological affixes from words. This process is known as stemming.
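To make "removing affixes" concrete, here is a minimal, hypothetical sketch of the kind of hand-written function the question considers: it strips the longest matching suffix from a small, made-up suffix list and refuses to leave a stem shorter than three characters. The names `SUFFIXES` and `strip_suffix` and the length threshold are illustrative assumptions, not part of NLTK.

```python
# Hypothetical suffix list for illustration; real stemmers use far more rules.
SUFFIXES = ("ment", "er", "ed", "ing", "s")


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Remove the longest matching suffix, keeping at least min_stem characters."""
    # Try longer suffixes first so 'ment' wins over shorter matches.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no suffix matched, or the stem would be too short


for w in ("recruitment", "recruiter", "recruited"):
    print(w, "->", strip_suffix(w))
```

This maps all three example words to `recruit`, but a naive list like this also over-stems: `strip_suffix("number")` returns `numb`. That is the trade-off a library stemmer's rule conditions try to manage.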
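from nltk.stem.lancaster import LancasterStemmer
st = LancasterStemmer()
# Stem each derived form; unlike the lemmatizer call, no part-of-speech tag is passed
for word in ('recruitment', 'recruiter', 'recruited'):
    print(word, '->', st.stem(word))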
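NLTK's `LancasterStemmer` is based on the Lancaster (Paice/Husk) algorithm, which applies its suffix rules repeatedly until no rule fires; that is part of why it is known as an aggressive stemmer. The toy loop below illustrates only that iterate-until-stable idea. The rule table `RULES` and the function `iterative_stem` are hypothetical and are not Lancaster's actual rules.

```python
# Hypothetical (suffix, replacement) rules; NOT the real Lancaster rule table.
RULES = [("ments", "ment"), ("ment", ""), ("ers", "er"), ("er", ""), ("ed", "")]


def iterative_stem(word: str, min_stem: int = 3) -> str:
    """Apply the first matching rule, then restart, until no rule matches."""
    while True:
        for suffix, replacement in RULES:
            stem = word[: -len(suffix)]
            if word.endswith(suffix) and len(stem) >= min_stem:
                word = stem + replacement
                break  # a rule fired: restart from the top of the table
        else:
            return word  # no rule fired: the word is stable


for w in ("recruitments", "recruiters", "recruited"):
    print(w, "->", iterative_stem(w))
```

Every rule shortens the word, so the loop always terminates. `recruitments` needs two passes (`recruitment`, then `recruit`), and the same repetition is what makes iterative stemmers prone to over-stemming: with these rules `flower` becomes `flow`.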
Answer 1 (score: 2)
Try LancasterStemmer from nltk.