Extracting the root word from its derivatives using Python

Date: 2016-03-16 17:12:31

Tags: python nltk

I would like to know whether there is a way to extract the root word from its derivatives.

For example:

recruitment -> recruit
recruiter -> recruit
recruited -> recruit

I got the last one using the WordNet lemmatizer, like this:

from nltk.stem.wordnet import WordNetLemmatizer
lmtzr = WordNetLemmatizer()
lmtzr.lemmatize('recruited', 'v')

I can't seem to find a solution for the other two. Is there a library for this, or should I write a function myself?

2 answers:

Answer 0: (score: 2)

I think you are talking about stemming:

http://www.nltk.org/api/nltk.stem.html

  

A processing interface for removing morphological affixes from words. This process is known as stemming.

from nltk.stem.lancaster import LancasterStemmer
st = LancasterStemmer()
st.stem('recruitment')
st.stem('recruiter')
st.stem('recruited')
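
To make "removing morphological affixes" concrete, here is a minimal sketch of a hand-written suffix stripper. It is a toy illustration only, not the Lancaster algorithm or anything NLTK ships: the name `naive_stem`, the suffix list, and the `min_stem_len` parameter are all assumptions made up for this example.

```python
# Toy suffix stripper: illustrative only, NOT the Lancaster or Porter algorithm.
# The suffix list below is a hypothetical, hand-picked set for this example.
SUFFIXES = ("ment", "ing", "er", "ed", "s")


def naive_stem(word, min_stem_len=3):
    """Strip one known suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    # Try longer suffixes first so that "ment" is preferred over shorter ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ("recruitment", "recruiter", "recruited"):
    print(w, "->", naive_stem(w))
```

A real stemmer applies many more rules and handles far more cases than this, so a library stemmer is probably the better choice unless the vocabulary is very small and known in advance.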

Answer 1: (score: 2)

Try LancasterStemmer from nltk.
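
Once a stemmer has been chosen, one possible way to use it is to group a word list by stem. The sketch below assumes nothing about the stemmer beyond it being a callable that maps a string to a string; the helper name `group_by_stem` is hypothetical and not part of NLTK.

```python
from collections import defaultdict


def group_by_stem(words, stem):
    """Group words by the value returned from the stem callable.

    `stem` is any function taking a lowercase word and returning its stem.
    """
    groups = defaultdict(list)
    for word in words:
        groups[stem(word.lower())].append(word)
    return dict(groups)


# Stand-in stemmer for demonstration: keeps only the first 7 characters.
# This is a placeholder, not a real stemming algorithm.
print(group_by_stem(["Recruiter", "recruited", "hire"], lambda w: w[:7]))
```

With NLTK installed, the bound method `LancasterStemmer().stem` can be passed as the `stem` argument in place of the stand-in lambda.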