How to tokenize sentences line by line and tag each token with its chunk position in Python

Asked: 2014-03-24 09:50:47

Tags: python machine-learning nlp tokenize

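I am a linguist and would like to use Python to tokenize the sentences in a CSV document line by line, and to output, for each token, its tag and its position within that tag (B for beginning, I for inside), as in the example below.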

"id", "sentence"
"1", "<person>Claire</person>lived in<location>London UK</location>for<time>2 years</time>"
"2", "<location>UK</location> is in<location>Europe</location>"
 ...........
 ...........


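 import pandas as pd

 # skipinitialspace=True handles the space after each comma in the sample CSV
 dataframe = pd.read_csv(document, skipinitialspace=True)
 sentences = dataframe['sentence']
 for line in sentences:
     pass  # TODO: print token, position, tag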

 >> Claire  B-Per  person
    lived   null   null
    in      null   null
    London  B-Loc  location
    UK      I-Loc  location
    for     null   null
    2       B-Tim  time
    years   I-Tim  time

    UK      B-Loc  location
    is      null   null
    in      null   null
    Europe  B-Loc  location
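
One possible way to get from the annotated sentences to the token/position/tag rows listed above is a small regular-expression pass. This is only a sketch under stated assumptions: the tags are well formed and never nested, tokens are separated by whitespace, and the short label (`Per`, `Loc`, `Tim`) is simply the first three letters of the tag name, which is inferred from the example rather than stated in the question. The names `SPAN_RE` and `tag_tokens` are hypothetical helpers, not part of any library.

```python
import re

# Matches either a tagged span like <location>London UK</location>
# or a run of untagged text between spans.
# Assumption: tags are well formed and not nested.
SPAN_RE = re.compile(r"<(\w+)>(.*?)</\1>|([^<]+)")


def tag_tokens(sentence):
    """Return a list of (token, position, tag) triples for one sentence."""
    rows = []
    for match in SPAN_RE.finditer(sentence):
        tag, inside, outside = match.groups()
        if tag is None:
            # Untagged text: every whitespace-separated token is null/null.
            rows.extend((token, "null", "null") for token in outside.split())
        else:
            # Assumption: short label = first three letters of the tag name,
            # capitalized (person -> Per, location -> Loc, time -> Tim).
            short = tag[:3].capitalize()
            for index, token in enumerate(inside.split()):
                prefix = "B" if index == 0 else "I"
                rows.append((token, f"{prefix}-{short}", tag))
    return rows


# The two sample sentences from the CSV excerpt in the question.
samples = [
    "<person>Claire</person>lived in<location>London UK</location>for<time>2 years</time>",
    "<location>UK</location> is in<location>Europe</location>",
]

for sentence in samples:
    for token, position, tag in tag_tokens(sentence):
        print(f"{token:<8}{position:<7}{tag}")
    print()  # blank line between sentences
```

With a pandas column, the same function could be applied per row, for example by calling `tag_tokens(line)` inside the `for line in sentences` loop. Stray `<` characters that do not belong to a tag pair are silently skipped by this sketch, so real data may need stricter validation.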

0 answers:

No answers yet.