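I want to split a row into new rows wherever an adverb appears. However, if several adverbs occur consecutively, I only want to start a new row after the last one.
An example of my data frame: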
0 but well that's alright
1 otherwise however we'll have to
2 okay sure
3 what?
With adverbs = ['but','well','otherwise','however'], I want the resulting df to look like this:
0 but well
1 that's alright
2 otherwise however
3 we'll have to
2 okay sure
3 what?
Answer 0 (score: 0)
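I have a partial solution that may help. You can use the TextBlob package.
With this API you can assign a part-of-speech tag to each word. The original post links to a list of the possible tags.
The problem is that the tagging is not perfect, and your definition of an adverb may not match the tagger's (for example, but is tagged as a coordinating conjunction, and well is for some reason tagged as a verb). It still works in most cases.
The split can be done like this: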
from textblob import TextBlob

def adv_split(s):
    annotations = TextBlob(s).tags
    # Extract adverbs (CC for coordinating conjunction or RB for adverbs)
    adv_words = [word for word, tag in annotations
                 if tag.startswith('CC') or tag.startswith('RB')]
    # We have at least one adverb
    if len(adv_words) > 0:
        # Get the last one
        adv_pos = s.index(adv_words[-1]) + len(adv_words[-1])
        return [s[:adv_pos], s[adv_pos:]]
    else:
        return s
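One caveat with adv_split: s.index() returns the first occurrence of the word, so a sentence that repeats its last adverb is cut too early. A minimal sketch of one way around this is to choose the split point by position in the tag list instead. The name split_after_last_adverb and the hand-written tags below are illustrative assumptions, not TextBlob output; in practice the pairs would come from TextBlob(s).tags, and rejoining with single spaces may not reproduce the original spacing or punctuation.

```python
def split_after_last_adverb(tagged_words, adverb_tags=("CC", "RB")):
    """Split a list of (word, tag) pairs after the last adverb-like word.

    Hypothetical helper: works on token positions, not substring search,
    so repeated words do not confuse the split point.
    """
    last = -1
    for i, (_, tag) in enumerate(tagged_words):
        # str.startswith accepts a tuple of prefixes
        if tag.startswith(adverb_tags):
            last = i

    words = [word for word, _ in tagged_words]
    # No adverb found, or the adverb is the final word: nothing to split
    if last == -1 or last == len(words) - 1:
        return [" ".join(words)]
    return [" ".join(words[:last + 1]), " ".join(words[last + 1:])]


# Hand-written tags for illustration only (not produced by a tagger)
tagged = [("well", "RB"), ("that", "DT"), ("is", "VBZ"),
          ("well", "RB"), ("done", "VBN")]
print(split_after_last_adverb(tagged))
```

On this input, the substring-based version would cut after the first "well", whereas the position-based sketch cuts after the second.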
You can then use the pandas apply() method and the new explode() method (pandas >= 0.25) to split the data frame:
import pandas as pd
data = pd.Series(["but well that's alright",
"otherwise however we'll have to",
"okay sure",
"what?"])
data.apply(adv_split).explode()
You get:
0 but
0 well that's alright
1 otherwise however
1 we'll have to
2 okay sure
3 what?
This is not entirely correct, because well is tagged wrongly, but you get the idea.
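Because the question already supplies an explicit word list, the tagger can also be sidestepped altogether. The following is a minimal sketch under that assumption: split_after_adverbs is a hypothetical helper, it matches whole whitespace-separated words case-insensitively, and it does not strip punctuation (so "well," would not match).

```python
def split_after_adverbs(sentence, adverbs):
    """Break a sentence after each run of consecutive adverbs.

    Hypothetical helper: `adverbs` is the user's own word list,
    not something derived from a part-of-speech tagger.
    """
    adverb_set = {a.lower() for a in adverbs}
    words = sentence.split()
    pieces, current = [], []

    for i, word in enumerate(words):
        current.append(word)
        is_adverb = word.lower() in adverb_set
        has_next = i + 1 < len(words)
        next_is_adverb = has_next and words[i + 1].lower() in adverb_set
        # Break only after the last adverb of a run, and only if text follows
        if is_adverb and has_next and not next_is_adverb:
            pieces.append(" ".join(current))
            current = []

    if current:
        pieces.append(" ".join(current))
    # An empty or whitespace-only sentence is returned unchanged
    return pieces or [sentence]


adverbs = ['but', 'well', 'otherwise', 'however']
for line in ["but well that's alright",
             "otherwise however we'll have to",
             "okay sure",
             "what?"]:
    print(split_after_adverbs(line, adverbs))
```

With the Series from above, data.apply(lambda s: split_after_adverbs(s, adverbs)).explode() should then yield one piece per row, and adding reset_index(drop=True) gives a fresh 0..n-1 index instead of the repeated row labels.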
Answer 1 (score: 0)
var iconFont = Typeface.CreateFromAsset(Context.Assets, "xxx.ttf");
Control.Typeface = iconFont;