Question

I want to split a row into new rows wherever an adverb appears. However, if several adverbs appear in a row, I only want to split after the last one.

A sample of my data frame looks like this:

                   
0         but well that's alright 
1 otherwise however we'll have to  
2                       okay sure 
3                           what?

With adverbs = ['but', 'well', 'otherwise', 'however'], I would like the resulting df to look like this:

    0             but well
    1         that's alright 
    2         otherwise however  
    3         we'll have to  
    2         okay sure 
    3         what?

Answer 1

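I have a partial solution that may help. You can use the TextBlob package.

With this API, you can assign a part-of-speech tag to each word. A list of the possible tags is available here.

The problem is that tagging is not perfect, and your definition of an adverb may not match the tagger's. For example, the API tags but as a coordinating conjunction, and well ends up tagged as a verb for some reason. It still works in most cases.

The split can be done this way: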

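from textblob import TextBlob

def adv_split(s):
    """Split s after the last word tagged as a conjunction or an adverb."""
    annotations = TextBlob(s).tags
    # Keep words tagged CC (coordinating conjunction) or RB* (adverb)
    adv_words = [word for word, tag in annotations
                 if tag.startswith('CC') or tag.startswith('RB')]
    # We have at least one matching word
    if adv_words:
        # Character position just past the last matching word
        # (str.index returns the first occurrence of that word in s)
        adv_pos = s.index(adv_words[-1]) + len(adv_words[-1])
        parts = [s[:adv_pos], s[adv_pos:].strip()]
        # Drop the empty tail when the matching word ends the string
        return [part for part in parts if part]
    # No match: return a one-element list so every result has the same type
    return [s]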

You can then use pandas apply() and the new explode() method (pandas >= 0.25) to split the data frame:

import pandas as pd

data = pd.Series(["but well that's alright",
                  "otherwise however we'll have to",
                  "okay sure",
                  "what?"])
data.apply(adv_split).explode()

You get:

0                     but
0     well that's alright
1       otherwise however
1           we'll have to
2               okay sure
3                   what?

This is not quite right, because the tag for well is wrong, but you get the idea.
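
If the tagger's mistakes are a problem, one alternative is to skip tagging and match against the explicit adverbs list from the question. The following is only a minimal sketch under some assumptions: the function name split_after_adverb_runs is made up for illustration, words are separated by whitespace, matching ignores case, and punctuation attached to a word (for example "well,") is not stripped, so such a word would not match.

```python
# Adverb list taken from the question
ADVERBS = {"but", "well", "otherwise", "however"}

def split_after_adverb_runs(text, adverbs=ADVERBS):
    """Split text after each run of consecutive words found in `adverbs`.

    Hypothetical helper: whitespace tokenization, case-insensitive match.
    """
    words = text.split()
    rows, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        is_adverb = word.lower() in adverbs
        next_is_adverb = (i + 1 < len(words)
                          and words[i + 1].lower() in adverbs)
        # Close the row only after the last adverb of a run
        if is_adverb and not next_is_adverb:
            rows.append(" ".join(current))
            current = []
    # Whatever is left after the last split becomes its own row
    if current:
        rows.append(" ".join(current))
    return rows

lines = ["but well that's alright",
         "otherwise however we'll have to",
         "okay sure",
         "what?"]

# Flatten the per-line results into one list of rows
result = [row for line in lines for row in split_after_adverb_runs(line)]
print(result)
```

With pandas, the same function should plug into the approach above as data.apply(split_after_adverb_runs).explode(), and appending .reset_index(drop=True) renumbers the rows from 0 instead of repeating the original index.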

