Question

I want to get the first word before the 'v' and the first word after it.

import pandas as pd

df = pd.DataFrame({'text': ["cans choc v macroni ice",
                            "chocolate sundaes v chocolate ice cream",
                            "Chocolate v sauce"]})

My DataFrame looks like this:

cans choc v macroni ice
chocolate sundaes v chocolate ice cream
Chocolate v sauce

I would like it to look like this:

cans v macroni
chocolate v chocolate
Chocolate v sauce

How can I achieve this in pandas? The common element is 'v'.

Answer 1

Is there a reason you can't use the split function and then map a function onto the column?

Based on the first example, this will work:

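def word_scrape(whole_string):
    # Split the text on the " v " separator
    outside_v = whole_string.split(" v ")
    # First word before the separator
    first_word = outside_v[0].split(" ")[0]
    # Second word after the separator
    last_word = outside_v[1].split(" ")[1]
    return first_word + " v " + last_word

# .ix was removed in pandas 1.0; apply the function to the column instead
df['text'] = df['text'].apply(word_scrape)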

To tolerate entries with only a single word after the 'v', use:

def word_scrape(whole_string):
    outside_v = whole_string.split(" v ")
    first_word = outside_v[0].split(" ")[0]
    words_after = outside_v[1].split(" ")
    try:
        # Second word after the separator, when there is one
        last_word = words_after[1]
    except IndexError:
        # Only one word after the separator: fall back to it
        last_word = words_after[0]
    return first_word + " v " + last_word

# .ix was removed in pandas 1.0; apply the function to the column instead
df['text'] = df['text'].apply(word_scrape)

Based on the second example, this will work:

def word_scrape(whole_string):
    outside_v = whole_string.split(" v ")
    # First word on each side of the separator
    first_word = outside_v[0].split(" ")[0]
    last_word = outside_v[1].split(" ")[0]
    return first_word + " v " + last_word

# .ix was removed in pandas 1.0; apply the function to the column instead
df['text'] = df['text'].apply(word_scrape)
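
The variants above differ only in which word they pick after the separator, and all of them raise an exception if a row has no " v " at all. As a rough sketch of one way to guard against that, `str.partition` can be used instead of `split`. The helper name `first_words_around` and the decision to return such rows unchanged are assumptions made for illustration, not part of the answer.

```python
import pandas as pd

def first_words_around(text, sep=" v "):
    """Return '<first word before sep><sep><first word after sep>'.

    Hypothetical helper: rows without the separator, or with nothing on
    one side of it, are returned unchanged (an assumption).
    """
    before, found, after = text.partition(sep)
    before_words = before.split()
    after_words = after.split()
    if not found or not before_words or not after_words:
        return text
    return before_words[0] + sep + after_words[0]

df = pd.DataFrame({"text": ["cans choc v macroni ice",
                            "chocolate sundaes v chocolate ice cream",
                            "Chocolate v sauce"]})
result = df["text"].apply(first_words_around)
print(result.tolist())
```

On the sample data this should give the three strings requested in the question.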

Answer 2

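You could use a regular expression, as @James suggested. Another approach is to use pandas apply, which generally handles problems like this one.

(Incidentally, there are several very similar questions and answers, for example this one.)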

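def my_fun(my_text, my_sep):
    # Split on the separator and keep the first word of each piece
    vals = my_text.split(my_sep)
    vals = [val.split()[0] for val in vals]
    # Rejoin so the result is a string rather than a list
    return my_sep.join(vals)

# Use ' v ' (with spaces) so a 'v' inside a word is not treated as a separator
df.text.apply(lambda my_text: my_fun(my_text, ' v '))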

Of course, use better names than these! :-)
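
`Series.apply` can also forward extra arguments to the function through its `args` parameter, which removes the need for the lambda. A small sketch, assuming the separator is ' v ' with surrounding spaces and that both sides of it contain at least one word; `first_words` is a hypothetical name:

```python
import pandas as pd

def first_words(text, sep):
    # Keep only the first word of each piece around the separator,
    # then put the separator back between them.
    return sep.join(piece.split()[0] for piece in text.split(sep))

df = pd.DataFrame({"text": ["cans choc v macroni ice",
                            "chocolate sundaes v chocolate ice cream",
                            "Chocolate v sauce"]})

# Extra positional arguments for the function go in `args`.
result = df["text"].apply(first_words, args=(" v ",))
print(result.tolist())
```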

Answer 3

You can pass a regular expression to a string operation on the text column.

df.text.str.extract(r'(\w+ v \w+)', expand=True)

# returns:
                     0
0       choc v macroni
1  sundaes v chocolate
2    Chocolate v sauce
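
Note that this pattern captures the word immediately before the 'v', which is why the first row comes back as "choc v macroni" rather than the "cans v macroni" asked for in the question. One possible adjustment, offered as a sketch and assuming every row starts with a word and contains " v ", is to anchor the first group at the start of the string and capture the two words separately:

```python
import pandas as pd

df = pd.DataFrame({"text": ["cans choc v macroni ice",
                            "chocolate sundaes v chocolate ice cream",
                            "Chocolate v sauce"]})

# Group 1: first word of the string; group 2: first word after ' v '.
# The lazy '.*?' skips any words between the first word and the separator.
parts = df["text"].str.extract(r"^(\w+).*? v (\w+)")

# Rows that do not match the pattern would come back as NaN here.
result = parts[0] + " v " + parts[1]
print(result.tolist())
```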

Answer 4

Let's try this:

# Split on ' v ', take the first word of each piece, then rejoin each row
df.text.str.split(' v ', expand=True)\
  .apply(lambda x: x.str.extract(r'(\w+)', expand=False))\
  .apply(lambda x: ' v '.join(x), axis=1)

Output:

0           cans v macroni
1    chocolate v chocolate
2        Chocolate v sauce
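
The same result can probably be reached without `apply` by staying within the `.str` accessor. This is only a sketch, under the assumption that " v " occurs in every row; `n=1` limits the split to the first occurrence, and `.str[0]` picks the first element of each word list:

```python
import pandas as pd

df = pd.DataFrame({"text": ["cans choc v macroni ice",
                            "chocolate sundaes v chocolate ice cream",
                            "Chocolate v sauce"]})

# Two columns: the text before the first ' v ' and the text after it.
parts = df["text"].str.split(" v ", n=1, expand=True)

# First word of each side, joined back together with the separator.
result = parts[0].str.split().str[0] + " v " + parts[1].str.split().str[0]
print(result.tolist())
```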
