Create a new column with the last 2 values after a str.split operation

Time: 2018-02-17 20:58:04

Tags: python pandas

I stumbled upon a similar question (Get last "column" after .str.split() operation on column in pandas DataFrame) and used some of its code. However, it does not give the output I want.

import pandas as pd

raw_data = {
    'category': ['sweet beverage, cola,sugared', 'healthy,salty snacks', 'juice,beverage,sweet', 'fruit juice,beverage', 'appetizer,salty crackers'],
    'product_name': ['coca-cola', 'salted pistachios', 'fruit juice', 'lemon tea', 'roasted peanuts']}
df = pd.DataFrame(raw_data)

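The goal is to extract the categories from each row and create a new column using only the last two. I have this code, which works and gives me the categories of interest as a new column.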

df['my_col'] = df.category.apply(lambda s:s.split(',')[-2:])

output
my_col 
[cola,sugared]
[healthy,salty snacks]
[beverage,sweet]
...

However, the result is displayed as a list. How can I make it not display as a list? Is this achievable? Thanks, everyone!

2 answers:

Answer 0 (score: 2)

I think you need str.split, then select the last two items of each list, and finally str.join:

df['my_col'] = df.category.str.split(',').str[-2:].str.join(',')
print (df)
                       category       product_name                    my_col
0  sweet beverage, cola,sugared          coca-cola              cola,sugared
1          healthy,salty snacks  salted pistachios      healthy,salty snacks
2          juice,beverage,sweet        fruit juice            beverage,sweet
3          fruit juice,beverage          lemon tea      fruit juice,beverage
4      appetizer,salty crackers    roasted peanuts  appetizer,salty crackers

Edit:

In my opinion, the pandas str text functions are preferable to apply with pure Python string functions, because they also work with NaN and None. With apply, a missing value raises:

AttributeError: 'float' object has no attribute 'split'
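
To make the NaN/None point concrete, here is a minimal sketch. The data is hypothetical (two of the question's category strings plus two missing entries added for illustration), and the variable `apply_error` is only there to capture the message.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two category strings from the question plus two missing entries.
df = pd.DataFrame({
    'category': ['juice,beverage,sweet', np.nan, None, 'fruit juice,beverage'],
})

# The .str accessor skips missing values and leaves them missing in the result.
df['my_col'] = df['category'].str.split(',').str[-2:].str.join(',')

# The apply-based version calls .split on every element, including NaN (a float).
try:
    df['category'].apply(lambda s: ','.join(s.split(',')[-2:]))
    apply_error = None
except AttributeError as exc:
    apply_error = str(exc)

print(df)
print(apply_error)
```

If the apply form is preferred anyway, one option is to guard the lambda, for example with `isinstance(s, str)`, so that non-string rows are passed through unchanged.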

Answer 1 (score: 1)

You can also call join on the result of split inside a lambda:

df['my_col'] = df.category.apply(lambda s: ','.join(s.split(',')[-2:]))
df

Result:

                       category       product_name                    my_col
0  sweet beverage, cola,sugared          coca-cola              cola,sugared
1          healthy,salty snacks  salted pistachios      healthy,salty snacks
2          juice,beverage,sweet        fruit juice            beverage,sweet
3          fruit juice,beverage          lemon tea      fruit juice,beverage
4      appetizer,salty crackers    roasted peanuts  appetizer,salty crackers
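
As a rough illustration of what the lambda does per row, the steps can be traced on a single string in plain Python. The `strip()` step and the `'solo'` input are additions for illustration, not part of the answer; they show one way to drop the space after the comma in the first row and what happens when a row has fewer than two categories.

```python
s = 'sweet beverage, cola,sugared'  # first row of the question's data

parts = s.split(',')          # ['sweet beverage', ' cola', 'sugared']
last_two = parts[-2:]         # [' cola', 'sugared']
joined = ','.join(last_two)   # ' cola,sugared' (the space after the comma is kept)

# Optional cleanup (an assumption about the desired output): strip each piece.
cleaned = ','.join(p.strip() for p in last_two)  # 'cola,sugared'

# A row with a single category: the slice simply returns what is there.
short = ','.join('solo'.split(',')[-2:])  # 'solo'

print(repr(joined), repr(cleaned), repr(short))
```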