I came across a similar question (Get last "column" after .str.split() operation on column in pandas DataFrame) and used some of its code. However, it does not give the output I want.
import pandas as pd

raw_data = {
    'category': ['sweet beverage, cola,sugared', 'healthy,salty snacks', 'juice,beverage,sweet', 'fruit juice,beverage', 'appetizer,salty crackers'],
    'product_name': ['coca-cola', 'salted pistachios', 'fruit juice', 'lemon tea', 'roasted peanuts']}
df = pd.DataFrame(raw_data)
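The goal is to extract the categories from each row and create a new column from only the last two of them. I have this code, which works and gives me the categories I am interested in as a new column.
df['my_col'] = df.category.apply(lambda s: s.split(',')[-2:])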
output
my_col
[cola,sugared]
[healthy,salty snacks]
[beverage,sweet]
...
However, each value is shown as a list. How can I get it not to display as a list? Is that possible? Thanks, everyone!
Answer 0 (score: 2)
I think you need str.split, then select the last two items of each list and finish with str.join:
df['my_col'] = df.category.str.split(',').str[-2:].str.join(',')
print (df)
                       category       product_name                    my_col
0  sweet beverage, cola,sugared          coca-cola              cola,sugared
1          healthy,salty snacks  salted pistachios      healthy,salty snacks
2          juice,beverage,sweet        fruit juice            beverage,sweet
3          fruit juice,beverage          lemon tea      fruit juice,beverage
4      appetizer,salty crackers    roasted peanuts  appetizer,salty crackers
Edit:
In my opinion, the pandas str text functions are preferable to apply with pure Python string functions, because they also work with NaN and None. With apply, a NaN value in the column raises:
AttributeError: 'float' object has no attribute 'split'
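The difference can be reproduced with a small made-up column (hypothetical sample data, not the data from the question) that contains a NaN and a None. In this sketch the .str chain passes the missing values through, while the plain-Python lambda fails on the first non-string value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two real strings plus two missing values.
df = pd.DataFrame(
    {'category': ['juice,beverage,sweet', np.nan, None, 'fruit juice,beverage']}
)

# The .str accessor leaves missing values as missing instead of raising.
df['my_col'] = df['category'].str.split(',').str[-2:].str.join(',')
print(df)

# The same logic through apply calls .split on a float (NaN) and raises.
error_message = None
try:
    df['category'].apply(lambda s: ','.join(s.split(',')[-2:]))
except AttributeError as exc:
    error_message = str(exc)
    print('AttributeError:', error_message)
```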
Answer 1 (score: 1)
You can also call join inside the lambda on the result of split:
df['my_col'] = df.category.apply(lambda s: ','.join(s.split(',')[-2:]))
df
Result:
                       category       product_name                    my_col
0  sweet beverage, cola,sugared          coca-cola              cola,sugared
1          healthy,salty snacks  salted pistachios      healthy,salty snacks
2          juice,beverage,sweet        fruit juice            beverage,sweet
3          fruit juice,beverage          lemon tea      fruit juice,beverage
4      appetizer,salty crackers    roasted peanuts  appetizer,salty crackers
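The lambda assumes every value in the column is a string. If the column may contain NaN or None, one option is to check the type before splitting. The following is only a sketch: the helper name last_two and the sample column are made up for illustration and are not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value.
df = pd.DataFrame(
    {'category': ['sweet beverage, cola,sugared', np.nan, 'healthy,salty snacks']}
)

def last_two(value):
    # Only strings can be split; anything else (NaN, None) is returned unchanged.
    if isinstance(value, str):
        return ','.join(value.split(',')[-2:])
    return value

df['my_col'] = df['category'].apply(last_two)
print(df)
```

Note that the first row keeps the space after the comma (' cola,sugared'), because split(',') does not trim whitespace.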