input:
buzz_id facet facet_cls facet_val p_buzz_date
0 95713207 A3 Small MN 20160101
1 95713207 S3 Small-box Tbd 20160101
2 95713207 F1 Medium es 20160101
3 95713207 A2 Medium-box esf 20160101
4 95713207 A1 Dum-pal ess 20160101
...
output:
buzz_id facet facet_cls facet_val p_buzz_date
0 95713207 A3 Small MN 20160101
1 95713207 S3 Small Tbd 20160101
2 95713207 F1 Medium es 20160101
3 95713207 A2 Medium esf 20160101
4 95713207 A1 Dum ess 20160101
...
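So in my 'facet_cls' column, I need to remove everything after the '-' (including the '-' itself). My data is also quite large, so I would like to use the fastest approach I can find. Any ideas?
Thanks in advance!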
Answer 0 (score: 2)
Use str.split, then select the first value of each resulting list with str[0]:
df['facet_cls'] = df['facet_cls'].str.split('-').str[0]
print (df)
buzz_id facet facet_cls facet_val p_buzz_date
0 95713207 A3 Small MN 20160101
1 95713207 S3 Small Tbd 20160101
2 95713207 F1 Medium es 20160101
3 95713207 A2 Medium esf 20160101
4 95713207 A1 Dum ess 20160101
Details:
print (df['facet_cls'].str.split('-'))
0 [Small]
1 [Small, box]
2 [Medium]
3 [Medium, box]
4 [Dum, pal]
Name: facet_cls, dtype: object
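Since only the text before the first hyphen is kept, two variants may be worth timing on large data: limiting the split to a single cut with n=1, and a plain list comprehension using Python's str.partition. This is a minimal sketch, assuming a small sample frame rebuilt from the question and a column with no missing values. Whether either is faster on your data is something to measure (for example with timeit), not a given.

```python
import pandas as pd

# Sample frame rebuilt from the question (only the relevant columns).
df = pd.DataFrame({
    "buzz_id": [95713207] * 5,
    "facet": ["A3", "S3", "F1", "A2", "A1"],
    "facet_cls": ["Small", "Small-box", "Medium", "Medium-box", "Dum-pal"],
})

# Variant 1: stop splitting after the first '-' instead of splitting on all of them.
by_split = df["facet_cls"].str.split("-", n=1).str[0]

# Variant 2: plain Python loop; partition returns (head, sep, tail).
# Assumes every value is a string (no NaN in the column).
by_partition = pd.Series(
    [s.partition("-")[0] for s in df["facet_cls"]],
    index=df.index,
    name="facet_cls",
)

print(by_split.tolist())
print(by_partition.tolist())
```

Either result can be assigned back with df['facet_cls'] = by_split.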
Answer 1 (score: 1)
You can also do this with a lambda expression:
df['facet_cls'] = df['facet_cls'].apply(lambda x:x.split('-')[0])
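One caveat with the lambda: it calls x.split on every cell, so a missing value (NaN is a float, None has no split method) would raise an AttributeError. Below is a minimal sketch of a guarded helper. The name first_part is hypothetical and not part of pandas, and the sample values are made up for illustration.

```python
def first_part(value, sep="-"):
    """Return the text before the first `sep`; pass non-strings through unchanged."""
    if isinstance(value, str):
        # maxsplit=1: only the first separator matters here.
        return value.split(sep, 1)[0]
    return value  # e.g. NaN or None for missing entries


# Illustrative values, including a missing entry.
values = ["Small", "Small-box", None, "Dum-pal"]
cleaned = [first_part(v) for v in values]
print(cleaned)
```

It can be passed to apply in place of the lambda: df['facet_cls'] = df['facet_cls'].apply(first_part).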