Question

I have a DataFrame in which some rows contain the special character '#'.

Here is my data. I can find the index position of '#' in each row:

import pandas as pd
df = pd.DataFrame(data=['fig#abc', 'strawberry', 'applepie#efg'], columns=['fruitname'])
# str.find returns the position of the first '#', or -1 if it is absent
ind = df.fruitname.str.find("#")
print(df)
print(ind)


    fruitname
0   fig#abc
1   strawberry
2   applepie#efg

0    3
1   -1
2    8

If the index of '#' is greater than 4, I want a new column that holds the characters before '#'; otherwise it should keep the original value:

   fruitname_new
0  fig#abc
1  strawberry
2  applepie

What is the best way to get this result?

Answer 1

# Use apply to split fruitname on '#', then check the length of the first part before setting the new fruitname column.

df['fruitname_new'] = df.apply(lambda x: x.fruitname if len(x.fruitname.split('#')[0])<=4 else x.fruitname.split('#')[0], axis=1)

df
Out[484]: 
      fruitname fruitname_new
0       fig#abc       fig#abc
1    strawberry    strawberry
2  applepie#efg      applepie
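
The same rule can probably also be written without a row-wise apply. The following is only a sketch under that assumption, not the answerer's method: it reuses the str.find positions from the question and picks between the prefix and the original value with Series.where. The names pos and prefix are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(data=['fig#abc', 'strawberry', 'applepie#efg'], columns=['fruitname'])

# Position of the first '#' in each value; -1 where there is none.
pos = df['fruitname'].str.find('#')

# Text before the first '#'; values without '#' come back unchanged.
prefix = df['fruitname'].str.split('#').str[0]

# Keep the prefix only where '#' sits past index 4; otherwise keep the original.
df['fruitname_new'] = prefix.where(pos > 4, df['fruitname'])

print(df)
```

On the sample data this should give the same fruitname_new column as the apply version. Whether it is noticeably faster depends on the size of the frame, so it is worth timing on real data before switching.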

How to get a substring in a pandas DataFrame if the row contains a certain character?