My DataFrame looks like this:
     Abc                       XYZ
0  Hello   How are you doing today
1   Good                 This is a
2    Bye                   See you
3  Books  Read chapter 1 to 5 only
With max_size = 3, I want to truncate the column (XYZ) to at most 3 words (max_size). Rows that have fewer than max_size words should be left unchanged.
Desired output:
     Abc             XYZ
0  Hello     How are you
1   Good       This is a
2    Bye         See you
3  Books  Read chapter 1
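Answer 0: (score: 3)
Use split with a limit (n=max_size), drop the trailing remainder by slicing to the first max_size items, then join the list back into a string: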
max_size = 3
df['XYZ'] = df['XYZ'].str.split(n=max_size).str[:max_size].str.join(' ')
print(df)
     Abc             XYZ
0  Hello     How are you
1   Good       This is a
2    Bye         See you
3  Books  Read chapter 1
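For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the intermediate name `parts` is only for illustration. It also shows what the limit does: the split yields at most max_size + 1 pieces, the last one holding the untouched remainder, and the slice then discards that remainder.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Abc": ["Hello", "Good", "Bye", "Books"],
    "XYZ": [
        "How are you doing today",
        "This is a",
        "See you",
        "Read chapter 1 to 5 only",
    ],
})
max_size = 3

# Split on whitespace at most max_size times: up to max_size + 1 pieces per row.
parts = df["XYZ"].str.split(n=max_size)

# Keep the first max_size pieces and glue them back together.
truncated = parts.str[:max_size].str.join(" ")
print(truncated.tolist())
```

The limit is an optimization rather than a requirement: without `n=max_size` the slice would still keep the first three words, but every row would be split completely first.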
Another solution, using a lambda function:
df['XYZ'] = df['XYZ'].apply(lambda x: ' '.join(x.split(maxsplit=max_size)[:max_size]))
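One difference between the two variants is worth knowing. The `.str` accessor passes missing values through as NaN, while the lambda calls `x.split` on whatever is in the cell and fails with an AttributeError on a float NaN. If the column may contain missing values, a guarded helper is one option. This is only a sketch: `truncate_words` is a hypothetical name, not part of pandas, and the sample Series is made up for illustration.

```python
import pandas as pd

def truncate_words(text, max_size):
    """Keep at most max_size whitespace-separated words; pass non-strings through."""
    if not isinstance(text, str):
        return text  # e.g. NaN / None stays as it is
    return " ".join(text.split(maxsplit=max_size)[:max_size])

# Illustrative data with one missing value.
s = pd.Series(["Read chapter 1 to 5 only", None, "See you"])
result = s.apply(truncate_words, max_size=3)
print(result.tolist())
```

Both variants also collapse runs of whitespace between the kept words into single spaces, because the words are re-joined with ' '.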