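Is there a clean way to concatenate an arbitrary number of string Series, similar to the ' '.join idiom? If I know in advance which columns I want, I can do: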
import pandas as pd
df = pd.DataFrame([['word1','word2', 'word3']])
df[0] + ' ' + df[1] + ' ' + df[2]
0 word1 word2 word3
However, I don't know a good way to generalize this to an arbitrary list of columns. The best I have come up with is:
cols = [0,1,2]
df[cols[0]].str.cat(df[cols[1:]].values.transpose(), sep = ' ')
0 word1 word2 word3
But I rather dislike this solution. Perhaps there is a way to do it using the overloaded + operator?
Answer 0 (score: 3)
If you don't mind a trailing space at the end of each row, you could use sum, which is a bit faster than manually typing df[0] + ' ' + df[1] + ' ' + df[2]:
In [25]: (df + ' ').sum(axis=1)
Out[25]:
0 word1 word2 word3
dtype: object
However, if you need to strip the trailing space, it becomes slower:
In [26]: (df + ' ').sum(axis=1).str.strip()
Out[26]:
0 word1 word2 word3
dtype: object
Timing:
In [34]: %timeit (df + ' ').sum(axis=1)
1000 loops, best of 3: 368 us per loop
In [38]: %timeit df[0] + ' ' + df[1] + ' ' + df[2]
1000 loops, best of 3: 482 us per loop
In [40]: %timeit (df + ' ').sum(axis=1).str.strip()
1000 loops, best of 3: 556 us per loop
In [47]: %timeit df[cols[0]].str.cat(df[cols[1:]].values.transpose(), sep = ' ')
1000 loops, best of 3: 870 us per loop
In [49]: %timeit df[[0,1,2]].apply(' '.join, axis=1)
1000 loops, best of 3: 937 us per loop
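A variant that avoids the trailing space without a strip step is to fold the columns with + directly, which is close to what the question asks for. This is only a sketch: join_columns is a hypothetical helper name (not a pandas function), and it assumes the column list is non-empty and every selected column already holds strings. It has not been timed against the approaches above.

```python
from functools import reduce

import pandas as pd


def join_columns(df, cols, sep=' '):
    """Concatenate the string columns in cols row-wise, separated by sep.

    Hypothetical helper (not part of pandas); assumes cols is non-empty
    and every selected column contains strings.
    """
    # Fold left to right: (df[a] + sep + df[b]) + sep + df[c], and so on
    return reduce(lambda left, right: left + sep + right,
                  (df[c] for c in cols))


df = pd.DataFrame([['word1', 'word2', 'word3']])
result = join_columns(df, [0, 1, 2])
print(result)
```

Because the separator is only placed between columns, no trailing space is produced and no .str.strip() call is needed.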
Answer 1 (score: 1)
After selecting the columns, you can use apply with axis=1 (here I specify them manually, but you could use cols instead):
>>> df = pd.DataFrame([['word1','word2', 'word3']])
>>> df
0 1 2
0 word1 word2 word3
>>> df[[0,1,2]].apply(' '.join, axis=1)
0 word1 word2 word3
dtype: object
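The same call with a cols list might look like the sketch below. The astype(str) step is an addition that is not in the original answer: it is there on the assumption that some columns may not hold strings, in which case ' '.join would raise a TypeError. The sample data is made up for illustration.

```python
import pandas as pd

# Illustrative data; column 1 deliberately holds integers
df = pd.DataFrame([['word1', 2, 'word3'], ['a', 5, 'c']])
cols = [0, 1, 2]  # any list of column labels

# Convert every selected column to str, then join each row with a space
joined = df[cols].astype(str).apply(' '.join, axis=1)
print(joined)
```

If all selected columns are already strings, the astype(str) call can be dropped.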