Question

I have a for loop:

for j in range(0, len(df)):
    for i in range(0, 32):
        df.iloc[j, i] = df['split'].iloc[j][i]

This takes a long time to run. How can I optimize it, for example with iteritems, iterrows, or apply?

(The number of rows and columns is dynamic.)

Sample:

   A  B  split
0  we    [w,e]
1  xy    [x,y]
2  ad    [a,d]
3  cf    [c,f]
4  de    [d,e]
5  tt    [t,t]

Should become:

   A  B  split
0  w  e   [w,e]
1  x  y   [x,y]
2  a  d   [a,d]
3  c  f   [c,f]
4  d  e   [d,e]
5  t  t   [t,t]

Answer 1

Here is one way, using the NumPy representation:

import pandas as pd

df = pd.DataFrame({'A': ['we', 'xy', 'ad', 'cf', 'de', 'tt'],
                   'B': ['', '', '', '', '', ''],
                   'split': [['w', 'e'], ['x', 'y'], ['a', 'd'],
                             ['c', 'f'], ['d', 'e'], ['t', 't']]})

df[['A', 'B']] = df['split'].values.tolist()

print(df)

   A  B   split
0  w  e  [w, e]
1  x  y  [x, y]
2  a  d  [a, d]
3  c  f  [c, f]
4  d  e  [d, e]
5  t  t  [t, t]
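
Since the question mentions a dynamic number of columns, the same idea can be generalized by building a new DataFrame from the lists instead of naming the target columns by hand. This is only a minimal sketch, assuming every list in split has the same length; the names parts and the col_ prefix are illustrative and not from the original post:

```python
import pandas as pd

df = pd.DataFrame({'split': [['w', 'e'], ['x', 'y'], ['a', 'd']]})

# Build one column per list element; reuse df's index so rows stay aligned.
parts = pd.DataFrame(df['split'].tolist(), index=df.index)

# Illustrative column names: col_0, col_1, ...
parts = parts.add_prefix('col_')

# Attach the new columns to the original frame.
out = df.join(parts)
print(out)
```

Building the whole block of columns in one call avoids the per-cell iloc assignments of the original loop.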

Here is another way, using the str accessor and operator.itemgetter:

from operator import itemgetter

df['A'] = df['A'].str[0]
df['B'] = df['split'].apply(itemgetter(1))

Answer 2

Try this:

res = df['A'].apply(lambda x: pd.Series(list(x)))
out = pd.concat([df, res], axis=1)

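res holds a new DataFrame with each string split into individual characters. Concatenate it with your original DataFrame, then rename the columns as you like. This also works when the strings have different lengths.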
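
The renaming step mentioned above could look like the following. This is a minimal sketch on a small stand-in DataFrame; the char_ prefix is an arbitrary choice for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['wez', 'xy', 'ad']})

# Split each string into characters; shorter strings are padded with NaN.
res = df['A'].apply(lambda x: pd.Series(list(x)))

# Give the numeric columns readable names: char_0, char_1, ...
res = res.add_prefix('char_')

# Place the new columns next to the original ones.
out = pd.concat([df, res], axis=1)
print(out)
```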

What if I have a delimiter?

Only a small change is needed:

res = df['A'].str.split(';', expand=True)

Input:

     A  split
0  wez  [w,e]
1   xy  [x,y]
2   ad  [a,d]
3   cf  [c,f]
4   de  [d,e]
5   tt  [t,t]

Output:

     A  split  0  1    2
0  wez  [w,e]  w  e    z
1   xy  [x,y]  x  y  NaN
2   ad  [a,d]  a  d  NaN
3   cf  [c,f]  c  f  NaN
4   de  [d,e]  d  e  NaN
5   tt  [t,t]  t  t  NaN

Answer 3

Here is another dynamic solution:

Source DF:

In [272]: df
Out[272]:
        text
0  w;e;z;d;c
1      a;b;c
2          x

Solution:

In [273]: import string

In [274]: res = df['text'].str.split(';', expand=True).fillna('')

In [275]: res
Out[275]:
   0  1  2  3  4
0  w  e  z  d  c
1  a  b  c
2  x

If you don't like numeric column names, you can rename the columns:

In [276]: res = res.rename(columns=lambda c: string.ascii_uppercase[c])

In [277]: res
Out[277]:
   A  B  C  D  E
0  w  e  z  d  c
1  a  b  c
2  x
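
One caveat with the rename above: string.ascii_uppercase has only 26 letters, so indexing it fails once the split produces more than 26 columns. A hedged sketch of one possible fallback that generates spreadsheet-style names (A..Z, AA, AB, ...); the helper column_label is hypothetical and not part of pandas:

```python
import string
import pandas as pd

def column_label(n):
    """Map 0 -> 'A', 25 -> 'Z', 26 -> 'AA', ... (hypothetical helper)."""
    label = ''
    n += 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = string.ascii_uppercase[rem] + label
    return label

df = pd.DataFrame({'text': ['w;e;z;d;c', 'a;b;c', 'x']})

# Split on the delimiter and replace missing cells with empty strings.
res = df['text'].str.split(';', expand=True).fillna('')

# rename accepts a function that is applied to each column label.
res = res.rename(columns=column_label)
print(res)
```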

PS: as @Mohamed Thasin ah has already mentioned, there is not much point in creating empty columns beforehand.
