Question

I am applying two functions to a DataFrame:

res = df.apply(lambda x:pd.Series(list(x)))  
res = res.applymap(lambda x: x.strip('"') if isinstance(x, str) else x)

Update: the DataFrame has nearly 700,000 rows, and this takes a very long time.

How can I reduce the run time?

Sample data:

   A        
 ----------
0 [1,4,3,c] 
1 [t,g,h,j]  
2 [d,g,e,w]  
3 [f,i,j,h] 
4 [m,z,s,e] 
5 [q,f,d,s]

Expected output:

   A         B   C   D  E
-------------------------
0 [1,4,3,c]  1   4   3  c
1 [t,g,h,j]  t   g   h  j
2 [d,g,e,w]  d   g   e  w
3 [f,i,j,h]  f   i   j  h
4 [m,z,s,e]  m   z   s  e
5 [q,f,d,s]  q   f   d  s

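The line res = df.apply(lambda x:pd.Series(list(x))) takes the items from each list and puts them into separate columns, one item per column, as shown above. There are almost 38 columns.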

Answer 1

I think that:

res = df.apply(lambda x:pd.Series(list(x)))

should be changed to:

df1 = pd.DataFrame(df['A'].values.tolist())
print(df1)
   0  1  2  3
0  1  4  3  c
1  t  g  h  j
2  d  g  e  w
3  f  i  j  h
4  m  z  s  e
5  q  f  d  s
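
To get from this to the layout in the question's expected output, the new columns can be labelled and joined back onto the original frame. The following is only a minimal sketch on a toy frame: the labels B to E are taken from the question's output, the names wide and out are made up for illustration, and it assumes every list in column A has the same length.

```python
import pandas as pd

# Toy frame mirroring the question's sample; column 'A' holds Python lists.
df = pd.DataFrame({"A": [[1, 4, 3, "c"], ["t", "g", "h", "j"], ["d", "g", "e", "w"]]})

# Build the wide frame with a single constructor call instead of creating
# one pd.Series per row. Passing index=df.index keeps the rows aligned
# even when df does not have the default 0..n-1 index.
wide = pd.DataFrame(df["A"].tolist(), index=df.index)

# Illustrative labels, chosen to match the question's expected output.
wide.columns = ["B", "C", "D", "E"]

# Attach the new columns next to the original list column.
out = df.join(wide)
print(out)
```

The speed-up comes from avoiding the per-row pd.Series construction; how large it is on 700,000 rows depends on the data and is worth timing on a sample first.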

For the second step, if the columns do not mix numeric values with strings:

cols = res.select_dtypes(object).columns
res[cols] = res[cols].apply(lambda x: x.str.strip('"'))
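
If some columns do mix numbers and strings, .str.strip returns NaN for the non-string entries, so the snippet above would lose those values. One possible workaround, sketched here on a toy frame with made-up column labels, is to strip with .str and then restore the original values wherever the result is missing. This is an assumption about what the data looks like, not something taken from the question.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

# Toy frame: 'B' mixes an int with quoted strings, 'C' is all strings,
# 'D' is purely numeric and should be left untouched.
res = pd.DataFrame({
    "B": [1, '"t"', '"d"'],
    "C": ['"4"', "g", '"g"'],
    "D": [1.5, 2.5, 3.5],
})

for col in res.columns:
    s = res[col]
    if not (is_object_dtype(s) or is_string_dtype(s)):
        continue  # numeric columns have no quotes to strip
    # Non-string entries come back as NaN from the .str accessor.
    stripped = s.str.strip('"')
    # Restore the original non-string values where stripping produced NaN.
    res[col] = stripped.where(stripped.notna(), s)

print(res)
```

This still loops in Python over the columns, but with about 38 columns that is far fewer calls than applymap makes over every cell.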

Pandas apply and applymap functions take a long time to run on large datasets
