Why does pandas unstack raise an error?

Time: 2018-07-02 05:39:39

Tags: python pandas numpy dataframe stack

I am trying to unstack two columns:

cols = res.columns[:31]
res[cols] = res[cols].ffill()
res = res.set_index(cols + [31])[32].unstack().reset_index().rename_axis(None, 1)

But I get an error:

TypeError: can only perform ops with scalar values

How can I avoid it?

My original question: LINK

1 answer:

Answer 0: (score: 1)

I think you need to convert the columns to a list:

cols = res.columns[:31].tolist()
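
A minimal sketch of why the conversion matters, using a hypothetical 5-column frame (not the asker's data). On a pandas Index, `+` is treated as element-wise arithmetic rather than concatenation, which is the likely source of the TypeError above; on a plain list, `+` appends the extra key as intended.

```python
import pandas as pd

# Hypothetical 5-column frame with integer column labels 0..4.
df = pd.DataFrame([[0, 1, 2, 3, 4]], columns=range(5))

cols_index = df.columns[:3]           # pandas Index: `+` means arithmetic here
cols_list = df.columns[:3].tolist()   # plain list: `+` means concatenation

keys = cols_list + [3]                # [0, 1, 2, 3]
series = df.set_index(keys)[4]        # Series indexed by a 4-level MultiIndex
print(keys, series.index.nlevels)
```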

Edit:

  

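Index contains duplicate entries, cannot reshape

means that a key combination is duplicated. Here the first 6 columns form the new index and the 7th column supplies the new column labels, so a new DataFrame cannot be built when the 8th column holds 2 values for the same index/column pair: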

    0  1  2  3  4   5  6   7
0  xx  s  1  d  f  df  f  54 
1  xx  s  1  d  f  df  f  g4 

New DataFrame:

 index = xx  s  1  d  f  df
 column = f
 value = 54 

 index = xx  s  1  d  f  df
 column = f
 value = g4 
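
To see which rows collide before reshaping, something like the following could be used. It is a sketch on a hypothetical 3-row frame that reuses the column labels 0, 1, 6 and 7 from the example; `duplicated(..., keep=False)` marks every row whose key combination occurs more than once.

```python
import pandas as pd

# Hypothetical reduced frame: columns 0 and 1 are index keys,
# column 6 supplies the new column labels, column 7 holds the values.
df = pd.DataFrame({0: ['xx', 'xx', 'xx'],
                   1: ['s', 's', 's'],
                   6: ['f', 'f', 'x'],
                   7: ['54', 'g4', 'r4']})
keys = [0, 1, 6]

# Rows that share the same (index, column) combination.
dupes = df[df.duplicated(keys, keep=False)]
print(dupes)

# Unstacking with those duplicates present fails.
error_message = None
try:
    df.set_index(keys)[7].unstack()
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```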

So one solution is to aggregate. The values here are strings, so .apply(', '.join) is needed:

 index = xx  s  1  d  f  df
 column = f
 value = 54, g4 

Or remove the duplicates with drop_duplicates, keeping either the first or the last value of the duplicated rows:

 index = xx  s  1  d  f  df
 column = f
 value = 54

 index = xx  s  1  d  f  df
 column = f
 value = g4

import numpy as np
import pandas as pd

res = pd.DataFrame({0: ['xx',np.nan,np.nan,np.nan,'ds', np.nan, np.nan, np.nan, np.nan, 'as'],
                    1: ['s',np.nan,np.nan,np.nan,'a', np.nan, np.nan, np.nan, np.nan, 't'],
                    2: ['1',np.nan,np.nan,np.nan,'s', np.nan, np.nan, np.nan, np.nan, 'r'],
                    3: ['d',np.nan, np.nan, np.nan,'d', np.nan, np.nan, np.nan, np.nan, 'a'],
                    4: ['f',np.nan, np.nan, np.nan,'f', np.nan, np.nan, np.nan, np.nan, '2'],
                    5: ['df',np.nan,np.nan,np.nan,'ds',np.nan, np.nan, np.nan, np.nan, 'ds'],
                    6: ['f','f', 'x', 'r', 'f', 'd', 's', '1', '3', 'k'], 
                    7: ['54','g4', 'r4', '43', '64', '43', 'se', 'gf', 's3', 's4']})


cols = res.columns[:6].tolist()
res[cols] = res[cols].ffill()
print (res)
    0  1  2  3  4   5  6   7
0  xx  s  1  d  f  df  f  54 
1  xx  s  1  d  f  df  f  g4 
2  xx  s  1  d  f  df  x  r4
3  xx  s  1  d  f  df  r  43
4  ds  a  s  d  f  ds  f  64
5  ds  a  s  d  f  ds  d  43
6  ds  a  s  d  f  ds  s  se
7  ds  a  s  d  f  ds  1  gf
8  ds  a  s  d  f  ds  3  s3
9  as  t  r  a  2  ds  k  s4

res = res.groupby(cols + [6])[7].apply(', '.join).unstack().reset_index().rename_axis(None, axis=1)
print (res)

    0  1  2  3  4   5    1    3    d       f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN     NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43      64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN  54, g4  NaN   43  NaN   r4 <-54, g4

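Another solution is to drop the duplicates (starting again from the forward-filled res, before it was reassigned by the groupby step):

res = res.drop_duplicates(cols + [6])

res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, axis=1)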
print (res)
    0  1  2  3  4   5    1    3    d    f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN  NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43   64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN   54  NaN   43  NaN   r4 <- 54

Or, again starting from the forward-filled res, keep the last value instead:

res = res.drop_duplicates(cols + [6], keep='last')

res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, axis=1)
print (res)
    0  1  2  3  4   5    1    3    d    f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN  NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43   64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN   g4  NaN   43  NaN   r4 <- g4