Question

I am trying to unstack two columns:

cols = res.columns[:31]
res[cols] = res[cols].ffill()
res = res.set_index(cols + [31])[32].unstack().reset_index().rename_axis(None, 1)

But I get an error:

TypeError: can only perform ops with scalar values

How can I avoid it?

My original question: LINK

Answer 1

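I think you need to convert the columns to a list, so that cols + [31] concatenates labels instead of doing arithmetic on an Index: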

cols = res.columns[:31].tolist()
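
As a minimal sketch of why the conversion matters: with a pandas Index, `+` is treated as arithmetic on the labels, while with a plain list it appends. The frame below is a hypothetical stand-in with integer column labels, not the asker's data.

```python
import pandas as pd

# Hypothetical one-row frame with integer column labels 0..7.
df = pd.DataFrame([[0] * 8], columns=range(8))

cols_index = df.columns[:6]          # still a pandas Index
cols_list = df.columns[:6].tolist()  # plain Python list

# Adding to an Index is arithmetic on the labels, not concatenation.
shifted = cols_index + 1

# Adding to a list appends the extra label, which is what set_index needs.
keys = cols_list + [6]

print(shifted.tolist())
print(keys)
```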

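EDIT:

Index contains duplicate entries, cannot reshape

This means there are duplicates: here two rows hold the same values in the first 6 columns and in the 7th column. The first 6 columns become the new index and the 7th column becomes the new column labels, so the 8th column would have to put 2 values into a single cell, and the new DataFrame cannot be created: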

    0  1  2  3  4   5  6   7
0  xx  s  1  d  f  df  f  54 
1  xx  s  1  d  f  df  f  g4

New DataFrame:

 index = xx  s  1  d  f  df
 column = f
 value = 54 

 index = xx  s  1  d  f  df
 column = f
 value = g4
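
To see which rows collide before reshaping, something like the following sketch may help. The column names `key`, `col` and `val` are made up for illustration; `key` stands in for the six index columns.

```python
import pandas as pd

# Hypothetical miniature of the data: two rows share the same (key, col) pair.
df = pd.DataFrame({"key": ["xx", "xx", "xx"],
                   "col": ["f", "f", "x"],
                   "val": ["54", "g4", "r4"]})

# keep=False marks every member of a duplicated group, not only the repeats.
dupes = df[df.duplicated(["key", "col"], keep=False)]
print(dupes)

# Unstacking with the duplicated pair in the index raises ValueError.
message = None
try:
    df.set_index(["key", "col"])["val"].unstack()
except ValueError as exc:
    message = str(exc)
    print(message)
```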

So one solution is to aggregate. The values here are strings, so use .apply(', '.join):

 index = xx  s  1  d  f  df
 column = f
 value = 54, g4

Or remove the duplicates with drop_duplicates, keeping either the first or the last value of the duplicated rows:

 index = xx  s  1  d  f  df
 column = f
 value = 54

 index = xx  s  1  d  f  df
 column = f
 value = g4
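
The three outcomes above can be sketched on a tiny frame. This is only an illustration under assumed names (`key`, `col`, `val`), with `key` standing in for the six index columns.

```python
import pandas as pd

# Hypothetical two-row frame with one duplicated (key, col) pair.
df = pd.DataFrame({"key": ["xx", "xx"],
                   "col": ["f", "f"],
                   "val": ["54", "g4"]})
keys = ["key", "col"]

# Option 1: aggregate the duplicated strings into one cell.
joined = df.groupby(keys)["val"].apply(", ".join).unstack()

# Option 2: keep only the first row of each duplicated group.
first = df.drop_duplicates(keys).set_index(keys)["val"].unstack()

# Option 3: keep only the last row of each duplicated group.
last = df.drop_duplicates(keys, keep="last").set_index(keys)["val"].unstack()

print(joined.loc["xx", "f"])
print(first.loc["xx", "f"])
print(last.loc["xx", "f"])
```

Joining keeps all the information at the cost of mixed cells, while drop_duplicates silently discards rows, so which one fits depends on whether the repeated values matter.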

import numpy as np
import pandas as pd

# Sample data: columns 0-5 are filled only on the first row of each group
res = pd.DataFrame({0: ['xx',np.nan,np.nan,np.nan,'ds', np.nan, np.nan, np.nan, np.nan, 'as'],
                    1: ['s',np.nan,np.nan,np.nan,'a', np.nan, np.nan, np.nan, np.nan, 't'],
                    2: ['1',np.nan,np.nan,np.nan,'s', np.nan, np.nan, np.nan, np.nan, 'r'],
                    3: ['d',np.nan, np.nan, np.nan,'d', np.nan, np.nan, np.nan, np.nan, 'a'],
                    4: ['f',np.nan, np.nan, np.nan,'f', np.nan, np.nan, np.nan, np.nan, '2'],
                    5: ['df',np.nan,np.nan,np.nan,'ds',np.nan, np.nan, np.nan, np.nan, 'ds'],
                    6: ['f','f', 'x', 'r', 'f', 'd', 's', '1', '3', 'k'],
                    7: ['54','g4', 'r4', '43', '64', '43', 'se', 'gf', 's3', 's4']})


cols = res.columns[:6].tolist()
res[cols] = res[cols].ffill()
print (res)
    0  1  2  3  4   5  6   7
0  xx  s  1  d  f  df  f  54 
1  xx  s  1  d  f  df  f  g4 
2  xx  s  1  d  f  df  x  r4
3  xx  s  1  d  f  df  r  43
4  ds  a  s  d  f  ds  f  64
5  ds  a  s  d  f  ds  d  43
6  ds  a  s  d  f  ds  s  se
7  ds  a  s  d  f  ds  1  gf
8  ds  a  s  d  f  ds  3  s3
9  as  t  r  a  2  ds  k  s4

# axis must be passed by keyword in recent pandas versions
res = res.groupby(cols + [6])[7].apply(', '.join).unstack().reset_index().rename_axis(None, axis=1)
print(res)

    0  1  2  3  4   5    1    3    d       f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN     NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43      64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN  54, g4  NaN   43  NaN   r4 <-54, g4

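Another solution is to remove duplicates (starting again from the forward-filled res, before the reshape above):

res = res.drop_duplicates(cols + [6])

res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, axis=1)
print(res)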
    0  1  2  3  4   5    1    3    d    f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN  NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43   64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN   54  NaN   43  NaN   r4 <- 54

# Again starting from the forward-filled res: keep the last duplicate instead
res = res.drop_duplicates(cols + [6], keep='last')

res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, axis=1)
print(res)
    0  1  2  3  4   5    1    3    d    f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN  NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43   64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN   g4  NaN   43  NaN   r4 <- g4
