Question

I'm looking for a way to merge duplicate columns, treating the blanks as NaN.

Column1[1]  Column1[2]  Column1[3]  Column1[4]  Column1[4]  Column1[5]  Column1[6]  Column1[7]
  a 123                         
  b            432                      
  c                         53                  
  d                                 221             
  e                                                 2           
  f                                                             3       
  g                                                                         3243    
  h                                                                                     12

The output should look like this:

  Row   Column1[ALL]
  a 123
  b 432
  c 53
  d 221
  e 2
  f 3
  g 3243
  h 12

Answer 1

If df is your DataFrame:

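One approach that fits this setup is to back-fill each row with bfill(axis=1), so the first column picks up the first non-null value in that row, and then keep only that column. This is a sketch under assumptions, not necessarily this answer's original code. The input is rebuilt from the question with blanks as NaN, and the column names Column1[1] through Column1[8] are assumed.

```python
import numpy as np
import pandas as pd

# Rebuild the question's input (assumed column names): one value per row, NaN elsewhere
values = [123, 432, 53, 221, 2, 3, 3243, 12]
data = np.full((8, 8), np.nan)
data[np.arange(8), np.arange(8)] = values
df = pd.DataFrame(data, index=list('abcdefgh'),
                  columns=['Column1[%d]' % k for k in range(1, 9)])

# Back-fill along each row so column 0 holds the row's first non-null value,
# then keep that column. astype(int) is safe here only because no row is empty.
merged = df.bfill(axis=1).iloc[:, 0].astype(int).rename('Column1[ALL]')
print(merged)
```

Because bfill works position by position, this also copes with repeated column labels such as the two Column1[4] headers in the question.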

Answer 2

Here are some convenient approaches that generalize to all data types.

Consider the DataFrame df:

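import numpy as np
import pandas as pd

# 8x8 object array with every cell set to None
v = np.empty((8, 8), dtype=object)
v.fill(None)

i = np.arange(8)

# One value per row on the diagonal (mixed ints and a string)
v[i, i] = [123, 432, 53, 221, 2, 3, 'hello', 12]

df = pd.DataFrame(v, list('abcdefgh'), ['Column1[%s]' % k for k in range(1, 9)])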

df

  Column1[1] Column1[2] Column1[3] Column1[4] Column1[5] Column1[6] Column1[7] Column1[8]
a        123       None       None       None       None       None       None       None
b       None        432       None       None       None       None       None       None
c       None       None         53       None       None       None       None       None
d       None       None       None        221       None       None       None       None
e       None       None       None       None          2       None       None       None
f       None       None       None       None       None          3       None       None
g       None       None       None       None       None       None      hello       None
h       None       None       None       None       None       None       None         12

Option 1
By default, stack drops null values, so if each row holds exactly one value, this does what you need.

df.stack()

a  Column1[1]      123
b  Column1[2]      432
c  Column1[3]       53
d  Column1[4]      221
e  Column1[5]        2
f  Column1[6]        3
g  Column1[7]    hello
h  Column1[8]       12
dtype: object

Or

df.stack().reset_index(1, drop=True)

a      123
b      432
c       53
d      221
e        2
f        3
g    hello
h       12
dtype: object
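The one-value-per-row condition matters: if a row has two non-null values, stack keeps both and that row label appears twice. A small check on a hypothetical two-row frame makes this visible. The explicit .dropna() is an added assumption so the sketch behaves the same on pandas versions whose newer stack implementation keeps nulls.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 'b' has two non-null values
df2 = pd.DataFrame({'Column1[1]': [1.0, 5.0],
                    'Column1[2]': [np.nan, 7.0]},
                   index=['a', 'b'])

# stack gives (row, column) pairs; dropna() is a no-op where stack already drops nulls
merged = df2.stack().dropna().reset_index(level=1, drop=True)

# A unique index means every row contributed exactly one value
dupes = merged.index[merged.index.duplicated()].unique().tolist()
print(merged)
print('rows with more than one value:', dupes)  # rows with more than one value: ['b']
```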

Option 2
apply and dropna

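# .iloc[0] takes the first non-null value by position; plain [0] relies on a positional fallback newer pandas deprecates
df.apply(lambda x: x.dropna().iloc[0], axis=1)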

a      123
b      432
c       53
d      221
e        2
f        3
g    hello
h       12
dtype: object
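x.dropna().iloc[0] raises IndexError when a row has no values at all. A hedged variant, assuming an empty row should map to None instead of raising, checks the length first. The helper name first_non_null and the frame df3 are hypothetical.

```python
import pandas as pd

# Hypothetical object-dtype frame; row 'b' is entirely empty
df3 = pd.DataFrame([[1, None], [None, None], [None, 'x']],
                   index=list('abc'),
                   columns=['Column1[1]', 'Column1[2]'],
                   dtype=object)

def first_non_null(row):
    """Return the first non-null value in a row, or None if the row is empty."""
    vals = row.dropna()
    return vals.iloc[0] if len(vals) else None

result = df3.apply(first_non_null, axis=1)
print(result)
```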

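Option 3
A combination of np.where and pd.DataFrame.lookup

# (row, column) positions of every non-null cell
i, j = np.where(df.notnull())
idx = df.index[i]
col = df.columns[j]

# DataFrame.lookup was removed in pandas 2.0; on older versions this was
# pd.Series(df.lookup(idx, col), idx). Indexing the underlying array does the same:
pd.Series(df.to_numpy()[i, j], idx)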

a      123
b      432
c       53
d      221
e        2
f        3
g    hello
h       12
dtype: object
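np.where returns every non-null cell, so a row with two values yields two entries and an empty row yields none. A variant of the same positional idea, assuming you want the first value in each row and None for empty rows, takes argmax over the notna mask. The frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 'a' has two values, row 'b' has none
df = pd.DataFrame([[None, 5, 7], [None, None, None], ['x', None, None]],
                  index=list('abc'),
                  columns=['Column1[1]', 'Column1[2]', 'Column1[3]'],
                  dtype=object)

mask = df.notna().to_numpy()          # True where a cell holds a value
first = mask.argmax(axis=1)           # position of the first True per row (0 if none)
vals = df.to_numpy()[np.arange(len(df)), first]
has_value = mask.any(axis=1)          # rows with at least one value

# Replace the placeholder pick for empty rows with None
result = pd.Series(np.where(has_value, vals, None), index=df.index, dtype=object)
print(result)
```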

Merge duplicate columns in pandas
