Concatenating multiple columns in each row

Time: 2017-04-11 04:33:39

Tags: python pandas concatenation

I created a matrix with the code below and stored some data in it:

df = []
r = 5000
c = 50
for i in range(r):
    row = [''] * c  # a fresh row of c empty cells (avoids reusing the name r)
    df.append(row)

The matrix then looks like this:

    0     1          2                 3        4    5     6    7   ...
3   NaN   Nestlé     Africa            Import   
4   NaN   Nutella    Europe            Report   2010 to    2011 
5   Shell            USA               Revenues      2017     

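Because the number of filled columns varies from row to row, I am unsure how to concatenate all the columns into a single column and then drop the empty, unneeded columns, so that the result looks like this: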

    1
3.  Nestlé Africa Import
4.  Nutella Europe Report 2010 to 2011
5.  Shell USA Revenues 2017
etc.

If this is easier to do in a pandas.DataFrame (e.g. df2 = pd.DataFrame(df)), that works for me too.

1 answer:

Answer 0 (score: 0)

With pandas, you can join the non-empty columns of each row like this:

Code:

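# Skip empty strings and NaN cells, then join the rest with spaces.
# str() is for Python 3; on Python 2 use unicode() instead.
df['concat'] = df.apply(lambda x: ' '.join(
    [str(y) for y in x if y and not pd.isnull(y)]), axis=1)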
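The same idea can be sketched for a list of lists like the one built in the question. This is only an illustration: the three sample rows, the variable names table, df2, joined and result, and the index values are assumptions chosen to mirror the question, and empty cells are assumed to be '' as in the question's initialisation.

```python
import pandas as pd

# Hypothetical sample rows standing in for the 5000 x 50 matrix;
# empty cells are '' as in the question's initialisation.
table = [
    ['', 'Nestlé', 'Africa', 'Import', '', '', ''],
    ['', 'Nutella', 'Europe', 'Report', '2010', 'to', '2011'],
    ['Shell', '', 'USA', 'Revenues', '', '2017', ''],
]

df2 = pd.DataFrame(table, index=[3, 4, 5])

# Keep only cells that are non-empty and not NaN, then join them with spaces.
joined = df2.apply(
    lambda row: ' '.join(
        str(cell) for cell in row if cell and not pd.isnull(cell)),
    axis=1,
)

# A one-column frame; the original, partly empty columns are left behind.
result = joined.to_frame(name=1)
print(result)
```

Building a new one-column frame from the joined Series is one way to get rid of the unneeded columns without dropping them one by one.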

Test code:

import pandas as pd
from io import StringIO
df = pd.read_fwf(StringIO(u"""
    0     1          2                 3        4    5     6
3   NaN   Nestlé     Africa            Import   
4   NaN   Nutella    Europe            Report   2010 to    2011 
5   Shell            USA               Revenues      2017"""),
    skiprows=0, header=1, index_col=0)
print(df)

# str() is for Python 3; on Python 2 use unicode() instead.
df['concat'] = df.apply(lambda x: ' '.join(
    [str(y) for y in x if y and not pd.isnull(y)]), axis=1)

print(df['concat'])

Results:

       0        1       2         3     4     5     6
3          Nestlé  Africa    Import                  
4         Nutella  Europe    Report  2010    to  2011
5  Shell              USA  Revenues        2017      

3                      Nestlé Africa Import
4    Nutella Europe Report 2010.0 to 2011.0
5                   Shell USA Revenues 2017
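Note that row 4 comes out as "2010.0 to 2011.0" because columns containing NaN alongside numbers are parsed as floats. A minimal sketch of one way to avoid that, assuming whole-number floats should be printed as integers: the helpers cell_to_text and join_row are hypothetical names, and the frame is built by hand here to mimic the parsed data rather than read with read_fwf.

```python
import numpy as np
import pandas as pd

# Hand-built frame imitating the parsed test data: years in columns 4 and 6
# are floats because those columns also contain NaN.
df = pd.DataFrame(
    {
        '0': [np.nan, np.nan, 'Shell'],
        '1': ['Nestlé', 'Nutella', np.nan],
        '2': ['Africa', 'Europe', 'USA'],
        '3': ['Import', 'Report', 'Revenues'],
        '4': [np.nan, 2010.0, np.nan],
        '5': [np.nan, 'to', 2017],
        '6': [np.nan, 2011.0, np.nan],
    },
    index=[3, 4, 5],
)


def cell_to_text(value):
    """Hypothetical helper: render whole-number floats without '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def join_row(row):
    """Join the cells of one row, skipping NaN and empty strings."""
    return ' '.join(
        cell_to_text(v) for v in row if not pd.isnull(v) and v != '')


df['concat'] = df.apply(join_row, axis=1)

# Keep only the concatenated column.
result = df[['concat']]
print(result)
```

Formatting at join time leaves the original column types untouched, which may be preferable if the numeric columns are still needed elsewhere.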