I created a matrix with the code below and stored some data in it:
df = []
rows = 5000
cols = 50
for i in range(rows):  # Python 3: range replaces xrange
    row = [''] * cols  # every cell starts out as an empty string
    df.append(row)
The matrix then looks like this:
0 1 2 3 4 5 6 7 ...
3 NaN Nestlé Africa Import
4 NaN Nutella Europe Report 2010 to 2011
5 Shell USA Revenues 2017
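Because the rows have uneven numbers of filled columns, I'm not sure how to concatenate all the columns into a single column and then drop the empty, unneeded columns, so that the result looks like this: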
1
3. Nestlé Africa Import
4. Nutella Europe Report 2010 to 2011
5. Shell USA Revenues 2017
etc.
If this is easier to do in a pandas.DataFrame (e.g. df2 = pd.DataFrame(df)), that works for me too.
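If you would rather stay with plain lists, the same idea works without pandas. This is a minimal sketch, assuming unused cells hold '' (as in the loop above) or None; the helper `join_nonempty_cells` and the three sample rows are hypothetical stand-ins for the real 5000 x 50 matrix.

```python
def join_nonempty_cells(table):
    """Join the non-empty cells of each row with single spaces.

    Assumes `table` is a list of lists whose unused cells are '' or None,
    as produced by the matrix-building loop in the question.
    """
    joined = []
    for row in table:
        # Keep only cells that hold something; '' and None are skipped.
        parts = [str(cell) for cell in row if cell not in ('', None)]
        joined.append(' '.join(parts))
    return joined


# Small hypothetical stand-in for the 5000 x 50 matrix.
sample = [
    ['', 'Nestlé', 'Africa', 'Import', '', ''],
    ['', 'Nutella', 'Europe', 'Report', '2010 to 2011', ''],
    ['Shell', 'USA', 'Revenues', '2017', '', ''],
]

for line in join_nonempty_cells(sample):
    print(line)
```

Because every row collapses to one string, the empty columns disappear on their own; there is no separate "drop columns" step in this version.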
Answer:
With pandas, you can join the non-empty columns, for example:
Code:
# Join each row's non-empty, non-NaN cells with spaces (Python 3: str instead of unicode)
df['concat'] = df.apply(lambda x: ' '.join(
    [str(y) for y in x if y and not pd.isnull(y)]), axis=1)
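The code above is applied to data read from a fixed-width file, where blank cells become NaN. The question's matrix instead fills unused cells with '', and it also asks to drop columns that are entirely empty. A minimal sketch under those assumptions (the sample rows, the index 3 to 5 and the helper `join_row` are hypothetical):

```python
import pandas as pd

# Hypothetical sample shaped like the question's matrix: unused cells are ''.
table = [
    ['', 'Nestlé', 'Africa', 'Import', '', ''],
    ['', 'Nutella', 'Europe', 'Report', '2010 to 2011', ''],
    ['Shell', 'USA', 'Revenues', '2017', '', ''],
]
df = pd.DataFrame(table, index=[3, 4, 5])

# Drop columns that are empty in every row (only column 5 in this sample).
trimmed = df.loc[:, (df != '').any(axis=0)]


def join_row(row):
    # Skip both '' (the matrix's filler) and NaN (pandas' missing marker).
    return ' '.join(str(v) for v in row if not (pd.isna(v) or v == ''))


concat = trimmed.apply(join_row, axis=1)

# Keep only the combined text, as in the desired one-column output.
result = concat.to_frame('concat')
print(result)
```

Checking for both '' and NaN lets the same helper work whether the frame came from `pd.DataFrame(df)` or from a file reader such as `read_fwf`.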
Test code:
import pandas as pd
from io import StringIO

# Fixed-width sample data parsed into a DataFrame
df = pd.read_fwf(StringIO(u"""
0 1 2 3 4 5 6
3 NaN Nestlé Africa Import
4 NaN Nutella Europe Report 2010 to 2011
5 Shell USA Revenues 2017"""),
                 skiprows=0, header=1, index_col=0)
print(df)

# Join each row's non-empty, non-NaN cells with spaces (Python 3: str instead of unicode)
df['concat'] = df.apply(lambda x: ' '.join(
    [str(y) for y in x if y and not pd.isnull(y)]), axis=1)
print(df['concat'])
Result:
0 1 2 3 4 5 6
3 Nestlé Africa Import
4 Nutella Europe Report 2010 to 2011
5 Shell USA Revenues 2017
3 Nestlé Africa Import
4 Nutella Europe Report 2010.0 to 2011.0
5 Shell USA Revenues 2017
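Note the 2010.0 and 2011.0 in the second result row: a numeric column that also contains NaN is stored as float64, so str() prints a trailing .0. One way around this, sketched below, is to print whole-number floats as integers before joining. The helper `cell_to_text` and the small frame are hypothetical, made up to mimic that row.

```python
import numpy as np
import pandas as pd


def cell_to_text(value):
    """Render a cell as text, printing whole-number floats without '.0'.

    A numeric column that also holds NaN is stored as float64, which is
    why 2010 shows up as 2010.0 when the cells are joined.
    """
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


# Hypothetical frame: the year columns are float64 because the second
# row has NaN there.
df = pd.DataFrame({
    'a': ['Nutella', 'Shell'],
    'b': [2010.0, np.nan],
    'c': ['to', np.nan],
    'd': [2011.0, np.nan],
})

concat = df.apply(
    lambda row: ' '.join(cell_to_text(v) for v in row if pd.notna(v)), axis=1)
print(concat.tolist())
```

`isinstance(value, float)` also matches NumPy's float64, since that type subclasses Python's float; genuine fractional values such as 2010.5 are left unchanged.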