How do I create this tabular data structure with numpy?

Time: 2016-06-09 09:58:23

Tags: python numpy

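I have several two-column tables that I want to join using numpy. Each table has an x column and a y column. I need a single x column containing every x value from all the tables, with each table's y values lined up against the matching x. Where an x value has no corresponding y in a given table, that entry should be None.

I'm not great at explaining, so an example is probably clearer: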

import numpy as np

x1=np.arange(10)
y1=np.random.random(10)
x2=np.arange(4,12)
y2=np.random.random(8)
x1,y1,x2,y2
# (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#  array([ 0.9697099 ,  0.73551173,  0.47020836,  0.65181839,  0.978934  ,
#     0.18953898,  0.46405499,  0.50087478,  0.06777209,  0.45780724]),
#  array([ 4,  5,  6,  7,  8,  9, 10, 11]),
#  array([ 0.4871265 ,  0.13677392,  0.17808162,  0.92777264,  0.43666515,
#     0.96582633,  0.8801327 ,  0.96819467]))

I would like it to produce this result:

(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]),
 array([0.9697099017727184, 0.7355117301087176, 0.47020836280801315,
   0.6518183854825162, 0.9789339965301322, 0.18953898439009775,
   0.46405499422617846, 0.5008747838856135, 0.06777208803132984,
   0.45780724068543743, None, None], dtype=object),
 array([None, None, None, None, 0.4871264999476407, 0.13677391508082204,
   0.17808162175961462, 0.927772639273923, 0.43666515340304246,
   0.9658263324455688, 0.880132700341068, 0.9681946747550136], dtype=object))

I have tried searching but couldn't find anything. Maybe I'm not phrasing my search correctly.

1 answer:

Answer 0: (score: 2)

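You can do this easily with pandas: build a DataFrame for each table by passing the arrays as the values of a dict, whose keys name the columns x, y1, and y2: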

In [280]:
import pandas as pd
import numpy as np
x1=np.arange(10)
y1=np.random.random(10)
x2=np.arange(4,12)
y2=np.random.random(8)
df1 = pd.DataFrame({'x':x1,'y1':y1})
df2 = pd.DataFrame({'x':x2,'y2':y2})
df1

Out[280]:
   x        y1
0  0  0.951029
1  1  0.974854
2  2  0.391443
3  3  0.487474
4  4  0.430653
5  5  0.737643
6  6  0.547114
7  7  0.770040
8  8  0.475704
9  9  0.577185

In [281]:
df2

Out[281]:
    x        y2
0   4  0.894808
1   5  0.534086
2   6  0.257441
3   7  0.658060
4   8  0.443201
5   9  0.319719
6  10  0.360698
7  11  0.542051

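We can then call merge with how='outer', which matches rows on the shared x column and automatically inserts NaN wherever a table has no corresponding value. (Judging by its lower prompt number, the output below appears to come from an earlier run, so its random values differ from df1 and df2 shown above.)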

In [279]:    
df1.merge(df2, how='outer')

Out[279]:
       x        y1        y2
0    0.0  0.714475       NaN
1    1.0  0.628956       NaN
2    2.0  0.262343       NaN
3    3.0  0.022310       NaN
4    4.0  0.271616  0.343311
5    5.0  0.075175  0.503210
6    6.0  0.424153  0.874114
7    7.0  0.677780  0.677042
8    8.0  0.986892  0.672466
9    9.0  0.383558  0.896930
10  10.0       NaN  0.871810
11  11.0       NaN  0.510811
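
As a minimal sketch of the same outer merge, the snippet below names the join column explicitly with on="x" instead of relying on merge to detect the shared column, and passes sort=True so the rows come back ordered by x. The small fixed arrays are stand-in values chosen for illustration, not the random data from the session above.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-ins for the random data (assumed values).
x1 = np.arange(5)                      # 0..4
y1 = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
x2 = np.arange(3, 7)                   # 3..6
y2 = np.array([1.0, 2.0, 3.0, 4.0])

df1 = pd.DataFrame({"x": x1, "y1": y1})
df2 = pd.DataFrame({"x": x2, "y2": y2})

# Outer join on the shared "x" column; rows missing from one table get NaN.
merged = df1.merge(df2, on="x", how="outer", sort=True)
print(merged)
```

Naming the key explicitly can help if the tables later gain other columns with identical names, since merge would otherwise join on every shared column.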

You can convert the result to a NumPy array through the values attribute:

In [282]:
df1.merge(df2, how='outer').values

Out[282]:
array([[  0.        ,   0.95102908,          nan],
       [  1.        ,   0.97485407,          nan],
       [  2.        ,   0.39144301,          nan],
       [  3.        ,   0.48747382,          nan],
       [  4.        ,   0.43065283,   0.89480821],
       [  5.        ,   0.73764321,   0.53408613],
       [  6.        ,   0.54711396,   0.25744133],
       [  7.        ,   0.77003988,   0.65806007],
       [  8.        ,   0.47570448,   0.44320138],
       [  9.        ,   0.57718451,   0.31971908],
       [ 10.        ,          nan,   0.36069758],
       [ 11.        ,          nan,   0.54205073]])
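
The question asked for separate arrays with None rather than NaN in the gaps. One possible way to get there from the merged frame is sketched below; the helper to_object_with_none is a hypothetical name introduced here, and the fixed arrays are again illustrative stand-ins for the random data. It uses to_numpy(), which the current pandas documentation recommends over the values attribute.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-ins for the random data (assumed values).
x1 = np.arange(5)
y1 = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
x2 = np.arange(3, 7)
y2 = np.array([1.0, 2.0, 3.0, 4.0])

merged = pd.DataFrame({"x": x1, "y1": y1}).merge(
    pd.DataFrame({"x": x2, "y2": y2}), on="x", how="outer", sort=True
)

def to_object_with_none(series):
    """Hypothetical helper: copy a column to an object array, NaN -> None."""
    out = series.to_numpy(dtype=object, copy=True)
    out[series.isna().to_numpy()] = None
    return out

x = merged["x"].to_numpy()
y1_obj = to_object_with_none(merged["y1"])
y2_obj = to_object_with_none(merged["y2"])
print(x, y1_obj, y2_obj)
```

Object arrays give up numpy's fast numeric operations, so keeping NaN as the missing-value marker is often more practical unless None is specifically required.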