Combining specific elements from different lists into new lists

Time: 2019-09-17 20:40:39

Tags: python pandas numpy merge

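I wrote the code below. I first generate uniform random variables of shape (3, 5). Then I use each element of that 2D array as a mean and generate a new list of samples from it. What I want is to create 10 new 2D arrays with the same (3, 5) shape, each built from one element of every list. For example: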

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

mean_route1 = pd.DataFrame(np.random.uniform(0, 10, size=(3,5)))
print(mean_route1)
N=10

for m in np.nditer(mean_route1):
    m3 = np.random.poisson(lam = m, size=N)
    print(m3)

The output is:

          0         1         2         3         4
0  7.740569  5.435856  6.682996  5.213202  2.100649
1  6.174332  0.059057  2.951913  1.341994  2.734486
2  7.780503  7.277458  7.406986  8.498494  0.070157
[ 5  5  7  7  9  5  9 12  7  5]
[ 4  4  3  4 12  3  9  6  6  1]
[8 8 1 9 3 5 8 7 4 6]
[5 6 9 6 4 4 9 7 4 5]
[2 3 3 3 0 2 4 1 4 1]
[4 6 9 3 8 4 3 7 8 5]
[0 0 0 0 0 0 0 0 0 0]
[2 1 3 4 2 2 0 1 3 3]
[2 1 2 2 1 0 1 0 1 1]
[2 1 3 5 5 3 5 4 1 3]
[ 5  5  7  6  6  6 10 10  5  7]
[ 7  6  7  9  4 14  6  7  8  9]
[ 8 10  1  9 10  7  9  9  9 13]
[14  4  8 10  6  3 10  7 12  4]
[0 0 0 0 1 0 0 0 0 0]

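Next, what I want is 10 arrays like the one below: column (:, 0) of the printed lists (the first element of each list) goes into the new first array.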

          0         1         2         3         4
0         5         4         8         5         2
1         4         0         2         2         2
2         5         7         8        14         0

Column (:, 1) goes into the new second array, ..., and column (:, 9) into the new 10th array.

How can I do this? I am new to Python and Stack Overflow, so I apologize for any mistakes.

3 answers:

Answer 0: (score: 2)

Forgetting about the dataframe (for now) and using numpy, we can do:

In [87]: mean_route1 = np.random.uniform(0,10,size=15)                                 
In [88]: alist = []                                                                    
In [89]: for m in mean_route1: 
    ...:     alist.append(np.random.poisson(lam=m, size=10)) 
    ...:                                                                               
In [90]: arr = np.array(alist)                                                         
In [91]: arr                                                                           
Out[91]: 
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 4,  2,  3,  2,  6,  7,  3,  7,  7,  5],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 7,  9,  8,  1,  6,  5,  6, 11,  6,  1],
       [16,  7,  9,  6,  6, 11, 11, 16,  9, 12],
       [ 3,  5,  2,  0,  2,  6,  4,  5,  3,  3],
       [ 5,  5,  8,  7,  9, 10,  5, 10,  7,  8],
       [ 5,  5,  4,  4,  2,  5,  1,  2,  1,  2],
       [ 4,  2,  6,  7,  2,  6,  5,  0,  1,  4],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 5,  5,  5,  4,  3,  2,  5,  7,  4,  5],
       [ 1,  1,  1,  1,  2,  0,  2,  0,  1,  3],
       [ 0,  0,  0,  0,  0,  0,  1,  0,  0,  0],
       [ 0,  0,  6,  1,  3,  2,  0,  1,  1,  2],
       [ 9, 10, 10,  8,  9,  9,  9,  6, 12,  9]])

This is a (15,10)-shaped array, with 10 samples for each of the 15 lam values. If you like, we can reshape it to (3,5,10), though that doesn't change the values.

In [92]: arr.reshape(3,5,10)                                                           
Out[92]: 
array([[[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
        [ 4,  2,  3,  2,  6,  7,  3,  7,  7,  5],
        [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
        [ 7,  9,  8,  1,  6,  5,  6, 11,  6,  1],
        [16,  7,  9,  6,  6, 11, 11, 16,  9, 12]],

       [[ 3,  5,  2,  0,  2,  6,  4,  5,  3,  3],
        [ 5,  5,  8,  7,  9, 10,  5, 10,  7,  8],
        [ 5,  5,  4,  4,  2,  5,  1,  2,  1,  2],
        [ 4,  2,  6,  7,  2,  6,  5,  0,  1,  4],
        [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[ 5,  5,  5,  4,  3,  2,  5,  7,  4,  5],
        [ 1,  1,  1,  1,  2,  0,  2,  0,  1,  3],
        [ 0,  0,  0,  0,  0,  0,  1,  0,  0,  0],
        [ 0,  0,  6,  1,  3,  2,  0,  1,  1,  2],
        [ 9, 10, 10,  8,  9,  9,  9,  6, 12,  9]]])

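By starting with (15,) rather than (3,5), I can do a simple iteration without the hassle of nditer. (I discourage using nditer unless you really need some of its special features. It isn't fast.)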
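
If the means already live in a (3,5) array, the same plain loop still works on a flattened view. A minimal sketch of that idea, assuming a seeded `np.random.default_rng` generator (the answer itself uses the legacy `np.random` functions) and a stand-in array named `lam`:

```python
import numpy as np

rng = np.random.default_rng(0)          # assumption: seeded Generator, for reproducibility
lam = rng.uniform(0, 10, size=(3, 5))   # stand-in for mean_route1 as a plain array

# ravel() flattens to (15,) in row-major order, so an ordinary for-loop is enough
samples = np.array([rng.poisson(lam=m, size=10) for m in lam.ravel()])
print(samples.shape)  # (15, 10)

# cube[r, c] holds the 10 draws generated from lam[r, c]
cube = samples.reshape(3, 5, 10)
```

When starting from a dataframe, `mean_route1.to_numpy().ravel()` should give the same flat sequence.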

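I can construct 10 dataframes from the (3,5,10) array with a loop like this (only the first 3 are printed here; use range(10) to get all of them):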

In [94]: import pandas as pd                                                           
In [95]: for i in range(3): 
    ...:     print(pd.DataFrame(_92[:,:,i]))   # Out[92] array
    ...:                                                                               
   0  1  2  3   4           # 1st column
0  0  4  0  7  16
1  3  5  5  4   0
2  5  1  0  0   9

   0  1  2  3   4           # 2nd column
0  0  2  0  9   7
1  5  5  5  2   0
2  5  1  0  0  10

   0  1  2  3   4
0  0  3  0  8   9
1  2  8  4  6   0
2  5  1  0  6  10
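
To keep all 10 frames rather than print a few, one option is to move the sample axis to the front and collect the slices in a list. This is only a sketch: `arr` below is placeholder data standing in for the (3,5,10) sample array, and `frames` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Placeholder values standing in for the (3, 5, 10) array of Poisson samples
arr = np.arange(150).reshape(3, 5, 10)

# moveaxis turns (3, 5, 10) into (10, 3, 5); iterating then yields ten (3, 5) slices
frames = [pd.DataFrame(block) for block in np.moveaxis(arr, -1, 0)]

print(len(frames))  # 10
print(frames[0])
```

Here `frames[i]` holds the same values as `arr[:, :, i]`.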

I can generate the poisson values for all of mean_route1 with a single call:

In [97]: np.random.poisson(lam=mean_route1, size=(10,15))                              
Out[97]: 
array([[ 0,  2,  0,  4, 11,  5,  9,  2,  8,  0, 10,  0,  0,  1,  5],
       [ 0,  4,  0,  3,  9,  3, 11,  3,  4,  0,  4,  0,  2,  0,  7],
       [ 0,  4,  0,  4,  6,  1,  7,  4,  2,  0,  5,  1,  0,  0,  5],
       ...
       [ 0,  9,  0,  6, 12,  3,  3,  5,  3,  0,  6,  1,  1,  1,  6]])

Or transposed, to get the (15,10) layout I had in Out[91]:

In [98]: np.random.poisson(lam=mean_route1, size=(10,15)).T                            
Out[98]: 
array([[ 0,  0,  1,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  4,  5,  6,  7,  1,  6,  2,  0,  2],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 4,  5,  4,  6,  3,  9,  1, 10,  3,  4],
       ....
       [10,  8,  5, 13,  7, 10,  5, 10,  7,  9]])
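
The transpose is needed because a (15,) `lam` only broadcasts against a trailing dimension of 15. A sketch of an alternative, assuming a seeded `np.random.default_rng` generator instead of the legacy API used above: give `lam` an extra axis so it broadcasts against size=(15,10) directly.

```python
import numpy as np

rng = np.random.default_rng(1)       # assumption: seeded Generator
lam = rng.uniform(0, 10, size=15)

# lam[:, None] has shape (15, 1), which broadcasts against size=(15, 10)
direct = rng.poisson(lam=lam[:, None], size=(15, 10))
print(direct.shape)  # (15, 10)

# Without the extra axis, a (15,) lam cannot broadcast to (15, 10)
try:
    rng.poisson(lam=lam, size=(15, 10))
    raised = False
except ValueError:
    raised = True
print(raised)
```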

Or with a (3,5) array of lam:

In [100]: np.random.poisson(lam=mean_route1.reshape(3,5), size=(10,3,5))               
Out[100]: 
array([[[ 0,  1,  0,  2,  9],
        [ 1,  7,  2,  6,  0],
        [ 3,  0,  0,  1, 10]],

       [[ 0,  5,  0,  7,  8],
        [ 2,  6,  2,  8,  0],
        [ 5,  2,  0,  1, 11]],

       [[ 0,  7,  0,  7, 11],
        [ 2,  7,  2,  4,  0],
        [ 7,  1,  1,  1, 10]],
      ....
        [ 7,  1,  1,  3, 12]]])

Again, making dataframes, this time iterating on the first dimension (first 3 shown):

In [101]: for i in range(3): 
     ...:     print(pd.DataFrame(_100[i,:,:])) 
     ...:                                                                              
   0  1  2  3   4
0  0  1  0  2   9
1  1  7  2  6   0
2  3  0  0  1  10

   0  1  2  3   4
0  0  5  0  7   8
1  2  6  2  8   0
2  5  2  0  1  11

   0  1  2  3   4
0  0  7  0  7  11
1  2  7  2  4   0
2  7  1  1  1  10
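
One way to convince yourself that each position of the (N,3,5) result really uses its own lam is to draw many samples and compare the per-position average with the means. A rough sketch, assuming a seeded `np.random.default_rng` generator and a large N chosen only for this check (the question uses N = 10):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)      # assumption: seeded Generator
mean_route1 = pd.DataFrame(rng.uniform(0, 10, size=(3, 5)))
N = 20_000                           # large only so the averages settle

# A (3, 5) lam broadcasts against size=(N, 3, 5): one (3, 5) draw per sample
draws = rng.poisson(lam=mean_route1.to_numpy(), size=(N, 3, 5))

# Averaging over the sample axis should roughly recover the (3, 5) means
max_gap = np.abs(draws.mean(axis=0) - mean_route1.to_numpy()).max()
print(draws.shape, max_gap)

# The first 10 draws as dataframes, as in the question
frames = [pd.DataFrame(d) for d in draws[:10]]
```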

Answer 1: (score: 1)

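See if this helps. I've managed to create d (the input for each dataframe); each sublist in d should now be turned into a dataframe. I'll keep working on it, but for now this is as far as I've got: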

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
mean_route1 = pd.DataFrame(np.random.uniform(0, 10, size=(3,5)))
print(mean_route1)
N=10
a = []
c = []
for m in np.nditer(mean_route1):
    m3 = list(np.random.poisson(lam = m, size=N))
    print(m3)
    a.append(m3)

This is the output for each list:

[4, 6, 8, 12, 4, 10, 8, 7, 9, 13]
[12, 11, 12, 8, 9, 4, 7, 10, 11, 6]
[2, 1, 2, 0, 4, 3, 2, 3, 0, 3]
[4, 4, 7, 2, 9, 3, 9, 5, 10, 11]
[6, 9, 11, 6, 10, 14, 14, 6, 10, 7]
[5, 7, 4, 8, 4, 7, 9, 3, 6, 2]
[3, 3, 4, 7, 5, 7, 5, 4, 2, 3]
[6, 3, 6, 4, 7, 3, 4, 1, 4, 2]
[1, 1, 1, 1, 0, 2, 4, 2, 0, 1]
[6, 5, 7, 6, 5, 8, 10, 6, 8, 4]
[3, 2, 3, 4, 5, 3, 2, 1, 1, 5]
[5, 5, 5, 2, 6, 11, 8, 13, 6, 11]
[4, 6, 4, 4, 4, 4, 7, 6, 8, 6]
[7, 5, 11, 3, 8, 7, 5, 10, 3, 7]
[12, 5, 7, 10, 8, 4, 5, 6, 8, 4]

Now I create one big list with all the values, but in the order you requested, a bit like a "transposed" list.

for b in range(10):
    for i in range(len(a)):
        c.append(a[i][b])
print(c)

Output:

[4, 12, 2, 4, 6, 5, 3, 6, 1, 6, 3, 5, 4, 7, 12, 6, 11, 1, 4, 9, 7, 3, 3, 1, 5, 2, 5, 6, 5, 5, 8, 12, 2, 7, 11, 4, 4, 6, 1, 7, 3, 5, 4, 11, 7, 12, 8, 0, 2, 6, 8, 7, 4, 1, 6, 4, 2, 4, 3, 10, 4, 9, 4, 9, 10, 4, 5, 7, 0, 5, 5, 6, 4, 8, 8, 10, 4, 3, 3, 14, 7, 7, 3, 2, 8, 3, 11, 4, 7, 4, 8, 7, 2, 9, 14, 9, 5, 4, 4, 10, 2, 8, 7, 5, 5, 7, 10, 3, 5, 6, 3, 4, 1, 2, 6, 1, 13, 6, 10, 6, 9, 11, 0, 10, 10, 6, 2, 4, 0, 8, 1, 6, 8, 3, 8, 13, 6, 3, 11, 7, 2, 3, 2, 1, 4, 5, 11, 6, 7, 4]

Split this big list into groups of 15, one group per new dataframe:

d = []
for i in range(10):
    d.append(c[(i)*15:((i+1)*15)])
print(d)

Output:

[[4, 12, 2, 4, 6, 5, 3, 6, 1, 6, 3, 5, 4, 7, 12], [6, 11, 1, 4, 9, 7, 3, 3, 1, 5, 2, 5, 6, 5, 5], [8, 12, 2, 7, 11, 4, 4, 6, 1, 7, 3, 5, 4, 11, 7], [12, 8, 0, 2, 6, 8, 7, 4, 1, 6, 4, 2, 4, 3, 10], [4, 9, 4, 9, 10, 4, 5, 7, 0, 5, 5, 6, 4, 8, 8], [10, 4, 3, 3, 14, 7, 7, 3, 2, 8, 3, 11, 4, 7, 4], [8, 7, 2, 9, 14, 9, 5, 4, 4, 10, 2, 8, 7, 5, 5], [7, 10, 3, 5, 6, 3, 4, 1, 2, 6, 1, 13, 6, 10, 6], [9, 11, 0, 10, 10, 6, 2, 4, 0, 8, 1, 6, 8, 3, 8], [13, 6, 3, 11, 7, 2, 3, 2, 1, 4, 5, 11, 6, 7, 4]]
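
The two loops above amount to a transpose of a. A small sketch to check that, using placeholder integers in place of the Poisson samples (`d_np` is an illustrative name):

```python
import numpy as np

# Placeholder for a: 15 lists of 10 values each
a = [[k * 10 + j for j in range(10)] for k in range(15)]

# Manual version, as in the answer
c = []
for b in range(10):
    for i in range(len(a)):
        c.append(a[i][b])
d = [c[i * 15:(i + 1) * 15] for i in range(10)]

# Same result: row b of the transpose is [a[0][b], a[1][b], ..., a[14][b]]
d_np = np.array(a).T.tolist()
print(d == d_np)  # True
```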

Finally, to create each dataframe, this is what I would do:

df1 = pd.DataFrame({'row1':d[0][:5],'row2':d[0][5:10],'row3':d[0][10:15]}).T
print(df1)
      0   1  2  3   4
row1  4  12  2  4   6
row2  5   3  6  1   6
row3  3   5  4  7  12

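This would presumably be repeated for every index of d, the list made up of the 15-element sublists. It feels far from ideal, but it's how I managed to solve the problem.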
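
Repeating that step for every sublist could look like the sketch below. It assumes d has the structure shown above (10 sublists of 15 values); the values here are placeholders and `frames` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Placeholder for d: 10 sublists of 15 integers
d = [list(range(k * 15, (k + 1) * 15)) for k in range(10)]

frames = []
for sub in d:
    # 15 values -> 3 rows of 5, filled row by row, like the row1/row2/row3 slices
    df = pd.DataFrame(np.array(sub).reshape(3, 5), index=["row1", "row2", "row3"])
    frames.append(df)

print(frames[0])
```

Reshaping avoids writing the three slices by hand and gives the same row layout as df1.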

Answer 2: (score: 1)

Here is a solution that uses numpy functionality.

mean_route1 = pd.DataFrame(np.random.uniform(0, 10, size=(3,5)))
print(mean_route1)
N=10

a = [np.random.poisson(lam = m, size=N) for m in np.nditer(mean_route1)]
b = np.stack(a)
c = [pd.DataFrame(np.reshape(arr, (3, 5))) for arr in b.T]

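If you print a, b and c, you will see:

  • a consists of the rows you called m3, except that it is a list of ndarrays. The list has 15 elements, each a length-10 ndarray generated by np.random.poisson.
  • b is a stack of a: a 2D array whose rows are the arrays in a.
  • c is your expected result, a list of dataframes. It is built by transposing b (b.T is the transposed matrix) and iterating over each row of the transposed b (that is, each column of the original b). Each row is reshaped into a (3,5) matrix, converted to a pandas dataframe, and appended to c.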

For example, if a is:

[array([5, 4, 6, 6, 3, 0, 2, 7, 5, 3]),
 array([ 3,  2,  5,  9,  6,  6,  8, 14,  3,  4]),
 array([ 1,  4,  2,  2, 10,  3,  4,  1,  5,  1]),
 array([ 8,  8,  3,  2,  4, 12,  3,  3,  2,  4]),
 array([5, 4, 1, 5, 8, 0, 4, 3, 5, 1]),
 array([ 3,  7,  7,  6, 12, 12, 10,  4,  2,  9]),
 array([4, 0, 3, 2, 5, 1, 3, 4, 0, 7]),
 array([6, 8, 4, 6, 2, 7, 4, 4, 7, 7]),
 array([3, 7, 3, 4, 9, 4, 6, 5, 3, 3]),
 array([0, 3, 0, 0, 2, 1, 1, 0, 1, 0]),
 array([0, 0, 2, 0, 1, 1, 0, 1, 0, 3]),
 array([4, 7, 7, 7, 7, 7, 2, 7, 8, 7]),
 array([11, 15, 11, 10,  7,  4,  5,  9, 14, 10]),
 array([10,  7,  9,  8,  7,  9,  8, 13,  8,  8]),
 array([7, 4, 4, 6, 9, 5, 6, 5, 8, 6])]

then the first dataframe in c (c[0]) is:

   0  1   2   3  4
0  5  3   1   8  5
1  3  4   6   3  0
2  0  4  11  10  7
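
The same relationship holds for every frame: c[k] collects element k of each array in a. A short sketch that checks one such k, assuming a seeded `np.random.default_rng` generator and a plain array `lam` in place of the dataframe:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)                 # assumption: seeded Generator
lam = rng.uniform(0, 10, size=(3, 5))          # stand-in for mean_route1

a = [rng.poisson(lam=m, size=10) for m in lam.ravel()]
b = np.stack(a)                                # shape (15, 10)
c = [pd.DataFrame(col.reshape(3, 5)) for col in b.T]

# c[k] should hold element k of every array in a, laid out as (3, 5)
k = 4
expected = np.array([arr[k] for arr in a]).reshape(3, 5)
print((c[k].to_numpy() == expected).all())  # True
```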