python pandas data-frame - duplicate rows according to a column value

Asked: 2017-12-10 13:05:28

Tags: python python-3.x pandas numpy dataframe

I want to duplicate the rows of the DataFrame "this" according to the values in 2 columns and save them as a new DataFrame named "newThis":

import numpy as np
import pandas as pd

this = pd.DataFrame(columns=['a','b','c'], index=[1,2,3])
this.a = [1, 2, 0]
this.b = [5, 0, 4]
this.c = [2, 3, 2]

newThis = []

for i in range(len(this)):

    if int(this.iloc[i, 1]) != 0:
        that = np.array([this.iloc[i,:]] * int(this.iloc[i, 1]))
    elif int(this.iloc[i, 1]) == 0:
        that = np.array([this.iloc[i,:]])              

    if int(this.iloc[i, 2]) != 0:
        those = np.array([this.iloc[i,:]] * int(this.iloc[i, 2]))
    elif int(this.iloc[i, 2]) == 0:
        those = np.array([this.iloc[i,:]])

    newThis.append(that)
    newThis.append(those)

I want one large array of consecutive rows, but instead I get this mess:

[array([[1, 5, 2],
        [1, 5, 2],
        [1, 5, 2],
        [1, 5, 2],
        [1, 5, 2]], dtype=int64), array([[1, 5, 2],
        [1, 5, 2]], dtype=int64), array([[2, 0, 3]], dtype=int64), array([[2, 0, 3],
        [2, 0, 3],
        [2, 0, 3]], dtype=int64), array([[0, 4, 2],
        [0, 4, 2],
        [0, 4, 2],
        [0, 4, 2]], dtype=int64), array([[0, 4, 2],
        [0, 4, 2]], dtype=int64)]

Thanks

2 answers:

Answer 0 (score: 3)

IIUC:

Source DF:

In [213]: this
Out[213]:
   a  b  c
1  1  5  2
2  2  0  3
3  0  4  2

Solution:

In [211]: newThis = pd.DataFrame(np.repeat(this.values, 
                                           this['b'].replace(0,1).tolist(), 
                                           axis=0),
                                 columns=this.columns)

In [212]: newThis
Out[212]:
   a  b  c
0  1  5  2
1  1  5  2
2  1  5  2
3  1  5  2
4  1  5  2
5  2  0  3
6  0  4  2
7  0  4  2
8  0  4  2
9  0  4  2
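
For comparison, here is a minimal pandas-only sketch of the same idea. It assumes, as the solution above does, that a count of 0 should keep the row once. The second part is a guess at what "2 column values" means in the question: each row repeated b times and then c times, as the question's loop does. The names counts_b, counts_bc, by_b and by_bc are illustrative only.

```python
import pandas as pd

this = pd.DataFrame({'a': [1, 2, 0], 'b': [5, 0, 4], 'c': [2, 3, 2]},
                    index=[1, 2, 3])

# Repeat counts taken from column b, with 0 treated as 1.
counts_b = this['b'].replace(0, 1)

# Index.repeat takes one count per label; .loc then selects each row that many times.
by_b = this.loc[this.index.repeat(counts_b)].reset_index(drop=True)

# Assumption: using both columns means b repeats plus c repeats per row
# (zeros again treated as 1), which matches the row totals of the question's loop.
counts_bc = counts_b + this['c'].replace(0, 1)
by_bc = this.loc[this.index.repeat(counts_bc)].reset_index(drop=True)

print(by_b)
print(len(by_bc))
```

Unlike going through this.values, this route keeps each column's dtype, which may matter when the columns are not all integers.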

Answer 1 (score: 0)

It looks like you are mixing up multiplying an np.array with multiplying a list.

Remember:

 [np.int32(1)] * 2 == [np.int32(1), np.int32(1)]

But:

 np.array([1]) * 2 == np.array([2])

You probably want to change this:

np.array([this.iloc[i,:]] * int(this.iloc[i, 1]))

to this:

np.array([this.iloc[i,:]]) * int(this.iloc[i, 1])
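
The difference between the two forms can be checked with a small sketch. It uses a plain list, row, as a stand-in for this.iloc[i,:], and the variable names are illustrative only.

```python
import numpy as np

row = [1, 5, 2]  # stand-in for one DataFrame row

# Multiplying a list repeats its elements, so this builds 5 copies of the row.
repeated = np.array([row] * 5)

# Multiplying an array is element-wise arithmetic, so this scales the values.
scaled = np.array([row]) * 5

# np.repeat copies rows on the array side without going through a list.
also_repeated = np.repeat(np.array([row]), 5, axis=0)

print(repeated.shape)   # (5, 3)
print(scaled)           # [[ 5 25 10]]
```

Note that the array form does arithmetic on the values instead of copying rows, so whether it is the right change depends on which of the two is wanted; the output shown in the question is a repetition.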