Most efficient way to convert rows into a single column

Date: 2017-12-15 13:55:46

Tags: python pandas dataframe

Suppose I have the following DataFrame:

A    B    C
1    2    3
4    5    6
7    8    9

I want to convert the rows into a single column:

Column
1
2
3
4
5
6
7
8
9

2 Answers

Answer 0 (score: 10)

Convert the DataFrame to a NumPy array and use ravel to flatten it into a 1d array:

df = pd.DataFrame(df.values.ravel(), columns=['Column'])
#alternative
#df = pd.DataFrame({'Column': df.values.ravel()})
print (df)
   Column
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9
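A minimal sketch of the same idea, assuming pandas 0.24 or later for `DataFrame.to_numpy()` (the currently recommended spelling of `.values`). The `order` argument of `ravel` decides whether values are read row by row (`'C'`, the default, which is what the question asks for) or column by column (`'F'`):

```python
import pandas as pd

# Example input from the question
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=list('ABC'))

# Row-major flattening (default order='C'): reads row 0, then row 1, ...
by_rows = pd.DataFrame({'Column': df.to_numpy().ravel()})

# Column-major flattening: reads column A, then B, then C
by_cols = pd.DataFrame({'Column': df.to_numpy().ravel(order='F')})

print(by_rows['Column'].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(by_cols['Column'].tolist())  # [1, 4, 7, 2, 5, 8, 3, 6, 9]
```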

Thanks to user32185 for another solution:

pd.DataFrame({'Column': df.values.reshape(-1)})
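`reshape(-1)` produces the same 1d sequence as `ravel()`; the `-1` tells NumPy to infer the length. The sketch below uses a plain NumPy array rather than `df.values`, an assumption made so the memory layout is predictable (`np.arange(...).reshape(3, 3)` is C-contiguous). Whether `df.values.ravel()` returns a view depends on how pandas stores the frame internally, so only this contiguous case is shown. In contrast, `flatten()` always allocates a copy:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)  # C-contiguous 3x3 array

flat_ravel = arr.ravel()        # a view when the input is C-contiguous
flat_reshape = arr.reshape(-1)  # -1 lets NumPy infer the length (9)
flat_copy = arr.flatten()       # always a new copy

print(np.array_equal(flat_ravel, flat_reshape))  # True
print(np.shares_memory(arr, flat_ravel))         # True
print(np.shares_memory(arr, flat_reshape))       # True
print(np.shares_memory(arr, flat_copy))          # False
```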

Timings

import numpy as np
import pandas as pd

N = 100000
df = pd.DataFrame(np.random.randint(20, size=(N, 3)), columns=list('ABC'))
#print (df)


In [201]: %timeit pd.DataFrame({'Column': df.values.ravel()})
1000 loops, best of 3: 825 µs per loop

In [202]: %timeit pd.DataFrame(df.values.ravel(), columns=['Column'])
10000 loops, best of 3: 144 µs per loop

In [203]: %timeit pd.DataFrame({'Column': df.values.reshape(-1)})
1000 loops, best of 3: 778 µs per loop

In [204]: %timeit pd.DataFrame(df.values.reshape(-1), columns=['Column'])
10000 loops, best of 3: 143 µs per loop

In [205]: %timeit pd.DataFrame(df.values.flatten(), columns=['Column'])
1000 loops, best of 3: 585 µs per loop

#solutions of Wen
In [224]: %timeit pd.DataFrame({'Column':np.concatenate(df.values)})
10 loops, best of 3: 45.6 ms per loop

In [225]: %timeit df.stack().reset_index(level=1,drop=True)
100 loops, best of 3: 7.65 ms per loop

In [226]: %timeit df.T.melt().drop(columns='variable')
100 loops, best of 3: 6.94 ms per loop

N = 10000000
df = pd.DataFrame(np.random.randint(20, size=(N, 3)), columns=list('ABC'))
print (df)

In [209]: %timeit pd.DataFrame({'Column': df.values.ravel()})
10 loops, best of 3: 51.3 ms per loop

In [210]: %timeit pd.DataFrame(df.values.ravel(), columns=['Column'])
10000 loops, best of 3: 143 µs per loop

In [211]: %timeit pd.DataFrame({'Column': df.values.reshape(-1)})
10 loops, best of 3: 53.4 ms per loop

In [212]: %timeit pd.DataFrame(df.values.reshape(-1), columns=['Column'])
10000 loops, best of 3: 147 µs per loop

In [213]: %timeit pd.DataFrame(df.values.flatten(), columns=['Column'])
10 loops, best of 3: 50.8 ms per loop

#solutions of Wen

In [220]: %timeit pd.DataFrame({'Column':np.concatenate(df.values)})
1 loop, best of 3: 4.62 s per loop

In [221]: %timeit df.stack().reset_index(level=1,drop=True)
1 loop, best of 3: 788 ms per loop

In [222]: %timeit df.T.melt().drop(columns='variable')
1 loop, best of 3: 682 ms per loop
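The numbers above come from IPython's `%timeit` and will vary with hardware and pandas/NumPy versions. A minimal sketch for re-running the comparison in a plain script with the standard-library `timeit` module; the dictionary keys and the small `N` are illustrative choices, and `drop(columns='variable')` is used because pandas 2.0 and later reject the positional axis argument:

```python
import timeit

import numpy as np
import pandas as pd

N = 10_000  # kept small so the sketch runs quickly; raise it to compare at scale
df = pd.DataFrame(np.random.randint(20, size=(N, 3)), columns=list('ABC'))

# One entry per approach from the answers (labels are illustrative)
candidates = {
    'ravel + columns': lambda: pd.DataFrame(df.to_numpy().ravel(), columns=['Column']),
    'ravel + dict': lambda: pd.DataFrame({'Column': df.to_numpy().ravel()}),
    'concatenate': lambda: pd.DataFrame({'Column': np.concatenate(df.to_numpy())}),
    'stack': lambda: df.stack().reset_index(drop=True),
    'melt': lambda: df.T.melt().drop(columns='variable'),
}

# Sanity check: every approach yields the same values in row-major order
expected = df.to_numpy().ravel()
for name, fn in candidates.items():
    values = fn().to_numpy().ravel()
    assert np.array_equal(values, expected), name

# Report the best average time per call over a few repeats
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{name:>16}: {best * 1e6:10.1f} µs per call')
```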

Answer 1 (score: 2)

Option 1: np.concatenate

pd.DataFrame({'Column':np.concatenate(df.values)})
Out[278]: 
   Column
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9
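`np.concatenate` expects a sequence of arrays, so passing a 2D array makes NumPy treat each row as a separate input. The sketch below checks that equivalence; joining `N` small row arrays is one plausible reason this option was the slowest in the timings of Answer 0, though that is an interpretation rather than a measured breakdown:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=list('ABC'))
arr = df.to_numpy()

# Passing the 2D array behaves like passing its rows as a list of 1d arrays
from_rows = np.concatenate(list(arr))
direct = np.concatenate(arr)

print(np.array_equal(from_rows, direct))    # True
print(np.array_equal(direct, arr.ravel()))  # True
```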

Option 2: stack

df.stack().reset_index(level=1,drop=True)
Out[280]: 
0    1
0    2
0    3
1    4
1    5
1    6
2    7
2    8
2    9
dtype: int64
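The `stack` result above is a Series whose index repeats the original row labels. If a single-column DataFrame with a fresh 0..n-1 index is wanted, as in the question, a minimal sketch follows. It assumes the frame has no missing values, since `stack` has historically dropped NaN by default, which would shorten the result:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=list('ABC'))

# stack() walks each row in turn, giving a Series with a (row, column) MultiIndex
stacked = df.stack()

# Discard the MultiIndex entirely and name the resulting column
out = stacked.reset_index(drop=True).to_frame('Column')

print(out['Column'].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(out.index.tolist())      # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```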

Option 3: melt

df.T.melt().drop(columns='variable')
Out[283]: 
   value
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
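`melt` unpivots column by column, which is why this option transposes first: after `df.T`, each original row becomes a column. A sketch comparing both orders, which also names the output column through `melt`'s `value_name` parameter and selects it directly instead of dropping `variable`:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=list('ABC'))

# Without the transpose, melt reads column A first, then B, then C
col_order = df.melt()['value'].tolist()

# With the transpose, each original row is melted in turn
row_order = df.T.melt(value_name='Column')[['Column']]

print(col_order)                     # [1, 4, 7, 2, 5, 8, 3, 6, 9]
print(row_order['Column'].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```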