Suppose I have this DataFrame:
A B C
1 2 3
4 5 6
7 8 9
I want to flatten the rows, one after another, into a single column:
Column
1
2
3
4
5
6
7
8
9
Answer 0 (score: 10)
Convert the DataFrame to a numpy array and use ravel to flatten it into a 1-D array:
df = pd.DataFrame(df.values.ravel(), columns=['Column'])
#alternative
#df = pd.DataFrame({'Column': df.values.ravel()})
print (df)
Column
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
Thanks to user32185 for another solution:
pd.DataFrame({'Column': df.values.reshape(-1)})
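The two snippets above assume `df` already exists. A minimal end-to-end sketch, building the sample frame from the question and using `to_numpy()` (the accessor that newer pandas documentation recommends over `.values`; the names `values`, `by_ravel`, and `by_reshape` are illustrative only), could look like this:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]})

# 2-D numpy array of the frame's values, one row per DataFrame row.
values = df.to_numpy()

# Both calls walk the array row by row (C order) and return a 1-D array.
by_ravel = pd.DataFrame(values.ravel(), columns=['Column'])
by_reshape = pd.DataFrame({'Column': values.reshape(-1)})

print(by_ravel['Column'].tolist())    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(by_reshape['Column'].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Both variants produce the same column; they differ only in how the DataFrame constructor is called.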
Timings:
N = 100000
df = pd.DataFrame(np.random.randint(20, size=(N, 3)), columns=list('ABC'))
#print (df)
In [201]: %timeit pd.DataFrame({'Column': df.values.ravel()})
1000 loops, best of 3: 825 µs per loop
In [202]: %timeit pd.DataFrame(df.values.ravel(), columns=['Column'])
10000 loops, best of 3: 144 µs per loop
In [203]: %timeit pd.DataFrame({'Column': df.values.reshape(-1)})
1000 loops, best of 3: 778 µs per loop
In [204]: %timeit pd.DataFrame(df.values.reshape(-1), columns=['Column'])
10000 loops, best of 3: 143 µs per loop
In [205]: %timeit pd.DataFrame(df.values.flatten(), columns=['Column'])
1000 loops, best of 3: 585 µs per loop
#solutions of Wen
In [224]: %timeit pd.DataFrame({'Column':np.concatenate(df.values)})
10 loops, best of 3: 45.6 ms per loop
In [225]: %timeit df.stack().reset_index(level=1,drop=True)
100 loops, best of 3: 7.65 ms per loop
In [226]: %timeit df.T.melt().drop('variable',1)
100 loops, best of 3: 6.94 ms per loop
N = 10000000
df = pd.DataFrame(np.random.randint(20, size=(N, 3)), columns=list('ABC'))
print (df)
In [209]: %timeit pd.DataFrame({'Column': df.values.ravel()})
10 loops, best of 3: 51.3 ms per loop
In [210]: %timeit pd.DataFrame(df.values.ravel(), columns=['Column'])
10000 loops, best of 3: 143 µs per loop
In [211]: %timeit pd.DataFrame({'Column': df.values.reshape(-1)})
10 loops, best of 3: 53.4 ms per loop
In [212]: %timeit pd.DataFrame(df.values.reshape(-1), columns=['Column'])
10000 loops, best of 3: 147 µs per loop
In [213]: %timeit pd.DataFrame(df.values.flatten(), columns=['Column'])
10 loops, best of 3: 50.8 ms per loop
#solutions of Wen
In [220]: %timeit pd.DataFrame({'Column':np.concatenate(df.values)})
1 loop, best of 3: 4.62 s per loop
In [221]: %timeit df.stack().reset_index(level=1,drop=True)
1 loop, best of 3: 788 ms per loop
In [222]: %timeit df.T.melt().drop('variable',1)
1 loop, best of 3: 682 ms per loop
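One likely contributor to the gap between `ravel`/`reshape(-1)` and `flatten` in the timings above is that, on a C-contiguous array, `ravel` and `reshape(-1)` can return a view, while `flatten` always returns a copy. The sketch below checks this on a plain numpy array (an assumption made for illustration; whether `df.values` itself is C-contiguous depends on how the DataFrame is stored internally):

```python
import numpy as np

# A small C-contiguous 2-D array standing in for df.values.
arr = np.arange(1, 10).reshape(3, 3)

raveled = arr.ravel()        # view when no copy is needed
reshaped = arr.reshape(-1)   # likewise a view for contiguous input
flattened = arr.flatten()    # always a fresh copy

print(np.shares_memory(arr, raveled))    # True
print(np.shares_memory(arr, reshaped))   # True
print(np.shares_memory(arr, flattened))  # False
```

A view costs the same regardless of array size, which would be consistent with the `ravel` timing staying near 145 µs at both values of N.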
Answer 1 (score: 2)
Option 1: np.concatenate
pd.DataFrame({'Column':np.concatenate(df.values)})
Out[278]:
Column
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
Option 2: stack
df.stack().reset_index(level=1,drop=True)
Out[280]:
0 1
0 2
0 3
1 4
1 5
1 6
2 7
2 8
2 9
dtype: int64
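Note that the stack result above is a Series whose index repeats the original row labels (0, 0, 0, 1, ...). If a single-column DataFrame with a fresh 0..8 index is wanted, as in the question, one possible follow-up is sketched below (the names `stacked` and `result` are illustrative):

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=list('ABC'))

# stack() moves the column labels into an inner index level, row by row.
stacked = df.stack()

# Drop the whole (row, column) MultiIndex and wrap the Series in a DataFrame.
result = stacked.reset_index(drop=True).to_frame('Column')

print(result['Column'].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```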
Option 3: melt
df.T.melt().drop('variable', axis=1)
Out[283]:
value
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
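The transpose matters here: `melt` unpivots column by column, so transposing first makes each original row a column and the values come out in row order. A short sketch that also renames the result to match the question's `Column` header (it uses `drop(columns=...)`, the keyword form accepted by current pandas releases; `melted` and `result` are illustrative names):

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=list('ABC'))

# After .T, the columns are the original row labels 0, 1, 2.
melted = df.T.melt()
print(list(melted.columns))  # ['variable', 'value']

# Keep only the values and rename the column.
result = melted.drop(columns='variable').rename(columns={'value': 'Column'})
print(result['Column'].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```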