I have a dataset in the following format. It has 48 columns and about 200,000 rows.
slot1,slot2,slot3,slot4,slot5,slot6...,slot45,slot46,slot47,slot48
1,2,3,4,5,6,7,......,45,46,47,48
3.5,5.2,2,5.6,...............
I want to reshape this dataset into the following, where N is less than 48 (perhaps 24 or 12, etc.). The column headers don't matter. When N = 4:
slotNew1,slotNew2,slotNew3,slotNew4
1,2,3,4
5,6,7,8
......
45,46,47,48
3.5,5.2,2,5.6
............
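I could read the contents row by row, split each row, and append the pieces to a new DataFrame, but that is very inefficient. Is there an efficient, faster way?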
Answer 0 (score: 1)
You can try
df_new = pd.DataFrame(df_original.values.reshape(-1, N))
df_new.columns = ['slotNew{:}'.format(i + 1) for i in range(N)]
The code extracts the data into a numpy.ndarray, reshapes it, and creates a new dataset with the desired dimensions.
Example with N = 4:
import numpy as np
import pandas as pd
df0 = pd.DataFrame(np.arange(48 * 3).reshape(-1, 48))
df0.columns = ['slot{:}'.format(i + 1) for i in range(48)]
print(df0)
# slot1 slot2 slot3 slot4 ... slot45 slot46 slot47 slot48
# 0 0 1 2 3 ... 44 45 46 47
# 1 48 49 50 51 ... 92 93 94 95
# 2 96 97 98 99 ... 140 141 142 143
#
# [3 rows x 48 columns]
N = 4
df = pd.DataFrame(df0.values.reshape(-1, N))
df.columns = ['slotNew{:}'.format(i + 1) for i in range(N)]
print(df.head())
# slotNew1 slotNew2 slotNew3 slotNew4
# 0 0 1 2 3
# 1 4 5 6 7
# 2 8 9 10 11
# 3 12 13 14 15
# 4 16 17 18 19
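One caveat worth spelling out: reshape(-1, N) only keeps each new row inside a single original row when N divides the number of columns (48 here). The following is a minimal sketch of a guarded wrapper around the same idea; the name reshape_wide is hypothetical and not part of pandas, and df.to_numpy() is used in place of df.values.

```python
import numpy as np
import pandas as pd


def reshape_wide(df, n):
    """Reshape a wide frame into n columns, row by row (hypothetical helper)."""
    n_cols = df.shape[1]
    if n_cols % n != 0:
        # Stricter than NumPy itself: this also rejects cases where the total
        # element count is divisible by n but chunks would straddle two rows.
        raise ValueError(f"n={n} must divide the number of columns ({n_cols})")
    # to_numpy() returns the values as one ndarray; reshape(-1, n) then cuts
    # each original row into n_cols // n consecutive new rows.
    out = pd.DataFrame(df.to_numpy().reshape(-1, n))
    out.columns = [f"slotNew{i + 1}" for i in range(n)]
    return out


df0 = pd.DataFrame(np.arange(48 * 3).reshape(-1, 48))
print(reshape_wide(df0, 12).shape)  # (12, 12)
```

Because the work is a single array reshape rather than a Python loop over rows, this should scale well to the roughly 200,000 rows mentioned in the question.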
Answer 1 (score: 1)
Use pandas.explode after splitting each row into chunks. Given df:
import numpy as np
import pandas as pd
df = pd.DataFrame([np.arange(1, 49)], columns=['slot%s' % i for i in range(1, 49)])
print(df)
slot1 slot2 slot3 slot4 slot5 slot6 slot7 slot8 slot9 slot10 ... \
0 1 2 3 4 5 6 7 8 9 10 ...
slot39 slot40 slot41 slot42 slot43 slot44 slot45 slot46 slot47 \
0 39 40 41 42 43 44 45 46 47
slot48
0 48
Split each row with chunks:
def chunks(l, n):
    """Yield successive n-sized chunks from l.
    Source: https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
    """
    n_items = len(l)
    # Pad with NaN so that the last chunk also has n items.
    if n_items % n:
        n_pads = n - n_items % n
    else:
        n_pads = 0
    l = l + [np.nan for _ in range(n_pads)]
    for i in range(0, len(l), n):
        yield l[i:i + n]
N = 4
new_df = pd.DataFrame(list(df.apply(lambda x: list(chunks(list(x), N)), 1).explode()))
print(new_df)
Output:
0 1 2 3
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
3 13 14 15 16
4 17 18 19 20
...
The advantage of this approach over numpy.reshape is that it can handle the case where N is not a factor of the number of columns:
N = 7
new_df = pd.DataFrame(list(df.apply(lambda x: list(chunks(list(x), N)), 1).explode()))
print(new_df)
Output:
0 1 2 3 4 5 6
0 1 2 3 4 5 6 7.0
1 8 9 10 11 12 13 14.0
2 15 16 17 18 19 20 21.0
3 22 23 24 25 26 27 28.0
4 29 30 31 32 33 34 35.0
5 36 37 38 39 40 41 42.0
6 43 44 45 46 47 48 NaN
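Since apply with a Python lambda runs once per row, it may be slow on a frame with about 200,000 rows. As an alternative to the approach above (not the answer's own method), the same NaN padding can be sketched with numpy.pad followed by a single reshape. The helper name reshape_with_padding is hypothetical, and the sketch assumes all columns are numeric.

```python
import numpy as np
import pandas as pd


def reshape_with_padding(df, n):
    """Split each row into n-wide chunks, padding the last chunk with NaN."""
    values = df.to_numpy(dtype=float)  # float dtype so NaN can be stored
    n_pads = -values.shape[1] % n      # extra columns needed to reach a multiple of n
    # Pad only on the right-hand side of axis 1 (columns); rows are untouched.
    padded = np.pad(values, ((0, 0), (0, n_pads)), constant_values=np.nan)
    return pd.DataFrame(padded.reshape(-1, n))


df = pd.DataFrame([np.arange(1, 49)], columns=[f"slot{i}" for i in range(1, 49)])
print(reshape_with_padding(df, 7).shape)  # (7, 7)
```

Unlike the explode version, every column comes back as float here, because the whole array is converted before padding.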