用熊猫将行拆分为多行

时间:2019-08-12 01:45:24

标签: python pandas dataframe reshape

我有一个以下格式的数据集。它有48列和大约200000行。

slot1,slot2,slot3,slot4,slot5,slot6...,slot45,slot46,slot47,slot48
1,2,3,4,5,6,7,......,45,46,47,48
3.5,5.2,2,5.6,...............

我想将此数据集重塑为以下内容,其中N小于48(也许是24或12等)列标题无关紧要。 当N = 4时

slotNew1,slotNew2,slotNew3,slotNew4
1,2,3,4
5,6,7,8
......
45,46,47,48
3.5,5.2,2,5.6
............

我可以逐行读取内容,然后拆分每一行并附加到新的数据框中。但这是非常低效的。有什么有效,更快的方法吗?

2 个答案:

答案 0 :(得分:1)

您可以尝试

$takeOff = '23:00';
$landing = '01:10';

$t1 = strtotime($takeOff);
$t2 = strtotime($landing);

$diff = gmdate('H:i', $t2 - $t1);

dd($diff); // "02:10"

代码将数据提取到N = 4 df_new = pd.DataFrame(df_original.values.reshape(-1, N)) df_new.columns = ['slotNew{:}'.format(i + 1) for i in range(N)] 中,对其进行整形,并创建一个具有所需尺寸的新数据集。

示例:

numpy.ndarray

另一种方法

import numpy as np
import pandas as pd

df0 = pd.DataFrame(np.arange(48 * 3).reshape(-1, 48))
df0.columns = ['slot{:}'.format(i + 1) for i in range(48)]
print(df0)
#    slot1  slot2  slot3  slot4   ...    slot45  slot46  slot47  slot48
# 0      0      1      2      3   ...        44      45      46      47
# 1     48     49     50     51   ...        92      93      94      95
# 2     96     97     98     99   ...       140     141     142     143
# 
# [3 rows x 48 columns]

N = 4
df = pd.DataFrame(df0.values.reshape(-1, N))
df.columns = ['slotNew{:}'.format(i + 1) for i in range(N)]
print(df.head())
#    slotNew1  slotNew2  slotNew3  slotNew4
# 0         0         1         2         3
# 1         4         5         6         7
# 2         8         9        10        11
# 3        12        13        14        15
# 4        16        17        18        19

答案 1 :(得分:1)

制作块后使用pandas.explode。给定df

import pandas as pd

df = pd.DataFrame([np.arange(1, 49)], columns=['slot%s' % i for i in range(1, 49)])
print(df)

   slot1  slot2  slot3  slot4  slot5  slot6  slot7  slot8  slot9  slot10  ...  \
0      1      2      3      4      5      6      7      8      9      10  ...   

   slot39  slot40  slot41  slot42  slot43  slot44  slot45  slot46  slot47  \
0      39      40      41      42      43      44      45      46      47   

   slot48  
0      48  

使用chunks进行除法:

def chunks(l, n):
    """Yield successive n-sized chunks from l.
    Source: https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
    """
    n_items = len(l)
    if n_items % n:
        n_pads = n - n_items % n
    else:
        n_pads = 0
    l = l + [np.nan for _ in range(n_pads)] 
    for i in range(0, len(l), n):
        yield l[i:i + n]

N = 4
new_df = pd.DataFrame(list(df.apply(lambda x: list(chunks(list(x), N)), 1).explode()))
print(new_df)

输出:

     0   1   2   3
0    1   2   3   4
1    5   6   7   8
2    9  10  11  12
3   13  14  15  16
4   17  18  19  20
...

此方法在numpy.reshape上的优势在于它可以处理N不受影响的情况:

N = 7
new_df = pd.DataFrame(list(df.apply(lambda x: list(chunks(list(x), N)), 1).explode()))
print(new_df)

输出:

    0   1   2   3   4   5     6
0   1   2   3   4   5   6   7.0
1   8   9  10  11  12  13  14.0
2  15  16  17  18  19  20  21.0
3  22  23  24  25  26  27  28.0
4  29  30  31  32  33  34  35.0
5  36  37  38  39  40  41  42.0
6  43  44  45  46  47  48   NaN