Question

I am trying to build a 2 x 24 table in pandas from the following data:

d.iloc[0,2] = [[0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 0L, 0L, 0L]]

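Basically, the first inner bracket holds the 24 hourly values for one day in January, and the second inner bracket holds the values for February. I would like to build the 2 x 24 table in the following way (without the 'L'):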

    1 2 3 4 5 6 7 8 9 10 11 12 ... 24
Jan 0 0 0 0 0 0 0 0 0 1  1  1  ... 0
Feb 0 0 0 0 0 0 0 0 0 1  1  1  ... 0

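What I find challenging is stripping (.strip) and splitting the data, then copying it into a new DataFrame structure. The data frames I find online often have the original structure with 12 inner brackets (one per month). I included d.iloc[0,2] because I will use a for loop to apply the function to all elements in column 2. Thank you for your valuable help.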

Answer 1

I think you can use DataFrame.from_records with str.strip:

import pandas as pd
import numpy as np

a = [['0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '1L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '0L', '0L', '0L'], 
     ['0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '0L', '1L', '1L', '1L', '1L', '1L', '0L', '0L', '0L', '1L', '1L', '2L', '2L', '0L', '0L', '0L']]

idx = ['Jan','Feb']
df = pd.DataFrame.from_records(a, index=idx).apply(lambda x: x.str.strip('L').astype(int))
print (df)
     0   1   2   3   4   5   6   7   8   9  ...  14  15  16  17  18  19  20  \
Jan   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   1   1   
Feb   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   2   2   

     21  22  23  
Jan   0   0   0  
Feb   0   0   0  

[2 rows x 24 columns]
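
The output above keeps the default column labels 0 to 23, while the question asks for hours 1 to 24. A minimal sketch of relabeling the columns, assuming the same string data as above (the list is rebuilt here with repetition only to keep the example short):

```python
import pandas as pd

# Two rows of 24 hourly values stored as strings with a trailing 'L'
a = [['0L'] * 9 + ['1L'] * 5 + ['0L'] * 3 + ['1L'] * 4 + ['0L'] * 3,
     ['0L'] * 9 + ['1L'] * 5 + ['0L'] * 3 + ['1L', '1L', '2L', '2L'] + ['0L'] * 3]

df = pd.DataFrame.from_records(a, index=['Jan', 'Feb'])
df = df.apply(lambda col: col.str.strip('L').astype(int))

# Relabel the default 0..23 column positions as hours 1..24
df.columns = range(1, df.shape[1] + 1)
print(df)
```

After this step, df.loc['Feb', 20] looks up the value for hour 20 in February rather than column position 20.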

A more general solution that generates the month names with dt.strftime:

print (pd.Series(range(1,len(a) + 1)))
0    1
1    2
dtype: int32

idx = pd.to_datetime(pd.Series(range(1,len(a) + 1)), format='%m').dt.strftime('%b')
print (idx)
0    Jan
1    Feb
dtype: object

df = pd.DataFrame.from_records(a, index=idx).apply(lambda x: x.str.strip('L').astype(int))
print (df)
     0   1   2   3   4   5   6   7   8   9  ...  14  15  16  17  18  19  20  \
Jan   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   1   1   
Feb   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   2   2   

     21  22  23  
Jan   0   0   0  
Feb   0   0   0
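
As an alternative sketch that is not part of the original answer, the month abbreviations could also come from the standard library's calendar module without going through a datetime conversion. The name n_rows is hypothetical and stands for the number of month rows; like strftime('%b'), the abbreviations depend on the active locale:

```python
import calendar

n_rows = 2  # hypothetical: number of month rows in the data

# calendar.month_abbr is indexed 1..12 (index 0 is an empty string)
idx = [calendar.month_abbr[m] for m in range(1, n_rows + 1)]
print(idx)
```

The resulting list can be passed as index= to DataFrame.from_records in the same way as the hand-written ['Jan','Feb'] list.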

If the values need to be split first:

b = [['0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L'], 
     ['0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 0L, 0L, 0L']]

idx = pd.to_datetime(pd.Series(range(1,len(b) + 1)), format='%m').dt.strftime('%b')

# wrap the chained calls in parentheses so they can span several lines
df1 = (pd.DataFrame.from_records(b, index=idx)
         .iloc[:,0]
         .str.split(', ', expand=True)
         .replace({'L':''}, regex=True)
         .astype(int))
print (df1)

     0   1   2   3   4   5   6   7   8   9  ...  14  15  16  17  18  19  20  \
Jan   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   1   1   
Feb   0   0   0   0   0   0   0   0   0   1 ...   0   0   0   1   1   2   2   

     21  22  23  
Jan   0   0   0  
Feb   0   0   0  

[2 rows x 24 columns]
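
The question also mentions applying this to every cell in column 2. If each cell holds the whole nested list as a single string, one possible approach is to remove the 'L' suffixes with a regular expression and parse the rest with ast.literal_eval. This is only a sketch under that assumption: the sample string raw and the name df2 are hypothetical, and the real cell format may differ.

```python
import ast
import re

import pandas as pd

# Hypothetical cell content: one string holding nested bracketed lists
raw = "[[0L, 0L, 1L, 2L], [0L, 1L, 1L, 0L]]"

# Drop each 'L' that directly follows a digit, then parse the text
# as a Python literal (a list of lists of ints)
cleaned = re.sub(r'(?<=\d)L', '', raw)
rows = ast.literal_eval(cleaned)

df2 = pd.DataFrame(rows, index=['Jan', 'Feb'])
df2.columns = range(1, df2.shape[1] + 1)  # label columns from 1
print(df2)
```

The same two steps (re.sub, then ast.literal_eval) could be wrapped in a function and called inside the for loop over the cells of column 2.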
