Question

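There is one row per ID per year (2008 to 2015). For the columns Max Temp, Min Temp and Rain, each cell contains an array of values, one per day of that year, i.e. for the frame above

frame3.iloc[0]['Max Temp'][0] is the value for January 1st, 2011
frame3.iloc[0]['Max Temp'][364] is the value for December 31st, 2011.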
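
To make the layout concrete, here is a minimal sketch of one such row. The values are invented toy data, and only the column names and the frame3 name come from the question; the real frame is loaded from MongoDB.

```python
import pandas as pd

# Hypothetical stand-in for frame3: one row per (ID, Year), with each
# weather cell holding a list of daily values (365 for a non-leap year).
frame3 = pd.DataFrame({
    'ID': [11111],
    'Year': [2011],
    'Max Temp': [[float(d % 30) for d in range(365)]],
    'Min Temp': [[float(d % 10) for d in range(365)]],
    'Rain': [[0.0] * 365],
})

# Position 0 is January 1st, position 364 is December 31st.
first_day = frame3.iloc[0]['Max Temp'][0]
last_day = frame3.iloc[0]['Max Temp'][364]
```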

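I know this structure is awful, but it is the data I have to work with. It is stored this way in MongoDB (one of these rows corresponds to one document in Mongo).

I want to split these nested arrays so that, instead of one row per ID per year, I have one row per ID per day. While splitting the arrays, I also want to create a new column that captures the day of the year, based on the current array index. I would then use that day plus the Year column to create a DatetimeIndex.

I searched here for related answers, but only found this one, which did not really help me.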

Answer 1

You can run .apply(pd.Series) on each of your columns, then stack and concatenate the results.

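For a series

import pandas as pd

s = pd.Series([[0, 1], [2, 3, 4]], index=[2011, 2012])

s
Out[103]: 
2011       [0, 1]
2012    [2, 3, 4]
dtype: object

it works as follows

s.apply(pd.Series).stack()
Out[104]: 
2011  0    0.0
      1    1.0
2012  0    2.0
      1    3.0
      2    4.0
dtype: float64

The elements of the series have different lengths (this matters because 2012 was a leap year). The intermediate frame (i.e. before stack) had a NaN value, which stack later dropped.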
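
The NaN padding can be inspected directly. The sketch below is not the answer's exact code: it builds the wide intermediate with pd.DataFrame(s.tolist(), index=s.index) as a stand-in for s.apply(pd.Series), and it calls dropna() explicitly on the assumption that not every pandas release drops NaN inside stack on its own.

```python
import pandas as pd

s = pd.Series([[0, 1], [2, 3, 4]], index=[2011, 2012])

# Wide intermediate: one column per array position. The shorter 2011
# row is padded with NaN in the last column.
wide = pd.DataFrame(s.tolist(), index=s.index)

# Long form: one row per (year, position). dropna() removes the padding
# explicitly rather than relying on stack's default behaviour.
long = wide.stack().dropna()
```

The second level of the resulting index is the array position, which is what later becomes the day of the year.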

Now, let's take a frame:

a = list(range(14))
b = list(range(20, 34))

df = pd.DataFrame({'ID': [11111, 11111, 11112, 11112],
                   'Year': [2011, 2012, 2011, 2012],
                   'A': [a[:3], a[3:7], a[7:10], a[10:14]],
                   'B': [b[:3], b[3:7], b[7:10], b[10:14]]})

df
Out[108]: 
                  A                 B     ID  Year
0         [0, 1, 2]      [20, 21, 22]  11111  2011
1      [3, 4, 5, 6]  [23, 24, 25, 26]  11111  2012
2         [7, 8, 9]      [27, 28, 29]  11112  2011
3  [10, 11, 12, 13]  [30, 31, 32, 33]  11112  2012

Then we can run:

# set an index (each column will inherit it)
df2 = df.set_index(['ID', 'Year'])
# the trick: unnest each column separately, then glue them back together
unnested_lst = []
for col in df2.columns:
    unnested_lst.append(df2[col].apply(pd.Series).stack())
result = pd.concat(unnested_lst, axis=1, keys=df2.columns)

and get:

result
Out[115]: 
                 A     B
ID    Year              
11111 2011 0   0.0  20.0
           1   1.0  21.0
           2   2.0  22.0
      2012 0   3.0  23.0
           1   4.0  24.0
           2   5.0  25.0
           3   6.0  26.0
11112 2011 0   7.0  27.0
           1   8.0  28.0
           2   9.0  29.0
      2012 0  10.0  30.0
           1  11.0  31.0
           2  12.0  32.0
           3  13.0  33.0

The rest (the datetime index) is more or less straightforward.
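
One way the datetime step could look is sketched here. This is an assumption about how to finish the job, not the answer's own code: the names parts, flat, daily, DayIndex and Date are made up for illustration, and result is rebuilt with pd.DataFrame(col.tolist()) plus an explicit dropna() as a stand-in for the apply/stack loop.

```python
import pandas as pd

# Toy frame from the answer: nested lists of unequal length per (ID, Year).
a = list(range(14))
b = list(range(20, 34))
df = pd.DataFrame({'ID': [11111, 11111, 11112, 11112],
                   'Year': [2011, 2012, 2011, 2012],
                   'A': [a[:3], a[3:7], a[7:10], a[10:14]],
                   'B': [b[:3], b[3:7], b[7:10], b[10:14]]})
df2 = df.set_index(['ID', 'Year'])

# Unnest each column; pd.DataFrame(col.tolist()) stands in for
# col.apply(pd.Series), and dropna() removes the NaN padding explicitly.
parts = {col: pd.DataFrame(df2[col].tolist(), index=df2.index).stack().dropna()
         for col in df2.columns}
result = pd.concat(parts, axis=1)

# Name the array-position level, then flatten the index into columns.
flat = result.rename_axis(['ID', 'Year', 'DayIndex']).reset_index()

# January 1st of each year plus the zero-based day offset.
flat['Date'] = (pd.to_datetime(flat['Year'].astype(str), format='%Y')
                + pd.to_timedelta(flat['DayIndex'], unit='D'))

# Dates repeat across IDs, so ID is kept as a regular column here.
daily = flat.drop(columns=['Year', 'DayIndex']).set_index('Date')
```

Because the offset is added to January 1st, a 366-element array for a leap year ends on December 31st without any special handling. If a unique index is needed, setting the index to ['ID', 'Date'] instead is a reasonable variation.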
