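To give some background: in experimental science we often have a set of probes at different locations, each recording some variables (temperature and velocity) over time. These probes may be interrupted at certain times, so what we are left with is chunks of restart data, each of which picks up where the previous dataset ended. I have created an example dataset to show this; the only thing left is to find an elegant way to concatenate the "chunks" of collected data: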
import pandas as pd
import numpy as np

dimS = 1   # dimension of a scalar variable
dimV = 3   # dimension of a vector variable
Np = 3     # number of measurement locations
Nt = 10    # time samples

def create_single_probe(Nt):
    """
    Create fictitious data for a probe
    """
    probe = {}
    probe["temp"] = np.random.rand(dimS*Nt).reshape(dimS,Nt)
    probe["velo"] = np.random.rand(dimV*Nt).reshape(dimV,Nt)
    probe["loc"] = np.random.rand(3)
    return probe

def create_multiple_probes(Np,Nt):
    """
    Gather multiple probes into dict
    """
    probes = []
    for i in range(Np):
        probes.append(create_single_probe(Nt))
    data = {}
    data["time"] = range(Nt)
    data["probes"] = probes
    return data

# Create data that we want to concatenate
restarts = [create_multiple_probes(Np,Nt) for i in range(3)]

# Now we want to concatenate the entries of restarts
def concat_complex_dict(restarts):
    # Solution here...
    return concat_data
In this example the probe locations change with each new restart block, but in reality that would not be the case.
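After concatenation I would expect the following:
concat_data["time"]
is a list or array of length 30,
concat_data["probes"]
is a list of length 3 (since there are three probe locations), where each entry is a dictionary, so that for the i-th entry
concat_data["probes"][i]["temp"]
is an array of length 30 and
concat_data["probes"][i]["velo"]
is a 3x30 array.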
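I could write a very laborious solution that loops over every element of the dictionaries I want to concatenate and collects them in a series of lists, but I was wondering whether there is a more elegant way, perhaps using pandas...
I hope what I am trying to do makes sense; any suggestions would be helpful.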
Answer 0 (score: 1)
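Assuming I followed the explanation correctly and made the right assumption (velocity is in X, Y, Z format), here is my suggestion for a more readable format. If I am close, I can change it to be more robust; if I am way off, I will delete this.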
import numpy as np
import pandas as pd

dimS = 1   # dimension of a scalar variable
dimV = 3   # dimension of a vector variable
Np = 3     # number of measurement locations
Nt = 10    # time samples

def create_single_probe(Nt):
    """
    Create fictitious data for a probe and return it as a DataFrame
    """
    probe = {}
    probe["temp"] = np.random.rand(dimS*Nt).reshape(dimS,Nt)
    probe["velo"] = np.random.rand(dimV*Nt).reshape(dimV,Nt)
    # Create dataframe; transpose (rather than reshape) so that each row
    # is one time sample and the velocity components stay in their columns
    mat = np.concatenate([probe["temp"].T, probe["velo"].T], axis=1)
    frame = pd.DataFrame(mat, columns=["temp","VelocityX","VelocityY","VelocityZ"])
    probe["loc"] = np.random.rand(3)
    # Add location of probe as comma separated string
    frame["loc"] = ",".join(map(str, probe["loc"]))
    return frame

def create_multiple_probes(Np,Nt):
    """
    Gather multiple probes into one DataFrame
    """
    probes = []
    for i in range(Np):
        df = create_single_probe(Nt)
        # Set time as a row value for each probe
        df["time"] = range(Nt)
        probes.append(df)
    # Concat into one dataframe
    data = pd.concat(probes)
    return data

print(create_multiple_probes(Np,Nt))
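The snippet above builds one restart block; the question is about joining several of them. A rough sketch of how that might look in the same long format follows. The helper make_restart_frame, the "probe" column and the t0 time offset are my own assumptions for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

def make_restart_frame(n_probes, nt, t0, rng):
    """Hypothetical stand-in for one restart block in long format."""
    frames = []
    for probe_id in range(n_probes):
        frame = pd.DataFrame(
            rng.random((nt, 4)),
            columns=["temp", "VelocityX", "VelocityY", "VelocityZ"],
        )
        frame["probe"] = probe_id               # which location the row belongs to
        frame["time"] = np.arange(t0, t0 + nt)  # assumed: time continues across restarts
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

rng = np.random.default_rng(0)
# Three restart blocks of 10 samples each, for 3 probes
blocks = [make_restart_frame(3, 10, t0=k * 10, rng=rng) for k in range(3)]

# Joining the restarts is then a single concat
all_data = pd.concat(blocks, ignore_index=True)

# Full temperature history of probe 0, ordered in time
probe0 = all_data[all_data["probe"] == 0].sort_values("time")
probe0_temp = probe0["temp"].to_numpy()
```

With the data in this shape, each probe's history is a filter on the "probe" column, so no per-key bookkeeping is needed when another restart block arrives.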
Answer 1 (score: 0)
Here is my solution, which should work. However, it is very ugly and hard to follow...
import numpy as np

def concat_complex_dict(lst):
    cdata = {}
    # Concatenate the time data
    first = True
    for x in lst:
        time = x["time"]
        if first:
            ctime = np.asarray(time)
            first = False
        else:
            ctime = np.concatenate((ctime,time))
    cdata["time"] = ctime
    cdata["probes"] = []
    # Concatenate the probe data
    nLoc = len(lst[0]["probes"])
    for i in range(nLoc):
        dic = lst[0]["probes"][i]
        cprobe = {}
        for key in dic:
            first = True
            for j in range(len(lst)):
                if first:
                    cprobe[key] = np.atleast_2d(lst[j]["probes"][i][key])
                    first = False
                else:
                    if key == "loc":
                        break  # don't concatenate the location as it doesn't change
                    cprobe[key] = np.concatenate((cprobe[key],lst[j]["probes"][i][key]),axis=1)
        cdata["probes"].append(cprobe)
    return cdata
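For what it is worth, the same logic can probably be written more compactly by collecting the blocks for each probe first and calling np.concatenate once per key. This is only a sketch: the names concat_restarts, static_keys and fake_block are made up here, and it assumes the time axis is always the last axis of each array.

```python
import numpy as np

def concat_restarts(restarts, static_keys=("loc",)):
    """Concatenate restart blocks along the time axis.

    static_keys (hypothetical parameter) lists per-probe entries that are
    not time series; they are copied from the first block unchanged.
    """
    out = {"time": np.concatenate([np.asarray(r["time"]) for r in restarts])}
    out["probes"] = []
    for i in range(len(restarts[0]["probes"])):
        blocks = [r["probes"][i] for r in restarts]
        probe = {}
        for key in blocks[0]:
            if key in static_keys:
                probe[key] = blocks[0][key]
            else:
                # time is the last axis for both (1, Nt) and (3, Nt) arrays
                probe[key] = np.concatenate([b[key] for b in blocks], axis=-1)
        out["probes"].append(probe)
    return out

def fake_block(n_probes, nt, locs):
    """Made-up test data with the same layout as in the question."""
    probes = [
        {"temp": np.random.rand(1, nt), "velo": np.random.rand(3, nt), "loc": locs[i]}
        for i in range(n_probes)
    ]
    return {"time": range(nt), "probes": probes}

locs = np.random.rand(3, 3)  # fixed probe locations, shared by all restarts
restarts = [fake_block(3, 10, locs) for _ in range(3)]
result = concat_restarts(restarts)
print(result["time"].shape,
      result["probes"][0]["temp"].shape,
      result["probes"][0]["velo"].shape)
# prints: (30,) (1, 30) (3, 30)
```

Note that "temp" comes out with shape (1, 30) rather than as a flat array of length 30, because the generated data stores it as a 1 x Nt array; calling .ravel() on it gives the flat version.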