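I have a DataFrame that was created by merging several MATLAB .mat
files and then loading the merged list of dictionaries into pandas. Schematically, it looks like this: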
KEY_COLUMN VALUE_COLUMN
0 [[[KEY1]], [[KEY2]], [[KEY3]], [[KEY4]]] [[VALUE], [VALUE], [VALUE], [VALUE]]
1 [[[KEY2]], [[KEY3]], [[KEY1]], [[KEY4]]] [[VALUE], [VALUE], [VALUE], [VALUE]]
2 [[[KEY1]], [[KEY3]], [[KEY4]], [[KEY2]]] [[VALUE], [VALUE], [VALUE], [VALUE]]
The actual data looks like this:
{'TYPE': {0: array([[array(['START'], dtype='<U5')],
[array(['DIST'], dtype='<U6')],
[array(['DISTFALSE'], dtype='<U7')],
[array(['DISTTRUE'], dtype='<U7')],
[array(['ENCFALSE'], dtype='<U11')],
[array(['ENCTRUE'], dtype='<U12')]], dtype=object),
1: array([[array(['DISTFALSE'], dtype='<U5')],
[array(['START'], dtype='<U10')],
[array(['DIST'], dtype='<U11')],
[array(['DISTTRUE'], dtype='<U11')],
[array(['ENCTRUE'], dtype='<U10')],
[array(['ENCFALSE'], dtype='<U11')]], dtype=object)},
'TIME': {0: array([[ 24413],
[ 27481],
[ 29382],
[ 31923],
[ 31249],
[ 34690]]),
1: array([[ 364582],
[ 31234],
[ 43123],
[ 24444],
[ 55551],
[ 12355]])}}
Now I want the KEYS to become the columns of the DataFrame and the VALUES its rows, like this:
KEY1 KEY2 KEY3 KEY4
0 VALUE VALUE VALUE VALUE
1 VALUE VALUE VALUE VALUE
2 VALUE VALUE VALUE VALUE
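The problem is that the order of the keys (and of their corresponding values) differs from row to row.
How can I achieve this? Thanks a lot!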
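To illustrate why the differing key order does not have to be a problem: a minimal sketch, using hypothetical plain-Python rows instead of the .mat-derived arrays, showing that pairing each row's keys with its values in a dict and passing the list of dicts to pd.DataFrame places every value under its key by name, whatever its position in the row.

```python
import pandas as pd

# Hypothetical plain-Python rows: the same keys appear in a different order in each row.
rows = [
    (["KEY1", "KEY2", "KEY3", "KEY4"], [1, 2, 3, 4]),
    (["KEY2", "KEY3", "KEY1", "KEY4"], [2, 3, 1, 4]),
    (["KEY1", "KEY3", "KEY4", "KEY2"], [1, 3, 4, 2]),
]

# One dict per row: once keys are paired with values, their order no longer matters.
records = [dict(zip(keys, values)) for keys, values in rows]

# The DataFrame constructor matches dict keys to column names,
# so each value lands in the right column regardless of where it appeared.
df = pd.DataFrame(records)
print(df)
```

Because each KEYn was given the value n, every row of the result should read 1, 2, 3, 4 across KEY1 to KEY4.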
Answer 0 (score: 0)
I solved this with the following approach:
import numpy as np
import pandas as pd
df = pd.DataFrame({'TYPE': {0: np.array([[np.array(['START'], dtype='<U5')],[np.array(['DIST'], dtype='<U6')],[np.array(['DISTFALSE'], dtype='<U7')],[np.array(['DISTTRUE'], dtype='<U7')],[np.array(['ENCFALSE'], dtype='<U11')],[np.array(['ENCTRUE'], dtype='<U12')]], dtype=object),
1: np.array([[np.array(['DISTFALSE'], dtype='<U5')],[np.array(['START'], dtype='<U10')],[np.array(['DIST'], dtype='<U11')],[np.array(['DISTTRUE'], dtype='<U11')],[np.array(['ENCTRUE'], dtype='<U10')],[np.array(['ENCFALSE'], dtype='<U11')]], dtype=object)},
'TIME': {0: np.array([[ 24413],[ 27481],[ 29382],[ 31923],[ 31249],[ 34690]]),
1: np.array([[ 364582],[ 31234],[ 43123],[ 24444],[ 55551],[ 12355]])}})
# Assuming a df as shown in the problem statement
# Initialize an empty dictionary to hold extracted keys and values
keyvals = {}
for i in range(0, df.shape[0]):
    keyrow = df.iloc[i, 0].flatten()
    valrow = df.iloc[i, 1].flatten()
    for j, k in zip(keyrow, valrow):
        # Unwrap array(['START']) (or a bare 'START') into a plain, hashable str
        key = np.asarray(j).item()
        try:
            keyvals[key].append(k)
        except KeyError:
            # First occurrence of this key: start a new list
            keyvals[key] = [k]
finDf = pd.DataFrame({k: pd.Series(v) for k, v in keyvals.items()})
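finDf ends up in the following form. Note that values are appended per key rather than per original row, so a key that only occurs in row 1 (such as DISTF or DISTTRUE) ends up in row 0: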
DISTF START DIST DISTTRUE ENCTRUE ENCFALSE DISTFAL DISTTRU
0 364582.0 24413 27481 24444.0 34690 31249 29382.0 31923.0
1 NaN 31234 43123 NaN 55551 12355 NaN NaN
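A variant that keeps each output row tied to its input row: a minimal sketch, assuming each key cell is a loadmat-style (n, 1) object array of one-element string arrays (as in the question's repr) and each value cell is an (n, 1) numeric array. It builds one dict per input row and lets pd.DataFrame align the columns. The helpers `unwrap`, `rows_to_frame` and `mat_cell` and the sample numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def unwrap(cell):
    """Turn a key such as array(['START']) or 'START' into the plain string 'START'."""
    return np.asarray(cell).item()


def rows_to_frame(pairs):
    """Hypothetical helper: one output row per (keys, values) pair, columns aligned by key."""
    records = []
    for keys, values in pairs:
        keyrow = np.asarray(keys, dtype=object).ravel()  # (n, 1) -> (n,), items may still be arrays
        valrow = np.asarray(values).ravel()              # (n, 1) -> (n,)
        records.append({unwrap(k): v for k, v in zip(keyrow, valrow)})
    return pd.DataFrame(records)  # keys missing from a row become NaN in that row


def mat_cell(strings):
    """Hypothetical stand-in for a loadmat cell array: (n, 1) object array of 1-element str arrays."""
    cell = np.empty((len(strings), 1), dtype=object)
    for i, s in enumerate(strings):
        cell[i, 0] = np.array([s])  # store the array itself as the element
    return cell


pairs = [
    (mat_cell(["START", "DIST", "ENCFALSE"]), np.array([[24413], [27481], [31249]])),
    (mat_cell(["DISTFALSE", "START", "ENCFALSE"]), np.array([[364582], [31234], [12355]])),
]
result = rows_to_frame(pairs)
print(result)
```

Here DISTFALSE only occurs in the second input row, so its value stays in row 1 and row 0 gets NaN. With the frame from the question, the call would presumably be `rows_to_frame(zip(df['TYPE'], df['TIME']))`.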
Answer 1 (score: 0)
Let's create a new DataFrame by mapping the key/value pairs inside a list comprehension and using np.squeeze
to remove the singleton dimensions:
import numpy as np
import pandas as pd
df1 = pd.DataFrame([dict(zip(*map(np.squeeze, v))) for v in df.to_numpy()])
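The one-liner packs several steps together. A minimal sketch that spells them out on a hypothetical two-row frame, assuming the key cells are (n, 1, 1) object arrays of plain strings and the value cells are (n, 1) integer arrays; `object_column` is a hypothetical helper used only to build cells that keep their nested shape.

```python
import numpy as np
import pandas as pd


def object_column(cells):
    """Build a 1-D object array so each cell keeps its own nested array."""
    col = np.empty(len(cells), dtype=object)
    for i, cell in enumerate(cells):
        col[i] = cell
    return col


# Hypothetical two-row frame with the keys in a different order in each row.
df = pd.DataFrame({
    "KEY_COLUMN": object_column([
        np.array([[["KEY1"]], [["KEY2"]]], dtype=object),
        np.array([[["KEY2"]], [["KEY1"]]], dtype=object),
    ]),
    "VALUE_COLUMN": object_column([
        np.array([[10], [20]]),
        np.array([[21], [11]]),
    ]),
})

records = []
for v in df.to_numpy():                       # v = [keys_cell, values_cell] for one row
    keys, values = map(np.squeeze, v)         # drop length-1 axes: (2, 1, 1) -> (2,), (2, 1) -> (2,)
    records.append(dict(zip(keys, values)))   # pair each key with its value

df1 = pd.DataFrame(records)                   # columns are aligned by key name
print(df1)
```

One caveat worth checking on real data: np.squeeze only reshapes the outer array and does not unwrap array elements, so if the keys are still one-element arrays (as loadmat typically returns them) they will be unhashable as dict keys. Also, a row with a single key squeezes to a 0-d array, which zip cannot iterate; ravel() avoids that case.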
Result:
# for sample data
KEY1 KEY2 KEY3 KEY4
0 VALUE VALUE VALUE VALUE
1 VALUE VALUE VALUE VALUE
2 VALUE VALUE VALUE VALUE
# for actual data
START DIST DISTFAL DISTTRU ENCFALSE ENCTRUE DISTF DISTTRUE
0 24413 27481 29382.0 31923.0 31249 34690 NaN NaN
1 31234 43123 NaN NaN 12355 55551 364582.0 24444.0
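In the results above, columns with a missing key are shown as floats (e.g. 29382.0): pandas stores the missing entries as NaN, which forces those columns to float64. If integer display matters, one option is pandas' nullable Int64 dtype, sketched here on a hypothetical two-row frame rather than the real data.

```python
import pandas as pd

# Hypothetical frame shaped like the "actual data" result: a key missing
# from a row becomes NaN, which makes that column float64.
df1 = pd.DataFrame([
    {"START": 24413, "DISTFAL": 29382},
    {"START": 31234, "DISTF": 364582},
])
print(df1.dtypes)  # START is int64; DISTFAL and DISTF are float64

# The nullable integer dtype keeps whole numbers and shows missing entries as <NA>.
df1 = df1.astype("Int64")
print(df1)
```

This conversion only succeeds when every non-missing value is a whole number.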