I exported an Excel sheet from a software tool. The sheet is laid out in stacked blocks, each with its own multi-row header. Excel structure:
librdf_uri
CSV structure:
A
I am looking for a quick way to convert this messy data into a tidy pandas DataFrame.
The final result should look like this.
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | not relevant |
+---+-------+--------------+--------------+
| | | X1 | Y1 |
+---+-------+--------------+--------------+
|fr | Time | not relevant | not relevant |
+---+-------+--------------+--------------+
| 1 | 0.000 | 12 | 32 |
+---+-------+--------------+--------------+
| 2 | 0.010 | 23 | 3 |
+---+-------+--------------+--------------+
| 3 | 0.020 | 45 | 4 |
+---+-------+--------------+--------------+
| 4 | 0.030 | 4 | 1 |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | Y2 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 5 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 89 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 5 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 3 | |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | X3 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 17 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 2 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 4 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 23 | |
+---+-------+--------------+--------------+
Answer 0 (score: 0)
I did the following. I am not very happy with it, but it works.
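import numpy as np
import pandas as pd

filename = 'test_data'
df = pd.read_excel(filename + '.xlsx', header=None)

# Split the sheet at every completely empty row; each piece is one block.
df_list = np.split(df, df[df.isnull().all(axis=1)].index)
del df_list[0]  # the piece before the first empty row holds no data

for i, block in enumerate(df_list):
    block = block.copy()  # work on a copy, not on a view of the full sheet
    # Row 2 of each block holds the signal names (X1, Y1, ...); copy them
    # over the "not relevant" labels in the header row (row 3).
    block.iloc[3, 2:] = block.iloc[2, 2:]
    block.columns = block.iloc[3]
    block = block.iloc[4:]  # keep only the data rows
    # 'Frame' is the label of the first column in my file; change it if
    # your header row uses a different label (e.g. 'fr').
    block = block.drop(columns=['Frame']).set_index('Time')
    block = block.dropna(axis=1, how='all')  # drop the unused, empty columns
    block.columns.name = None
    df_list[i] = block

# Put the blocks side by side (aligned on Time) and sort the columns by name.
df = pd.concat(df_list, axis=1)
df = df.reindex(sorted(df.columns), axis=1)
df.to_csv(filename + '.csv')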
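
The same idea can also be sketched without np.split (which, as far as I can tell, triggers a deprecation warning on a DataFrame in recent pandas releases) by numbering the blocks with a cumulative sum over the blank rows and grouping on that number. This is only a sketch under assumptions: the helper names make_sample and tidy are made up for illustration, the sample frame is typed in by hand from the table in the question rather than read from the real file, and every block is assumed to follow exactly that layout (blank row, a "not relevant" row, a row with the signal names, the fr/Time header row, then data).

```python
import numpy as np
import pandas as pd


def make_sample() -> pd.DataFrame:
    """Hypothetical helper: the sheet from the question as read with header=None."""
    nan = np.nan
    times = [0.000, 0.010, 0.020, 0.030]

    def block(names, columns):
        # names: signal labels of this block; columns: one list of values per label
        pad = [nan] * (2 - len(names))  # unused cells on the right
        rows = [
            [nan, nan, nan, nan],  # blank separator row
            [nan, nan] + ["not relevant"] * len(names) + pad,
            [nan, nan] + names + pad,
            ["fr", "Time"] + ["not relevant"] * len(names) + pad,
        ]
        for i, t in enumerate(times):
            rows.append([i + 1, t] + [col[i] for col in columns] + pad)
        return rows

    rows = (
        block(["X1", "Y1"], [[12, 23, 45, 4], [32, 3, 4, 1]])
        + block(["Y2"], [[5, 89, 5, 3]])
        + block(["X3"], [[17, 2, 4, 23]])
    )
    return pd.DataFrame(rows)


def tidy(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: merge the stacked blocks into one frame indexed by Time."""
    blank = raw.isna().all(axis=1)
    # Every blank row starts a new block; number the remaining rows by block.
    block_id = blank.cumsum()[~blank]
    pieces = []
    for _, blk in raw[~blank].groupby(block_id):
        names = blk.iloc[1, 2:].dropna()  # second non-blank row: signal names
        data = blk.iloc[3:]               # everything below the fr/Time header row
        piece = data[names.index].set_axis(list(names), axis=1)
        piece.index = pd.Index(data.iloc[:, 1].astype(float), name="Time")
        pieces.append(piece)
    # Align the blocks on Time, sort the columns by name, restore numeric dtypes.
    out = pd.concat(pieces, axis=1).sort_index(axis=1)
    return out.apply(pd.to_numeric)


result = tidy(make_sample())
print(result)
```

With a real file, the frame returned by pd.read_excel(..., header=None) would take the place of make_sample(). Selecting columns by position rather than by the 'Frame' label means the sketch does not depend on how the first header cell is spelled.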