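I am currently working on a project that reads several Excel sheets, parses the data, and saves the combined DataFrame back to an Excel file.
I read each Excel sheet into a DataFrame, convert each one to a nested dict, combine them all into one big nested dict (three-dimensional data), and finally convert that back to a DataFrame.
I have managed to combine the data into one big DataFrame. However, every value in the DataFrame is a dict, so the sheet is badly formatted after saving it with to_excel. What I want is to reorganize the data so that each Excel cell contains only a single key, a single index entry, or a single value.
Here is the code I use to save the data to Excel: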
from pandas import DataFrame, ExcelWriter
with ExcelWriter(dump_excel) as writer:
    DataFrame(new_dict).to_excel(writer, sheet_name='yield_all')
Answer 0 (score: 3)
IIUC, you can use the DataFrame constructor with a list comprehension to expand the dicts into columns:
print(df)
                                             BIN2_R1
0  {0.0: 23, 1.0: 31, 'yield': '13.01%', 'total':...
1  {0.0: 81, 1.0: 70, 'yield': '36.01%', 'total':...
print(pd.DataFrame([x for x in df.BIN2_R1]))
   0.0  1.0   yield  total
0   23   31  13.01%     54
1   81   70  36.01%    151
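As a self-contained sketch of the step above: the sample frame is rebuilt by hand from the printed output (the `total` values 54 and 151 are taken from the expanded table), and `Series.tolist()` is used in place of the list comprehension, which should be equivalent here. Passing `index=df.index` is an extra precaution so that the row labels survive when the original index is not 0..n-1.

```python
import pandas as pd

# One column whose cells are dicts, mirroring the frame printed above.
df = pd.DataFrame({
    'BIN2_R1': [
        {0.0: 23, 1.0: 31, 'yield': '13.01%', 'total': 54},
        {0.0: 81, 1.0: 70, 'yield': '36.01%', 'total': 151},
    ]
})

# A list of dicts handed to the DataFrame constructor becomes one row per
# dict and one column per key.
expanded = pd.DataFrame(df['BIN2_R1'].tolist(), index=df.index)
print(expanded)
```

Each cell of `expanded` now holds a single scalar, which is the shape `to_excel` needs.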
EDIT:
You can use concat to combine several expanded columns:
df1 = pd.concat([pd.DataFrame([x for x in df.BIN2_R1]),
                 pd.DataFrame([x for x in df.FT])], axis=1, keys=['BIN2_R1', 'FT'])
print(df1)
   BIN2_R1                        FT
       0.0  1.0   yield  total   0.0   1.0   yield  total
0       23   31  13.01%     54    82  6517  92.70%   6599
1       81   70  36.01%    151    51   173   0.53%     13
df1.to_excel('test.xlsx')
A more general solution, if all columns contain dictionaries:
dfs = [pd.DataFrame([x for x in df[col]]) for col in df.columns]
df1 = pd.concat(dfs, axis=1, keys=df.columns)
print(df1)
   BIN2_R1                        FT
       0.0  1.0   yield  total   0.0   1.0   yield  total
0       23   31  13.01%     54    82  6517  92.70%   6599
1       81   70  36.01%    151    51   173   0.53%     13
df1.to_excel('test.xlsx')
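The general solution can be wrapped in a small helper, and the resulting two-level columns can then be addressed by `(column, key)` pairs. This is only a sketch: the function name `expand_dict_columns` is made up for illustration, and the sample values are copied from the printed output above.

```python
import pandas as pd

def expand_dict_columns(df):
    """Expand every dict-valued column into sub-columns under a MultiIndex."""
    parts = [pd.DataFrame(df[col].tolist(), index=df.index) for col in df.columns]
    # keys= puts the original column name on the top level of the header.
    return pd.concat(parts, axis=1, keys=df.columns)

df = pd.DataFrame({
    'BIN2_R1': [{0.0: 23, 1.0: 31, 'yield': '13.01%', 'total': 54},
                {0.0: 81, 1.0: 70, 'yield': '36.01%', 'total': 151}],
    'FT': [{0.0: 82, 1.0: 6517, 'yield': '92.70%', 'total': 6599},
           {0.0: 51, 1.0: 173, 'yield': '0.53%', 'total': 13}],
})
df1 = expand_dict_columns(df)

print(df1['FT'])                          # all sub-columns of one test
print(df1.loc[0, ('BIN2_R1', 'yield')])   # a single cell
```

When such a frame is written with `to_excel`, the two header levels end up in two header rows, so every cell holds one key, one index entry, or one value.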
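EDIT:
The main problem is that the inner dicts are not dicts but strings, so they have to be converted first. The conversion fails on NaN, so I replace NaN with the string '{}' using fillna: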
import pandas as pd
import ast
d = {
    1: {u'FT1': u"{0.0: 19732, 1.0: 20495, 'total': 40227, 'yield': '93.34%'}", u'FT3': u"{0.0: 9285, 1.0: 9629, 'total': 18914, 'yield': '92.93%'}", u'FT2': u"{0.0: 1412, 1.0: 1480, 'total': 2892, 'yield': '93.87%'}", u'FT': u"{0.0: 82, 1.0: 6517, 'total': 6599, 'yield': '92.70%'}", u'FT_R1': u"{0.0: 1262, 1.0: 1418, 'total': 2680, 'yield': '53.73%'}", u'QA_R2': u"{0.0: 2, 'total': 2, 'yield': '100.00%'}", u'QA_R1': u"{0.0: 6, 'total': 6, 'yield': '75.00%'}", u'QA': u"{1.0: 750, 'total': 750, 'yield': '98.94%'}", u'BIN2_R1': u"{0.0: 23, 1.0: 31, 'total': 54, 'yield': '13.01%'}"},
    2: {u'FT1': u"{0.0: 246, 1.0: 110, 'total': 356, 'yield': '0.83%'}", u'FT3': u"{0.0: 81, 1.0: 54, 'total': 135, 'yield': '0.66%'}", u'FT2': u"{0.0: 9, 1.0: 3, 'total': 12, 'yield': '0.39%'}", u'FT': u"{0.0: 51, 1.0: 173, 'total': 224, 'yield': '3.15%'}", u'FT_R1': u"{0.0: 138, 1.0: 86, 'total': 224, 'yield': '4.49%'}", u'QA_R1': u"{0.0: 1, 'total': 1, 'yield': '12.50%'}", u'QA': u"{1.0: 5, 'total': 5, 'yield': '0.66%'}", u'BIN2_R1': u"{0.0: 81, 1.0: 70, 'total': 151, 'yield': '36.39%'}"},
    3: {u'FT1': u"{0.0: 72, 1.0: 47, 'total': 119, 'yield': '0.28%'}", u'FT3': u"{0.0: 35, 1.0: 25, 'total': 60, 'yield': '0.29%'}", u'FT2': u"{0.0: 1, 1.0: 1, 'total': 2, 'yield': '0.06%'}", u'FT': u"{0.0: 0, 1.0: 13, 'total': 13, 'yield': '0.18%'}", u'FT_R1': u"{0.0: 93, 1.0: 98, 'total': 191, 'yield': '3.83%'}", u'BIN2_R1': u"{0.0: 92, 1.0: 97, 'total': 189, 'yield': '45.54%'}"},
    4: {u'FT1': u"{0.0: 132, 1.0: 174, 'total': 306, 'yield': '0.71%'}", u'FT3': u"{0.0: 35, 1.0: 36, 'total': 71, 'yield': '0.35%'}", u'FT2': u"{0.0: 8, 1.0: 11, 'total': 19, 'yield': '0.62%'}", u'FT': u"{0.0: 1, 1.0: 37, 'total': 38, 'yield': '0.53%'}", u'FT_R1': u"{0.0: 179, 1.0: 167, 'total': 346, 'yield': '6.94%'}", u'BIN2_R1': u"{0.0: 1, 1.0: 1, 'total': 2, 'yield': '0.48%'}"},
    5: {u'FT1': u"{0.0: 27, 1.0: 28, 'total': 55, 'yield': '0.13%'}", u'FT_R1': u"{0.0: 41, 1.0: 34, 'total': 75, 'yield': '1.50%'}", u'FT3': u"{0.0: 11, 1.0: 9, 'total': 20, 'yield': '0.10%'}", u'FT2': u"{0.0: 2, 1.0: 0, 'total': 2, 'yield': '0.06%'}", u'FT': u"{0.0: 0, 1.0: 4, 'total': 4, 'yield': '0.06%'}"},
    8: {u'FT1': u"{0.0: 76, 1.0: 77, 'total': 153, 'yield': '0.35%'}", u'FT3': u"{0.0: 40, 1.0: 42, 'total': 82, 'yield': '0.40%'}", u'FT2': u"{0.0: 5, 1.0: 8, 'total': 13, 'yield': '0.42%'}", u'FT': u"{0.0: 0, 1.0: 20, 'total': 20, 'yield': '0.28%'}", u'FT_R1': u"{0.0: 131, 1.0: 133, 'total': 264, 'yield': '5.29%'}", u'BIN2_R1': u"{0.0: 1, 1.0: 2, 'total': 3, 'yield': '0.72%'}"},
    9: {u'FT1': u"{0.0: 199, 1.0: 158, 'total': 357, 'yield': '0.83%'}", u'FT3': u"{0.0: 90, 1.0: 62, 'total': 152, 'yield': '0.75%'}", u'FT2': u"{0.0: 8, 1.0: 8, 'total': 16, 'yield': '0.52%'}", u'FT': u"{0.0: 0, 1.0: 36, 'total': 36, 'yield': '0.51%'}", u'FT_R1': u"{0.0: 238, 1.0: 238, 'total': 476, 'yield': '9.54%'}", u'BIN2_R1': u"{0.0: 2, 1.0: 2, 'total': 4, 'yield': '0.96%'}"},
    10: {u'FT1': u"{0.0: 56, 1.0: 38, 'total': 94, 'yield': '0.22%'}", u'FT3': u"{0.0: 25, 1.0: 33, 'total': 58, 'yield': '0.28%'}", u'FT2': u"{0.0: 5, 1.0: 1, 'total': 6, 'yield': '0.19%'}", u'FT': u"{0.0: 0, 1.0: 11, 'total': 11, 'yield': '0.15%'}", u'FT_R1': u"{0.0: 77, 1.0: 66, 'total': 143, 'yield': '2.87%'}", u'BIN2_R1': u"{0.0: 2, 1.0: 0, 'total': 2, 'yield': '0.48%'}"},
    11: {u'FT1': u"{0.0: 2, 1.0: 0, 'total': 2, 'yield': '0.00%'}", u'FT3': u"{0.0: 1, 1.0: 0, 'total': 1, 'yield': '0.00%'}", u'BIN2_R1': u"{0.0: 1, 1.0: 0, 'total': 1, 'yield': '0.24%'}"},
    12: {u'FT1': u"{0.0: 6, 1.0: 0, 'total': 6, 'yield': '0.01%'}", u'FT3': u"{0.0: 2, 1.0: 0, 'total': 2, 'yield': '0.01%'}", u'FT_R1': u"{0.0: 1, 1.0: 0, 'total': 1, 'yield': '0.02%'}"},
    13: {u'FT1': u"{0.0: 953, 1.0: 422, 'total': 1375, 'yield': '3.19%'}", u'FT3': u"{0.0: 544, 1.0: 292, 'total': 836, 'yield': '4.11%'}", u'FT2': u"{0.0: 88, 1.0: 28, 'total': 116, 'yield': '3.77%'}", u'FT': u"{0.0: 21, 1.0: 147, 'total': 168, 'yield': '2.36%'}", u'FT_R1': u"{0.0: 289, 1.0: 225, 'total': 514, 'yield': '10.30%'}", u'QA_R1': u"{0.0: 1, 'total': 1, 'yield': '12.50%'}", u'QA': u"{1.0: 3, 'total': 3, 'yield': '0.40%'}", u'BIN2_R1': u"{0.0: 4, 1.0: 5, 'total': 9, 'yield': '2.17%'}"},
    14: {u'FT1': u"{0.0: 31, 1.0: 18, 'total': 49, 'yield': '0.11%'}", u'FT_R1': u"{0.0: 35, 1.0: 39, 'total': 74, 'yield': '1.48%'}", u'FT3': u"{0.0: 16, 1.0: 7, 'total': 23, 'yield': '0.11%'}", u'FT2': u"{0.0: 2, 1.0: 1, 'total': 3, 'yield': '0.10%'}", u'FT': u"{0.0: 0, 1.0: 6, 'total': 6, 'yield': '0.08%'}"},
}
df = pd.DataFrame.from_dict(d, orient='index')
#print (df)
# NaN cannot be parsed, so replace it with the string form of an empty dict
df = df.fillna('{}')
# parse every string cell into a real dict
for col in df.columns:
    df[col] = df[col].map(ast.literal_eval)
#print (df)
dfs = [pd.DataFrame([x for x in df[col]], index=df.index) for col in df.columns]
df1 = pd.concat(dfs, axis=1, keys=df.columns)
print(df1)
      FT                           FT_R1                             QA \
     0.0     1.0   total   yield     0.0     1.0   total   yield    1.0
1   82.0  6517.0  6599.0  92.70%  1262.0  1418.0  2680.0  53.73%  750.0
2   51.0   173.0   224.0   3.15%   138.0    86.0   224.0   4.49%    5.0
3    0.0    13.0    13.0   0.18%    93.0    98.0   191.0   3.83%    NaN
4    1.0    37.0    38.0   0.53%   179.0   167.0   346.0   6.94%    NaN
5    0.0     4.0     4.0   0.06%    41.0    34.0    75.0   1.50%    NaN
8    0.0    20.0    20.0   0.28%   131.0   133.0   264.0   5.29%    NaN
9    0.0    36.0    36.0   0.51%   238.0   238.0   476.0   9.54%    NaN
10   0.0    11.0    11.0   0.15%    77.0    66.0   143.0   2.87%    NaN
11   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN    NaN
12   NaN     NaN     NaN     NaN     1.0     0.0     1.0   0.02%    NaN
13  21.0   147.0   168.0   2.36%   289.0   225.0   514.0  10.30%    3.0
14   0.0     6.0     6.0   0.08%    35.0    39.0    74.0   1.48%    NaN

           ...  BIN2_R1           FT3                         FT1        \
    total  ...    total   yield   0.0   1.0  total   yield    0.0    1.0
1   750.0  ...     54.0  13.01%  9285  9629  18914  92.93%  19732  20495
2     5.0  ...    151.0  36.39%    81    54    135   0.66%    246    110
3     NaN  ...    189.0  45.54%    35    25     60   0.29%     72     47
4     NaN  ...      2.0   0.48%    35    36     71   0.35%    132    174
5     NaN  ...      NaN     NaN    11     9     20   0.10%     27     28
8     NaN  ...      3.0   0.72%    40    42     82   0.40%     76     77
9     NaN  ...      4.0   0.96%    90    62    152   0.75%    199    158
10    NaN  ...      2.0   0.48%    25    33     58   0.28%     56     38
11    NaN  ...      1.0   0.24%     1     0      1   0.00%      2      0
12    NaN  ...      NaN     NaN     2     0      2   0.01%      6      0
13    3.0  ...      9.0   2.17%   544   292    836   4.11%    953    422
14    NaN  ...      NaN     NaN    16     7     23   0.11%     31     18

    total   yield
1   40227  93.34%
2     356   0.83%
3     119   0.28%
4     306   0.71%
5      55   0.13%
8     153   0.35%
9     357   0.83%
10     94   0.22%
11      2   0.00%
12      6   0.01%
13   1375   3.19%
14     49   0.11%
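A possible variation on the conversion step, sketched on a three-cell excerpt of the data above: instead of filling NaN with the string '{}' before parsing, a small helper (the name `parse_cell` is hypothetical) can return an empty dict for anything that is not a string. `ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for cells read from a file.

```python
import ast
import pandas as pd

def parse_cell(cell):
    """Return the dict encoded in a string cell, or {} for a missing value."""
    if isinstance(cell, str):
        return ast.literal_eval(cell)
    return {}

# Excerpt of the data above; row 3 has no 'QA' entry, so that cell is NaN.
d = {
    1: {'FT': "{0.0: 82, 1.0: 6517, 'total': 6599, 'yield': '92.70%'}",
        'QA': "{1.0: 750, 'total': 750, 'yield': '98.94%'}"},
    3: {'FT': "{0.0: 0, 1.0: 13, 'total': 13, 'yield': '0.18%'}"},
}
df = pd.DataFrame.from_dict(d, orient='index')

parts = [pd.DataFrame([parse_cell(c) for c in df[col]], index=df.index)
         for col in df.columns]
df1 = pd.concat(parts, axis=1, keys=df.columns)
print(df1)
```

Rows that had no entry for a test simply come out as NaN in that test's sub-columns, as in the full output above.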
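Answer 1 (score: 0)
I found a solution that reads the Excel files directly into DataFrames and concatenates them into a single DataFrame, instead of converting the DataFrames to nested dicts as an intermediate step.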
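import pandas as pd
# std_excel_path (the Excel file paths) and get_name (builds a column key
# from a path) are defined elsewhere in my project and are not shown here
list_df = [pd.read_excel(path, sheet_name='yield', index_col=[0]) for path in std_excel_path]
keys_list = [get_name(path) for path in std_excel_path]
combined = pd.concat(list_df, axis=1, keys=keys_list)
combined.fillna(0, inplace=True)
combined.columns.names = ['test', 'info']
combined.index.names = ['soft_bin']
print(combined)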
The result is a combined DataFrame with MultiIndex columns:
test               FT-20160702124027                      FT1-20160702134747 \
info                               0    1  total   yield                   0
soft_bin
01:pass                          957  954   1911  97.01%                4334
02:os_open_fail                    5    5     10   0.51%                   8
03:os_short_fail                   1    0      1   0.05%                   2
04:io_fail                         1    2      3   0.15%                   8
05:clk_fail                        0    0      0       0                   3
06:reset_fir_fail                  0    0      0       0                   0
08:mbist_fail                      1    0      1   0.05%                  10
09:dc_scan_fail                   11   14     25   1.27%                  67
10:ac_scan_fail                    3    2      5   0.25%                  21
11:func_dig_fail                   0    0      0       0                   0
12:efuse_fail                      0    0      0       0                   0
13:func_ana_fail                   6    6     12   0.61%                  32
14:func_idd_fail                   0    2      2   0.10%                   3

test                                    FT2-20160702183026        ... \
info                  1  total   yield                   0     1  ...
soft_bin                                                          ...
01:pass            4345   8679  96.68%                1671  1688  ...
02:os_open_fail      10     18   0.20%                   3     3  ...
03:os_short_fail      1      3   0.03%                   1     0  ...
04:io_fail           10     18   0.20%                   2     2  ...
05:clk_fail           2      5   0.06%                   2     1  ...
06:reset_fir_fail     0      0       0                   0     0  ...
08:mbist_fail         6     16   0.18%                   3     1  ...
09:dc_scan_fail      58    125   1.39%                  18    11  ...
10:ac_scan_fail      20     41   0.46%                   9     4  ...
11:func_dig_fail      0      0       0                   0     0  ...
12:efuse_fail         0      0       0                   0     0  ...
13:func_ana_fail     33     65   0.72%                  16    14  ...
14:func_idd_fail      4      7   0.08%                   0     1  ...
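To try the same concat pattern without any Excel files, the sketch below uses two small in-memory frames as stand-ins for the sheets returned by `pd.read_excel`. The numbers are copied from the first rows of the output above; the shortened keys 'FT' and 'FT1' and the missing second row in `ft1` are assumptions made only to show what `fillna(0)` does.

```python
import pandas as pd

# Stand-ins for two per-file 'yield' sheets, indexed by soft bin.
ft = pd.DataFrame(
    {0: [957, 5], 1: [954, 5], 'total': [1911, 10], 'yield': ['97.01%', '0.51%']},
    index=['01:pass', '02:os_open_fail'],
)
ft1 = pd.DataFrame(
    {0: [4334], 1: [4345], 'total': [8679], 'yield': ['96.68%']},
    index=['01:pass'],
)

# keys= labels each frame on the top level of the column MultiIndex.
combined = pd.concat([ft, ft1], axis=1, keys=['FT', 'FT1'])
combined = combined.fillna(0)            # bins missing from a file become 0
combined.columns.names = ['test', 'info']
combined.index.names = ['soft_bin']

# Cross-section: the 'yield' column of every test, side by side.
yields = combined.xs('yield', axis=1, level='info')
print(yields)
```

Naming the column levels is what makes selections such as `xs(..., level='info')` readable, and the names also show up as header labels in the printed frame.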