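I am new to the pandas module and am using it at work for data analysis. I have an Excel sheet that imports data daily from an Access database; a new record is inserted every time a machine goes down. The table basically shows the uptime percentage of each machine: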
ID | Area | Machine | Week | UTPercent
--------------------------------------
1 | A1 | M1 | 1 | 80
2 | A1 | M1 | 4 | 90
3 | A2 | M2 | 4 | 70
4 | A2 | M2 | 8 | 82
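As you can see above, if the current week is 8, the table is missing weeks 2, 3, 5, 6, 7 and 8 for Machine1 and weeks 1, 2, 3, 5, 6 and 7 for Machine2. How can I add the rows in between and set UTPercent to 100% for all of them? In other words, this is what I need: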
ID | Area | Machine | Week | UTPercent
--------------------------------------
1 | A1 | M1 | 1 | 80
2 | A1 | M1 | 2 | 100
3 | A1 | M1 | 3 | 100
4 | A1 | M1 | 4 | 90
5 | A1 | M1 | 5 | 100
6 | A1 | M1 | 6 | 100
7 | A1 | M1 | 7 | 100
8 | A1 | M1 | 8 | 100
9 | A2 | M2 | 1 | 100
10 | A2 | M2 | 2 | 100
11 | A2 | M2 | 3 | 100
12 | A2 | M2 | 4 | 70
13 | A2 | M2 | 5 | 100
14 | A2 | M2 | 6 | 100
15 | A2 | M2 | 7 | 100
16 | A2 | M2 | 8 | 82
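Also, if I plot a bar chart for Machine1 in Area1 only, how do I add data labels? I have made a bar chart of week (x-axis) versus uptime percentage (y-axis), and I need the weeks as my data labels.
This is what I have done so far: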
import matplotlib.pyplot as plt
import pandas as pd
# read_excel takes sheet_name (not sheetname) and has no sep argument
df = pd.read_excel("targetFolder.xlsx", sheet_name=0)
area1 = df.loc[df['Area'] == 'A1']
# the data
data = list(area1['UTPercent'])
weekNum = list(area1['Week'])  # use the filtered frame so weeks and data have the same length
## the bars
fig = plt.figure()
ax1 = fig.add_subplot(111)
plotData = ax1.bar(weekNum, data, width = 0.45,
color='#556B2F')
# adding labels and title
ax1.set_xlabel("Weeks")
ax1.set_ylabel("Uptime Percentage")
ax1.set_title("Metrology Area", weight='bold')
fig.tight_layout()
plt.show()
Answer 0 (score: 0)
For the first question, I would do something like this (assuming your table is named uptimes):
INSERT INTO uptimes (Week, Machine, Area, UTPercent)
(SELECT SeqValue AS Week,
machines.Machine,
machines.Area,
100 AS UTPercent
FROM
(SELECT (TWO_1.SeqValue + TWO_2.SeqValue + TWO_4.SeqValue + TWO_8.SeqValue + TWO_16.SeqValue + TWO_32.SeqValue) SeqValue
FROM
(SELECT 0 SeqValue
UNION ALL SELECT 1 SeqValue) TWO_1
CROSS JOIN
(SELECT 0 SeqValue
UNION ALL SELECT 2 SeqValue) TWO_2
CROSS JOIN
(SELECT 0 SeqValue
UNION ALL SELECT 4 SeqValue) TWO_4
CROSS JOIN
(SELECT 0 SeqValue
UNION ALL SELECT 8 SeqValue) TWO_8
CROSS JOIN
(SELECT 0 SeqValue
UNION ALL SELECT 16 SeqValue) TWO_16
CROSS JOIN
(SELECT 0 SeqValue
UNION ALL SELECT 32 SeqValue) TWO_32
HAVING SeqValue <=
(SELECT max(week)
FROM uptimes)
AND SeqValue > 0) AS integers
LEFT JOIN
(SELECT Machine,
Area
FROM uptimes
GROUP BY 1,
2) AS machines ON 1=1
LEFT JOIN uptimes ON uptimes.week = integers.SeqValue
AND machines.Machine = uptimes.Machine
WHERE uptimes.week IS NULL);
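How it works: the innermost subquery builds the integers 0 through 63 by cross-joining six two-row tables (one per binary digit) and keeps only the values between 1 and the largest week in uptimes. Each of those week numbers is paired with every distinct (Machine, Area) combination, and the final LEFT JOIN back to uptimes keeps only the machine/week pairs that have no existing record. Those pairs are inserted with UTPercent set to 100.
For the other question, try using the pandas plot function.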
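Since the question is about pandas, the same gap-filling idea can also be sketched directly on the DataFrame. This is only a minimal sketch, not a drop-in solution: the sample frame is copied from the question's first table, the helper name fill_missing_weeks is made up for illustration, and it assumes the current week equals the largest week in the data and that each machine has at most one row per week.

```python
import pandas as pd

# Sample data copied from the first table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Area": ["A1", "A1", "A2", "A2"],
    "Machine": ["M1", "M1", "M2", "M2"],
    "Week": [1, 4, 4, 8],
    "UTPercent": [80, 90, 70, 82],
})


def fill_missing_weeks(df, current_week, default=100):
    """Return one row per machine per week, using `default` where no record exists.

    Hypothetical helper: assumes at most one row per (Area, Machine, Week).
    """
    # Distinct (Area, Machine) pairs, in order of first appearance.
    pairs = df[["Area", "Machine"]].drop_duplicates().itertuples(index=False, name=None)
    # Every (Area, Machine, Week) combination that should exist.
    full_index = pd.MultiIndex.from_tuples(
        [(area, machine, week) for area, machine in pairs
         for week in range(1, current_week + 1)],
        names=["Area", "Machine", "Week"],
    )
    # Reindex the recorded values onto the full grid; missing weeks get `default`.
    out = (
        df.set_index(["Area", "Machine", "Week"])["UTPercent"]
        .reindex(full_index, fill_value=default)
        .reset_index()
    )
    # Renumber the rows, as in the desired output table.
    out.insert(0, "ID", list(range(1, len(out) + 1)))
    return out


# Assumption: the latest recorded week is the current week.
filled = fill_missing_weeks(df, current_week=int(df["Week"].max()))
print(filled.to_string(index=False))
```

Reindexing against an explicit grid keeps the recorded percentages untouched and only fills the weeks that have no record, which mirrors what the LEFT JOIN ... IS NULL filter does in the SQL version.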
import pandas as pd
# read_excel takes sheet_name (not sheetname) and has no sep argument
df = pd.read_excel("targetFolder.xlsx", sheet_name=0)
area1 = df[df.Area == 'A1']
area1.set_index('Week')['UTPercent'].plot(kind='bar')
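The data-label part of the question is not covered by the plot call alone. One possible way to put the week on each bar is sketched below with plain matplotlib text annotations. The function name plot_uptime, the "Week N" label format, and the small sample frame (taken from the question's first table) are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the first table in the question.
sample = pd.DataFrame({
    "Area": ["A1", "A1", "A2", "A2"],
    "Machine": ["M1", "M1", "M2", "M2"],
    "Week": [1, 4, 4, 8],
    "UTPercent": [80, 90, 70, 82],
})


def plot_uptime(df, area):
    """Bar chart of uptime per week for one area, each bar labelled with its week.

    Hypothetical helper written for this example.
    """
    subset = df[df["Area"] == area].sort_values("Week")
    fig, ax = plt.subplots()
    bars = ax.bar(subset["Week"], subset["UTPercent"], width=0.45, color="#556B2F")
    # Write the week just above each bar, centred horizontally on the bar.
    for bar, week in zip(bars, subset["Week"]):
        ax.text(
            bar.get_x() + bar.get_width() / 2,
            bar.get_height(),
            f"Week {week}",
            ha="center",
            va="bottom",
        )
    ax.set_xlabel("Weeks")
    ax.set_ylabel("Uptime Percentage")
    ax.set_title("Metrology Area", weight="bold")
    fig.tight_layout()
    return ax


ax = plot_uptime(sample, "A1")
```

Call plt.show() afterwards to display the figure when running as a script. Newer matplotlib releases also provide Axes.bar_label, which can replace the explicit loop.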