I'm new to the pandas module and have a simple question about data manipulation.
Suppose I have a table like this:
Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
M1 | 1 | good | 85
M1 | 4 | bad | 75
M1 | 7 | good | 90
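To try the answers below without `test.xlsx`, here is a minimal sketch that builds the same three-row table directly in code. It assumes the column names shown above and integer week numbers.

```python
import pandas as pd

# Same data as the table above, built in code instead of pd.read_excel("test.xlsx")
df = pd.DataFrame({
    'Tool': ['M1', 'M1', 'M1'],
    'WeekNumber': [1, 4, 7],
    'Status': ['good', 'bad', 'good'],
    'Percentage': [85, 75, 90],
})
print(df)
```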
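Based on the condition in `Status`, I want to add rows with the corresponding percentages.
For example:
If the status is "good", the rows for the following week numbers should all be 100, i.e. the next rows should be weeks 2 and 3 at 100%.
If the status is "bad", the percentage for the following week numbers should be 0, i.e. weeks 5 and 6 are 0.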
I know roughly how to handle the conditions, but not how to add the rows:
```python
import pandas as pd

df = pd.read_excel("test.xlsx")
add_rows = []
# one value per existing row, chosen by its status
for elem in df.Status:
    if elem == "good":
        add_rows.append(100)
    elif elem == "bad":
        add_rows.append(0)
df["Percentage"] = pd.Series(add_rows)
```
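However, this only gives me three values based on the condition, and it just changes the values for the existing week numbers. What I want is this: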
Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
M1 | 1 | good | 85
M1 | 2 | good | 100
M1 | 3 | good | 100
M1 | 4 | bad | 75
M1 | 5 | bad | 0
M1 | 6 | bad | 0
M1 | 7 | good | 90
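Why the loop above cannot produce this table: assigning a Series to a column aligns it on the DataFrame's existing index, so it can only overwrite the three rows that are already there, never add new ones. A small sketch of that behaviour, assuming the sample data matches the first table:

```python
import pandas as pd

df = pd.DataFrame({
    'Tool': ['M1', 'M1', 'M1'],
    'WeekNumber': [1, 4, 7],
    'Status': ['good', 'bad', 'good'],
    'Percentage': [85, 75, 90],
})

# One value per existing row, as the loop in the question builds it
add_rows = [100 if s == 'good' else 0 for s in df['Status']]
df['Percentage'] = pd.Series(add_rows)

# Still three rows: the recorded percentages were overwritten and no weeks were added
print(df)
```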
Answer 0 (score: 2)
Here's another way:
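```python
import numpy as np
import pandas as pd

# every week from the first to the last recorded week
val = pd.DataFrame({'WeekNumber': np.arange(df['WeekNumber'].min(), df['WeekNumber'].max() + 1, 1)})
# the outer merge adds the missing weeks as rows with NaN in the other columns
new_df = df.merge(val, on='WeekNumber', how='outer').sort_values(by='WeekNumber').reset_index(drop=True)
# carry Tool and Status forward into the new weeks
new_df[['Tool', 'Status']] = new_df[['Tool', 'Status']].ffill()
# missing weeks in a 'good' period get 100 ...
new_df['Percentage'] = np.where((new_df['Status'] == 'good') & new_df['Percentage'].isnull(),
                                100, new_df['Percentage'])
# ... and the remaining missing weeks ('bad' periods) get 0
new_df['Percentage'] = new_df['Percentage'].fillna(0)
```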
You get:
Tool WeekNumber Status Percentage
0 M1 1 good 85.0
1 M1 2 good 100.0
2 M1 3 good 100.0
3 M1 4 bad 75.0
4 M1 5 bad 0.0
5 M1 6 bad 0.0
6 M1 7 good 90.0
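The last two steps (the `np.where` call and `fillna(0)`) can also be written as a single `fillna` that takes its replacement values from a status-to-percentage mapping. A minimal sketch, assuming only the statuses `good` and `bad` occur; `fill_values` is a hypothetical name, and `new_df` is rebuilt here as it looks after the merge and `ffill` steps:

```python
import numpy as np
import pandas as pd

# Stand-in for new_df after the merge and ffill steps of this answer
new_df = pd.DataFrame({
    'Tool': ['M1'] * 7,
    'WeekNumber': [1, 2, 3, 4, 5, 6, 7],
    'Status': ['good', 'good', 'good', 'bad', 'bad', 'bad', 'good'],
    'Percentage': [85, np.nan, np.nan, 75, np.nan, np.nan, 90],
})

fill_values = {'good': 100, 'bad': 0}  # hypothetical mapping: status -> filler percentage
# Only the missing weeks are filled; recorded percentages are left untouched
new_df['Percentage'] = new_df['Percentage'].fillna(new_df['Status'].map(fill_values))
print(new_df)
```

A status that is missing from the mapping leaves its weeks as `NaN`, which makes an unexpected status easy to spot.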
Answer 1 (score: 0)
You can use `.iterrows()` to iterate over each row:
```python
for index, row in df.iterrows():
    print(row.Status)
# good
# bad
# good
```
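If you only need to read values while looping, `DataFrame.itertuples()` is an alternative to `iterrows()`: it yields namedtuples instead of building a Series for every row, so it also preserves each column's dtype. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Tool': ['M1', 'M1', 'M1'],
    'WeekNumber': [1, 4, 7],
    'Status': ['good', 'bad', 'good'],
    'Percentage': [85, 75, 90],
})

statuses = []
for row in df.itertuples(index=False):
    # Fields are accessed by column name, e.g. row.Status, row.WeekNumber
    statuses.append(row.Status)
    print(row.WeekNumber, row.Status)
```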
If I had to write some rough code, I would use something like this:
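```python
import pandas as pd

# assumes df has a default RangeIndex and is sorted by WeekNumber
next_week = df['WeekNumber'].shift(-1)  # NaN for the last row

new_index = 0  # number of filler rows inserted so far
new_dict = {}  # position in the new frame -> row data
for index, row in df.iterrows():
    use_index = index + new_index
    # copy the original row unchanged
    new_dict[use_index] = {
        'Tool': row.Tool,
        'WeekNumber': row.WeekNumber,
        'Status': row.Status,
        'Percentage': row.Percentage,
    }
    if pd.isna(next_week[index]):
        continue  # last recorded week: nothing to fill after it
    # one filler row per missing week: 100 after 'good', 0 after 'bad'
    fill_value = 100 if row.Status == 'good' else 0
    for week in range(int(row.WeekNumber) + 1, int(next_week[index])):
        new_index += 1
        new_dict[index + new_index] = {
            'Tool': row.Tool,
            'WeekNumber': week,
            'Status': row.Status,
            'Percentage': fill_value,
        }
df = pd.DataFrame.from_dict(new_dict, orient='index')
```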
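The bookkeeping with `new_index` can be avoided entirely. The number of missing weeks after each row is the gap to the next `WeekNumber` minus one; each row can then be repeated that many extra times with `Index.repeat`, and the copies given consecutive week numbers via `groupby(...).cumcount()`. This vectorized variant is not part of the answer above; it is a sketch assuming a single tool and data sorted by `WeekNumber`:

```python
import pandas as pd

df = pd.DataFrame({
    'Tool': ['M1', 'M1', 'M1'],
    'WeekNumber': [1, 4, 7],
    'Status': ['good', 'bad', 'good'],
    'Percentage': [85, 75, 90],
})

# Missing weeks after each row: gap to the next recorded week minus one (0 for the last row)
missing = (df['WeekNumber'].shift(-1) - df['WeekNumber'] - 1).fillna(0).astype(int)

# Repeat each row once for itself plus once per missing week
out = df.loc[df.index.repeat((missing + 1).to_numpy())]
# 0 for the original row, 1, 2, ... for its filler copies
offset = out.groupby(level=0).cumcount().to_numpy()

out = out.reset_index(drop=True)
out['WeekNumber'] = out['WeekNumber'] + offset
# Filler rows get 100 for 'good' and 0 for 'bad'; original rows keep their value
filler = out['Status'].map({'good': 100, 'bad': 0})
out['Percentage'] = out['Percentage'].where(offset == 0, filler)
print(out)
```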
Answer 2 (score: 0)
Your answer would look something like this:
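```python
import pandas as pd

# assuming the data is sorted by 'WeekNumber'
add_rows = []
for index, elem in enumerate(df.Status.iloc[:-1]):  # nothing to fill after the last row
    fill_value = 100 if elem == "good" else 0
    start = df.WeekNumber.iloc[index] + 1
    stop = df.WeekNumber.iloc[index + 1]
    for week in range(start, stop):
        add_rows.append({'Tool': df.Tool.iloc[index], 'WeekNumber': week,
                         'Status': elem, 'Percentage': fill_value})
more_data = pd.DataFrame(add_rows)
df = pd.concat([df, more_data]).sort_values(by='WeekNumber').reset_index(drop=True)
```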
Answer 3 (score: 0)
Try this:
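```python
import numpy as np
import pandas as pd

# one row per week, from the first to the last recorded week
df = df.set_index('WeekNumber').reindex(range(df.WeekNumber.min(), df.WeekNumber.max() + 1))
df['Tool'] = df['Tool'].fillna('M1')
df['Status'] = df['Status'].ffill()
df['Percentage'] = df['Percentage'].fillna(0)
# weeks filled with 0 that belong to a 'good' period become 100
df['Percentage'] = np.where((df.Status == 'good') & (df.Percentage == 0), 100, df.Percentage)
df.reset_index()
```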
Out[80]:
WeekNumber Tool Status Percentage
0 1 M1 good 85.0
1 2 M1 good 100.0
2 3 M1 good 100.0
3 4 M1 bad 75.0
4 5 M1 bad 0.0
5 6 M1 bad 0.0
6 7 M1 good 90.0
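One caveat with this answer: because `Percentage` is filled with 0 first and every `good` row equal to 0 is then set to 100, a genuinely recorded 0% in a `good` week would also be overwritten. Recording which weeks were missing before filling avoids that. A minimal sketch, using hypothetical data in which week 4 is `good` with a real value of 0:

```python
import numpy as np
import pandas as pd

# Hypothetical data: week 4 is 'good' but its recorded percentage really is 0
df = pd.DataFrame({
    'Tool': ['M1', 'M1', 'M1'],
    'WeekNumber': [1, 4, 6],
    'Status': ['bad', 'good', 'good'],
    'Percentage': [50, 0, 80],
})

dff = df.set_index('WeekNumber').reindex(range(1, 7))
dff['Tool'] = dff['Tool'].ffill()
dff['Status'] = dff['Status'].ffill()

missing = dff['Percentage'].isna()  # weeks that were added by reindex
# Only added weeks in a 'good' period become 100; recorded values are kept as-is
dff['Percentage'] = np.where(missing & (dff['Status'] == 'good'), 100,
                             dff['Percentage'].fillna(0))
print(dff.reset_index())
```

Here week 4 keeps its recorded 0 and only the added week 5 becomes 100.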
Answer 4 (score: 0)
You can first use `set_index` and `reindex` to expand the DataFrame, then forward-fill the `NaN` values in `Tool` and `Status`:
In [814]: dff = (df.set_index('WeekNumber')
.reindex(range(df.WeekNumber.min(), df.WeekNumber.max()+1))
.assign(Tool=lambda x: x.Tool.ffill(),
Status=lambda x: x.Status.ffill()))
In [815]: dff
Out[815]:
Tool Status Percentage
WeekNumber
1 M1 good 85.0
2 M1 good NaN
3 M1 good NaN
4 M1 bad 75.0
5 M1 bad NaN
6 M1 bad NaN
7 M1 good 90.0
Then fill in the `Percentage` values conditionally:
In [816]: dff.loc[(dff.Status == 'good') & dff.Percentage.isnull(), 'Percentage'] = 100
In [817]: dff.loc[(dff.Status == 'bad') & dff.Percentage.isnull(), 'Percentage'] = 0
Finally, use `reset_index()`:
In [818]: dff.reset_index()
Out[818]:
WeekNumber Tool Status Percentage
0 1 M1 good 85.0
1 2 M1 good 100.0
2 3 M1 good 100.0
3 4 M1 bad 75.0
4 5 M1 bad 0.0
5 6 M1 bad 0.0
6 7 M1 good 90.0
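All of the answers assume a single tool (`M1`). If the real sheet contains several tools, the same `set_index`/`reindex` steps can be applied to each tool separately and the pieces concatenated. A sketch under that assumption; the `fill_tool` helper and the second tool `M2` are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'Tool': ['M1', 'M1', 'M1', 'M2', 'M2'],
    'WeekNumber': [1, 4, 7, 2, 5],
    'Status': ['good', 'bad', 'good', 'bad', 'good'],
    'Percentage': [85, 75, 90, 60, 70],
})

def fill_tool(group):
    """Expand one tool's rows to consecutive weeks and fill the gaps (hypothetical helper)."""
    weeks = range(group['WeekNumber'].min(), group['WeekNumber'].max() + 1)
    g = group.set_index('WeekNumber').reindex(weeks)
    g['Tool'] = g['Tool'].ffill()
    g['Status'] = g['Status'].ffill()
    g.loc[(g['Status'] == 'good') & g['Percentage'].isna(), 'Percentage'] = 100
    g.loc[(g['Status'] == 'bad') & g['Percentage'].isna(), 'Percentage'] = 0
    return g.reset_index()

# Process each tool on its own so weeks never leak between tools
parts = [fill_tool(group) for _, group in df.groupby('Tool')]
result = pd.concat(parts, ignore_index=True)
print(result)
```

Iterating over the groups and concatenating avoids relying on how `groupby(...).apply` treats the grouping column, which has changed between pandas versions.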