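I'm new to pandas, and my data frame looks like the one shown below.
1. How do I calculate the duration from one status to the next for a specific "ID", and so on?
2. Count the IDs that have more than two failures and at least one maintenance between them.
3. Subset the data by the "Failure-Failure" and "Failure-Maintenance" patterns.
I have tried all sorts of combinations, for example the pandas groupby function: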
df.groupby(['ID', 'Status' ]).size().reset_index(name='counts').sort_values(['counts'], ascending =False)
Use the following list of dictionaries to create the DataFrame:
import pandas as pd
import numpy as np
sales = [ {'ID': '1', 'Status': 'Failure', 'Date': '2017-04-26'},
{'ID': '2', 'Status': 'Failure', 'Date': '2017-05-06'},
{'ID': '1', 'Status': 'Maintenance', 'Date': '2017-05-16'},
{'ID': '1', 'Status': 'Failure', 'Date': '2017-07-06'},
{'ID': '2', 'Status': 'Failure', 'Date': '2017-09-06'},
{'ID': '1', 'Status': 'Failure', 'Date': '2018-01-14'},
{'ID': '3', 'Status': 'Maintenance', 'Date': '2017-07-16'},
{'ID': '4', 'Status': 'Failure', 'Date': '2017-07-16'},
{'ID': '2', 'Status': 'Maintenance', 'Date': '2018-07-06'},
{'ID': '3', 'Status': 'Failure', 'Date': '2018-01-06'},
{'ID': '3', 'Status': 'Maintenance', 'Date': '2018-07-06'},
{'ID': '3', 'Status': 'Failure', 'Date': '2019-07-06'},
{'ID': '2', 'Status': 'Maintenance', 'Date': '2019-05-06'},
{'ID': '2', 'Status': 'Failure', 'Date': '2019-10-06'},
{'ID': '4', 'Status': 'Maintenance', 'Date': '2019-11-06'}]
df = pd.DataFrame(sales)
df['Date'] = pd.to_datetime(df['Date'])
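Expected output
2.1 IDs with multiple failures.
2.2 How many IDs have multiple failures with one maintenance between them, how many have two maintenances between two failures, and so on.
The explanation for question 3 is as follows. After sorting the data frame by "ID" and "Date", we get the following data frame: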
Date ID Status
0 2017-04-26 1 F
2 2017-05-16 1 M
3 2017-07-06 1 F
5 2018-01-14 1 F
1 2017-05-06 2 F
4 2017-09-06 2 F
8 2018-07-06 2 M
12 2019-05-06 2 M
13 2019-10-06 2 F
6 2017-07-16 3 M
9 2018-01-06 3 F
10 2018-07-06 3 M
11 2019-07-06 3 F
7 2017-07-16 4 F
14 2019-11-06 4 M
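Now, in ID 1, the rows at index 3 and 5 form an F-F pattern; in ID 2, the rows at index 1 and 4 form an F-F pattern; ID 3 has no F-F pattern, and neither does ID 4.
So the expected F-F subset is as follows.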
Date ID Status
0 2017-07-06 1 F
1 2018-01-14 1 F
2 2017-05-06 2 F
3 2017-09-06 2 F
Similarly, the F-M data frame after subsetting is given below.
Date ID Status
0 2017-04-26 1 F
1 2017-05-16 1 M
2 2017-09-06 2 F
3 2018-07-06 2 M
4 2018-01-06 3 F
5 2018-07-06 3 M
6 2017-07-16 4 F
7 2019-11-06 4 M
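Answer 0 (score: 1)
I had a hard time understanding your question, but maybe these answers will help you solve it completely, or at least keep you from getting stuck (in case I have misunderstood the question).
I still see three questions:
1. Calculate the duration, in days, until the next failure and until the next occurrence of any status.
2. Count the IDs that have more than two failures.
3. How many of them have at least one maintenance in between.
For this you need pandas and numpy: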
import pandas as pd
import numpy as np
sales = [{'ID': '1', 'Status': 'Failure', 'Date': '2017-04-26'},
{'ID': '2', 'Status': 'Failure', 'Date': '2017-05-06'},
{'ID': '1', 'Status': 'Maintenance', 'Date': '2017-05-16'},
{'ID': '1', 'Status': 'Failure', 'Date': '2017-07-06'},
{'ID': '2', 'Status': 'Failure', 'Date': '2017-09-06'},
{'ID': '1', 'Status': 'Failure', 'Date': '2018-01-14'},
{'ID': '3', 'Status': 'Maintenance', 'Date': '2017-07-16'},
{'ID': '4', 'Status': 'Failure', 'Date': '2017-07-16'},
{'ID': '2', 'Status': 'Maintenance', 'Date': '2018-07-06'},
{'ID': '3', 'Status': 'Failure', 'Date': '2018-01-06'},
{'ID': '3', 'Status': 'Maintenance', 'Date': '2018-07-06'},
{'ID': '3', 'Status': 'Failure', 'Date': '2019-07-06'},
{'ID': '2', 'Status': 'Maintenance', 'Date': '2019-05-06'},
{'ID': '2', 'Status': 'Failure', 'Date': '2019-10-06'},
{'ID': '4', 'Status': 'Maintenance', 'Date': '2019-11-06'}]
df = pd.DataFrame(sales)
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID', 'Date', 'Status'])
print('{0}\n'.format(df))
# Question 2
# IDs with more than two failures
df_question2 = df.groupby(['ID', 'Status']).size().reset_index(name='Counts')
# Answer 2
# look only at 'Failure' rows, so an ID with more than two maintenances is not counted
failures_q2 = df_question2.loc[(df_question2['Status'] == 'Failure') & (df_question2['Counts'] > 2)]
counts_of_more_than_two_failures = len(failures_q2)
print('IDs with more than two failures : {0}'.format(counts_of_more_than_two_failures))
# Question 3
# one maintenance between failures
# work on a copy so the original df keeps its 'Failure'/'Maintenance' labels
df_question3 = df.copy()
df_question3['Status'] = np.where(df_question3['Status'] == 'Failure', '1', '0')
df_question3_status = df_question3.groupby('ID')['Status'].apply(list)
dict_question3 = df_question3_status.to_dict()
# Answer 3
for key, value in dict_question3.items():
    # drop leading/trailing maintenances, split on failures, keep only non-empty runs
    _find_me = list(filter(None, ''.join(value).strip('0').split('1')))
    _has = bool(_find_me)
    print('ID {0} has maintenance between failures: {1}'.format(key, _has))
print('\n')
# subset patterns
df = pd.DataFrame(sales)
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID', 'Date', 'Status'])
df_question3 = df.copy()
# note the encoding is flipped here: '0' is Failure, '1' is Maintenance
df_question3['Status'] = np.where(df_question3['Status'] == 'Failure', '0', '1')
df_question3_patterns = df_question3.groupby('ID')['Status'].apply(list)
dict_question3 = df_question3_patterns.to_dict()
# F-F
# collect matching row pairs; DataFrame.append was removed in pandas 2.0, so use pd.concat
ff_pairs = []
for id_, statuses in dict_question3.items():
    rows_for_id = df_question3[df_question3['ID'] == id_]
    for i in range(len(statuses) - 1):
        # only FF values
        if statuses[i] == '0' and statuses[i + 1] == '0':
            # locate n and n+1 rows based on i index
            ff_pairs.append(rows_for_id.iloc[[i, i + 1]])
# pd.concat raises on an empty list, so fall back to an empty frame with the same columns
df_ff_pattern = pd.concat(ff_pairs) if ff_pairs else df_question3.iloc[0:0].copy()
print('subset FF patterns')
# back-substitute status values
df_ff_pattern['Status'] = np.where(df_ff_pattern['Status'] == '0', 'F', 'M')
print(df_ff_pattern)
print('\n')
# F-M
print('subset FM patterns')
for id_, statuses in dict_question3.items():
    rows_for_id = df_question3[df_question3['ID'] == id_]
    for i in range(len(statuses) - 1):
        # only FM values ('0' is Failure, '1' is Maintenance)
        if statuses[i] == '0' and statuses[i + 1] == '1':
            # locate n and n+1 rows based on i index
            print(rows_for_id.iloc[[i, i + 1]])
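The pairwise loops above can probably also be expressed without explicit iteration. The following is only a sketch of one possible alternative, not part of the original answer: it compares each row's status with the next status of the same ID via `groupby(...).shift(-1)`. The names `events` and `subset_pattern` are made up for illustration, and the sample is a reduced copy of the question's data.

```python
import pandas as pd

# Reduced sample in the same shape as the question's data (IDs 1 and 4 only).
records = [
    {'ID': '1', 'Status': 'Failure', 'Date': '2017-04-26'},
    {'ID': '1', 'Status': 'Maintenance', 'Date': '2017-05-16'},
    {'ID': '1', 'Status': 'Failure', 'Date': '2017-07-06'},
    {'ID': '1', 'Status': 'Failure', 'Date': '2018-01-14'},
    {'ID': '4', 'Status': 'Failure', 'Date': '2017-07-16'},
    {'ID': '4', 'Status': 'Maintenance', 'Date': '2019-11-06'},
]
events = pd.DataFrame(records)
events['Date'] = pd.to_datetime(events['Date'])


def subset_pattern(frame, first, second):
    """Return rows that form consecutive (first, second) status pairs within an ID."""
    # Sort so that each ID's rows are contiguous and in date order.
    frame = frame.sort_values(['ID', 'Date'])
    # Status of the following row of the same ID (NaN on the last row of each ID).
    next_status = frame.groupby('ID')['Status'].shift(-1)
    starts = (frame['Status'] == first) & (next_status == second)
    # The row right after a pair start belongs to the same ID, so a plain shift is enough.
    ends = starts.shift(1, fill_value=False)
    return frame[starts | ends]


ff = subset_pattern(events, 'Failure', 'Failure')
fm = subset_pattern(events, 'Failure', 'Maintenance')
print(ff)
print(fm)
```

One difference from the loop version: a row shared by two overlapping pairs (for example the middle row of F-F-F) appears once here, whereas the loop would emit it twice.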
# Question 1
df_question1 = pd.DataFrame(sales)
df_question1['Date'] = pd.to_datetime(df_question1['Date'])
df_question1 = df_question1.reset_index().sort_values(['ID', 'Date', 'Status']).set_index(['ID', 'Status'])
df_question1['Difference'] = df_question1.groupby('ID')['Date'].transform(pd.Series.diff)
# Possible Answer 1
# all days in statuses
print(df_question1)
df_question1 = df_question1.reset_index()
df_question1_failure = df_question1.loc[df_question1['Status'] == 'Failure']
df_question1_failure_pre_diff = df_question1_failure[['ID', 'Difference']]
# filter by status
df_question1_maintenance = df_question1.loc[df_question1['Status'] == 'Maintenance']
df_question1_maintenance_pre_diff = df_question1_maintenance[['ID', 'Difference']]
# group by and sum
df_question1_failure_group = df_question1_failure_pre_diff.groupby('ID').sum()
df_question1_maintenance_group = df_question1_maintenance_pre_diff.groupby('ID').sum()
# Possible Answer 1
# days in status failure
print((df_question1_failure_group - df_question1_maintenance_group).abs())
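For the expected output 2.1/2.2 in the question (how many maintenances lie between two failures), a running failure counter per ID may be enough. This is a hedged sketch under my reading of the question, not a confirmed solution; the names `events`, `failures_so_far`, `total_failures`, `gaps` and `ids_with_gap` are introduced here for illustration only.

```python
import pandas as pd

# Same records as in the question.
sales = [
    {'ID': '1', 'Status': 'Failure', 'Date': '2017-04-26'},
    {'ID': '2', 'Status': 'Failure', 'Date': '2017-05-06'},
    {'ID': '1', 'Status': 'Maintenance', 'Date': '2017-05-16'},
    {'ID': '1', 'Status': 'Failure', 'Date': '2017-07-06'},
    {'ID': '2', 'Status': 'Failure', 'Date': '2017-09-06'},
    {'ID': '1', 'Status': 'Failure', 'Date': '2018-01-14'},
    {'ID': '3', 'Status': 'Maintenance', 'Date': '2017-07-16'},
    {'ID': '4', 'Status': 'Failure', 'Date': '2017-07-16'},
    {'ID': '2', 'Status': 'Maintenance', 'Date': '2018-07-06'},
    {'ID': '3', 'Status': 'Failure', 'Date': '2018-01-06'},
    {'ID': '3', 'Status': 'Maintenance', 'Date': '2018-07-06'},
    {'ID': '3', 'Status': 'Failure', 'Date': '2019-07-06'},
    {'ID': '2', 'Status': 'Maintenance', 'Date': '2019-05-06'},
    {'ID': '2', 'Status': 'Failure', 'Date': '2019-10-06'},
    {'ID': '4', 'Status': 'Maintenance', 'Date': '2019-11-06'},
]
events = pd.DataFrame(sales)
events['Date'] = pd.to_datetime(events['Date'])
events = events.sort_values(['ID', 'Date']).reset_index(drop=True)

is_failure = (events['Status'] == 'Failure').astype(int)
# How many failures the ID has had up to and including this row.
events['failures_so_far'] = is_failure.groupby(events['ID']).cumsum()
# Total number of failures of the ID, repeated on every row.
events['total_failures'] = is_failure.groupby(events['ID']).transform('sum')

# Maintenance rows that sit after the first and before the last failure of their ID.
between = events[(events['Status'] == 'Maintenance')
                 & (events['failures_so_far'] > 0)
                 & (events['failures_so_far'] < events['total_failures'])]

# One entry per (ID, gap number): maintenances between that failure and the next one.
gaps = between.groupby(['ID', 'failures_so_far']).size()
ids_with_gap = sorted(between['ID'].unique())
print(gaps)
print('IDs with at least one maintenance between failures:', ids_with_gap)
```

Gaps without any maintenance (such as the F-F pair of ID 1) simply do not appear in `gaps`; `gaps.value_counts()` would then give how many gaps contain one maintenance, two maintenances, and so on.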
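If you think something is missing, please leave a comment so the answer can be improved. Either way, this may serve as just a starting point in case someone else gets it right.
Hope it helps (: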