My data looks like this:
   Application                  WorkflowStep
0  WF:ACAA-CR (auto)            Manager
1  WF:ACAA-CR (auto)            Access Responsible
2  WF:ACAA-CR (auto)            Automatic
3  WF:ACAA-CR-AccResp (auto)    Manager
4  WF:ACAA-CR-AccResp (auto)    Access Responsible
5  WF:ACAA-CR-AccResp (auto)    Automatic
6  WF:ACAA-CR-IT-AccResp[AUTO]  Group
7  WF:ACAA-CR-IT-AccResp[AUTO]  Access Responsible
8  WF:ACAA-CR-IT-AccResp[AUTO]  Automatic
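In addition to these two columns, I want to add a third value that holds the total number of WorkflowStep entries for each Application.
The dictionary should look like this (key names may differ):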
{'WF:ACAA-CR (auto)':
[{'Workflow': ['Manager', 'Access Responsible','Automatic'], 'Summary': 3}],
'WF:ACAA-CR-AccResp (auto)':
[{'Workflow': ['Manager','Access Responsible','Automatic'], 'Summary': 3}],
'WF:ACAA-CR-IT-AccResp[AUTO]':
[{'Workflow': ['Group','Access Responsible','Automatic'], 'Summary': 3}]
}
My code for building a dictionary from the two columns above works fine:
# Note: `dict` is an existing, initially empty dictionary; the name shadows the built-in.
for i in range(len(df)):
    currentid = df.iloc[i, 0]
    currentvalue = df.iloc[i, 1]
    dict.setdefault(currentid, [])
    dict[currentid].append(currentvalue)
The code that computes the WorkflowStep total is below, and it also works correctly:
from collections import Counter

for key, values in dict.items():
    val = values
    match = ["Manager", "Access Responsible", "Automatic", "Group"]
    c = Counter(val)
    sumofvalues = 0
    # Count each known step that occurs exactly once for this key.
    for m in match:
        if c[m] == 1:
            sumofvalues += 1
My initial idea was to adapt my first snippet so that Application is the top-level key, with Workflow and Summary in a sub-dictionary:
for i in range(len(df)):
    currentid = df.iloc[i, 0]
    currentvalue = df.iloc[i, 1]
    dict.setdefault(currentid, [])
    dict[currentid].append({"Workflow": [currentvalue], "Summary": []})
However, the result is not what I want: instead of appending currentvalue to the existing Workflow key, it creates a new sub-dictionary on every iteration.
Example:
{'WF:ACAA-CR (auto)': [{'Workflow': ['Manager'], 'Summary': []},
{'Workflow': ['Access Responsible'], 'Summary': []},
{'Workflow': ['Automatic'], 'Summary': []}]
}
How can I create a dictionary like the one I described above?
Answer 0 (score: 4)
IIUC, this should help:
val = df.groupby('Application')['WorkflowStep'].unique()
# Iterate with .items() rather than val[i]; integer lookups on a string-indexed Series are deprecated.
{app: [{'WorkflowStep': list(steps), 'Summary': len(steps)}] for app, steps in val.items()}
which results in:
{'WF:ACAA-CR (auto)': [{'WorkflowStep': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}],
'WF:ACAA-CR-AccResp (auto)': [{'WorkflowStep': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}],
'WF:ACAA-CR-IT-AccResp[AUTO]': [{'WorkflowStep': ['Group', 'Access Responsible', 'Automatic'], 'Summary': 3}]}
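For reference, here is a self-contained sketch of the same groupby idea, run against the nine rows from the question. The names `steps_per_app` and `result` are only illustrative, and it uses the `Workflow` key from the question instead of `WorkflowStep`. It assumes pandas is installed and that "Summary" should count distinct steps per application.

```python
import pandas as pd

# Rebuild the question's data: three applications, three steps each.
df = pd.DataFrame({
    "Application": (
        ["WF:ACAA-CR (auto)"] * 3
        + ["WF:ACAA-CR-AccResp (auto)"] * 3
        + ["WF:ACAA-CR-IT-AccResp[AUTO]"] * 3
    ),
    "WorkflowStep": [
        "Manager", "Access Responsible", "Automatic",
        "Manager", "Access Responsible", "Automatic",
        "Group", "Access Responsible", "Automatic",
    ],
})

# Distinct steps per application; sort=False keeps first-appearance order.
steps_per_app = df.groupby("Application", sort=False)["WorkflowStep"].unique()

# Series.items() yields (application, array_of_steps) pairs.
result = {
    app: [{"Workflow": list(steps), "Summary": len(steps)}]
    for app, steps in steps_per_app.items()
}
print(result)
```

Iterating with `.items()` avoids positional lookups on a Series whose index is made of strings.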
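Answer 1 (score: 0)
I think the other answer is the better way to do this, since it uses the power of data frames. For reference, though, if you want to do it the way you were attempting, I think this will work: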
import pandas as pd

# Create the data for testing.
d = {'Application': ["WF:ACAA-CR (auto)", "WF:ACAA-CR (auto)", "WF:ACAA-CR (auto)",
                     "WF:ACAA-CR-AccResp (auto)", "WF:ACAA-CR-AccResp (auto)", "WF:ACAA-CR-AccResp (auto)"],
     'WorkflowStep': ["Manager", "Access Responsible", "Automatic", "Manager", "Access Responsible", "Automatic"]}
df = pd.DataFrame(d)
new_dict = dict()
# Iterate through the rows of the data frame.
for index, row in df.iterrows():
    # Get the values for the current row.
    current_application_id = row['Application']
    current_workflowstep = row['WorkflowStep']
    # Set the default values if not already set.
    new_dict.setdefault(current_application_id, {'Workflow': [], 'Summary': 0})
    # Add the new values.
    new_dict[current_application_id]['Workflow'].append(current_workflowstep)
    new_dict[current_application_id]['Summary'] += 1
print(new_dict)
which outputs:
{'WF:ACAA-CR (auto)': {'Workflow': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3},
'WF:ACAA-CR-AccResp (auto)': {'Workflow': ['Manager', 'Access Responsible', 'Automatic'], 'Summary': 3}}
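Note that this output holds a plain sub-dictionary per application, while the question asked for a list wrapping it, and the loop counts every row, so a repeated step would be counted twice. If that matters, the loop can be adapted. Below is a minimal sketch without pandas; `build_summary` and `rows` are hypothetical names, and treating "Summary" as the number of distinct steps is my assumption about the intent.

```python
def build_summary(rows):
    """Group (application, step) pairs into {app: [{'Workflow': [...], 'Summary': n}]}."""
    result = {}
    for application, step in rows:
        # setdefault returns the existing one-element list, or installs a fresh one.
        entry = result.setdefault(application, [{"Workflow": [], "Summary": 0}])[0]
        if step not in entry["Workflow"]:  # ignore repeated steps
            entry["Workflow"].append(step)
            entry["Summary"] = len(entry["Workflow"])
    return result


rows = [
    ("WF:ACAA-CR (auto)", "Manager"),
    ("WF:ACAA-CR (auto)", "Access Responsible"),
    ("WF:ACAA-CR (auto)", "Automatic"),
    ("WF:ACAA-CR (auto)", "Manager"),  # duplicate, not counted again
    ("WF:ACAA-CR-IT-AccResp[AUTO]", "Group"),
]
print(build_summary(rows))
```

With a data frame, the pairs could come from `zip(df['Application'], df['WorkflowStep'])`.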