I have the following CSV file:
itemid testresult duplicateid
100 textboxerror 0
101 text_input_issue 100
102 menuitemerror 0
103 text_click_issue 100
104 text_caps_error 100
105 menu_drop_down_error 102
106 text_lower_error 100
107 menu_item_null 102
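Based on duplicateid, I want to turn the testresult column above into two columns: testresult (the result of the parent item) and similartestresults (the results whose duplicateid points to that parent). The expected table should look like this:
Required DataFrame: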
index testresult similartestresults duplicateid
1 textboxerror text_input_issue 100
2 textboxerror text_click_issue 100
3 textboxerror text_caps_error 100
4 textboxerror text_lower_error 100
5 menuitemerror menu_drop_down_error 102
6 menuitemerror menu_item_null 102
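A minimal sketch of one way to build this table, assuming the data is whitespace-separated as shown and that every child row's duplicateid is the itemid of a parent row (parents have duplicateid 0). It uses a pandas self-merge rather than groupby; the names raw, children, parents, parentid and parent_testresult are illustrative only.

```python
import io

import pandas as pd

# Sample data from the question, whitespace-separated
raw = """itemid testresult duplicateid
100 textboxerror 0
101 text_input_issue 100
102 menuitemerror 0
103 text_click_issue 100
104 text_caps_error 100
105 menu_drop_down_error 102
106 text_lower_error 100
107 menu_item_null 102
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Child rows point at a parent through duplicateid; parents have duplicateid 0
children = df[df["duplicateid"] != 0]
# Rename the parent columns up front so the merge produces no clashing names
parents = df[["itemid", "testresult"]].rename(
    columns={"itemid": "parentid", "testresult": "parent_testresult"}
)

# Self-join: match each child's duplicateid to a parent's itemid
merged = children.merge(parents, left_on="duplicateid", right_on="parentid")

result = (
    merged.sort_values(["duplicateid", "itemid"])
    .rename(columns={"testresult": "similartestresults",
                     "parent_testresult": "testresult"})
    [["testresult", "similartestresults", "duplicateid"]]
    .reset_index(drop=True)
)
# 1-based index, labelled like the expected table
result.index = range(1, len(result) + 1)
result.index.name = "index"
print(result)
```

Sorting by duplicateid and itemid only reproduces the row order of the expected table; drop that step if order does not matter.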
I tried pandas groupby, but it only gives me a single list. Here is my code:
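import pandas as pd

# df holds the table above (read from the CSV file)
# Grouping by both keys only builds a GroupBy object; printing it shows the object, not a table
df1 = df.groupby(["duplicateid", "testresult"])
print(df1)
print(df1.groups)
# Aggregating by duplicateid collapses each group into ONE comma-joined string
df2 = df.groupby("duplicateid")['testresult'].apply(lambda tags: ','.join(tags))
print(df2)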
Neither of the approaches above gives the desired result. Any suggestions? Thanks, TSJ
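Why groupby falls short here, and how it can still be made to work: the parent rows have duplicateid 0, so grouping by duplicateid puts every parent into group 0, away from its children. The sketch below, assuming parents are exactly the rows with duplicateid 0, builds a hypothetical family key (a parent's own itemid, or a child's duplicateid) and broadcasts each family's parent testresult with transform("first"). The column names family and parent are illustrative.

```python
import io

import pandas as pd

raw = """itemid testresult duplicateid
100 textboxerror 0
101 text_input_issue 100
102 menuitemerror 0
103 text_click_issue 100
104 text_caps_error 100
105 menu_drop_down_error 102
106 text_lower_error 100
107 menu_item_null 102
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Family key: a parent's own itemid, or the duplicateid a child points to
df["family"] = df["duplicateid"].where(df["duplicateid"] != 0, df["itemid"])

# Sort so the parent (duplicateid 0) comes first in each family, then copy
# the first testresult of every family onto all of its rows
df = df.sort_values(["family", "duplicateid", "itemid"])
df["parent"] = df.groupby("family")["testresult"].transform("first")

# Keep only the child rows and shape them like the required DataFrame
result = (
    df[df["duplicateid"] != 0]
    .rename(columns={"parent": "testresult", "testresult": "similartestresults"})
    [["testresult", "similartestresults", "duplicateid"]]
    .reset_index(drop=True)
)
print(result)
```

Unlike apply(','.join), transform returns one value per original row, which is what lets each child keep its own line.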
Answer (score: 0)
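Copy the testresult column, then overwrite testresult with a group name derived from the first four characters of each value, replacing those prefixes with the final group names. Then drop the rows that are not needed (the parents, which have duplicateid 0) and sort by duplicateid. Does this match what you are after?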
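import pandas as pd

# Keep the detailed result as the "similar" result
df['similartestresults'] = df['testresult'].copy()
# Update testresult to the group name: take the first four characters as the key...
df['testresult'] = df['similartestresults'].str[:4]
# ...then replace each prefix with its final group name (assign back instead of inplace)
df['testresult'] = df['testresult'].replace(['text', 'menu'], ['textboxerror', 'menuitemerror'])
# Drop the parent rows (duplicateid == 0)
df = df[df['duplicateid'] != 0]
# Stable sort keeps the original row order within each duplicateid
df = df.sort_values('duplicateid', ascending=True, kind='stable')
print(df)
itemid testresult duplicateid similartestresults
1 101 textboxerror 100 text_input_issue
3 103 textboxerror 100 text_click_issue
4 104 textboxerror 100 text_caps_error
6 106 textboxerror 100 text_lower_error
5 105 menuitemerror 102 menu_drop_down_error
7 107 menuitemerror 102 menu_item_null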
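The replace step above hardcodes the prefix-to-name mapping. A hedged sketch that derives it from the parent rows instead, under the same assumption the answer makes (a child's first four characters match its parent's first four); prefix_to_parent and is_parent are illustrative names.

```python
import io

import pandas as pd

raw = """itemid testresult duplicateid
100 textboxerror 0
101 text_input_issue 100
102 menuitemerror 0
103 text_click_issue 100
104 text_caps_error 100
105 menu_drop_down_error 102
106 text_lower_error 100
107 menu_item_null 102
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

is_parent = df["duplicateid"] == 0
parent_names = df.loc[is_parent, "testresult"]
# Build the prefix -> group name table from the parent rows,
# e.g. 'text' -> 'textboxerror', 'menu' -> 'menuitemerror'
prefix_to_parent = dict(zip(parent_names.str[:4], parent_names))

children = df.loc[~is_parent].copy()
children["similartestresults"] = children["testresult"]
# Look up each child's group name by its four-character prefix
children["testresult"] = children["testresult"].str[:4].map(prefix_to_parent)
children = children.sort_values("duplicateid", kind="stable")
print(children)
```

If the naming convention does not hold for real data, the duplicateid column is the more reliable link between a child and its parent.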