我在csv文件中有以下数据格式:
1,01,ABC,This is abc101,This is another abc101
1,01,DEF,This is def101,This is another def101
1,02,ABC,This is abc102,This is another abc102
1,02,DEF,This is def102,This is another def102
1,02,GHI,This is ghi102,This is another ghi102
2,01,ABC,This is abc201,This is another abc201
2,01,DEF,This is def201,This is another def201
2,01,GHI,This is ghi201,This is another ghi201
2,03,GHI,This is ghi203,This is another ghi203
3,02,ABC,This is abc302,This is another abc302
3,02,ABC,This is abc302,This is another abc302
3,02,ABC,This is abc302,This is another abc302
4,01,ABC,This is abc401,This is another abc401
4,01,DEF,This is def401,This is another def401
4,01,ABC,This is abc401,This is another abc401
4,02,DEF,This is def402,This is another def402
4,02,DEF,This is def402,This is another def402
我也有一个变量列表= ['ABC','ABC_2','GHI','GHI_2']
csv文件头列表= ['ID1','ID2','Var_name','var_value1','var_value2']
我需要像以下格式一样调整上述数据
[['ID1','ID2','ABC','ABC_2','GHI','GHI_2'], [1,01,'This is abc101','This is another abc101','',''], [1,02,'This is abc102','This is another abc102','This is ghi102','This is another ghi102']]
..喜欢那个
如果变量列表= ['GHI','GHI_2','ABC','ABC_2']
输出将是:
[['ID1','ID2','GHI','GHI_2','ABC','ABC_2'], [1,01,'','','This is abc101','This is another abc101'], [1,02,'This is ghi102','This is another ghi102','This is abc102','This is another abc102']]
..喜欢那个
这意味着列表应该:
我想在Python 2.7中这样做,可能使用Pandas。
variable_list = ['ABC','DEF']
df = pd.read_csv(csvfile,delimiter='#',engine='python',header=None)
df.columns = ['ID1','ID2','var_name','var_value']
f=df.set_index(['ID1','ID2','var_name'])['var_value'].unstack(fill_value='').fillna('')[variable_list].reset_index()
L1 = [f.columns.tolist()] + f.values.tolist()
这段代码我试过单个var_value,现在我有两个(var_value1,var_value2)