Question

I have data in the following format in a CSV file:

1,01,ABC,This is abc101,This is another abc101
1,01,DEF,This is def101,This is another def101
1,02,ABC,This is abc102,This is another abc102
1,02,DEF,This is def102,This is another def102
1,02,GHI,This is ghi102,This is another ghi102
2,01,ABC,This is abc201,This is another abc201
2,01,DEF,This is def201,This is another def201
2,01,GHI,This is ghi201,This is another ghi201
2,03,GHI,This is ghi203,This is another ghi203
3,02,ABC,This is abc302,This is another abc302
3,02,ABC,This is abc302,This is another abc302
3,02,ABC,This is abc302,This is another abc302
4,01,ABC,This is abc401,This is another abc401
4,01,DEF,This is def401,This is another def401
4,01,ABC,This is abc401,This is another abc401
4,02,DEF,This is def402,This is another def402
4,02,DEF,This is def402,This is another def402

I also have a variable list = ['ABC','ABC_2','GHI','GHI_2'] and a CSV header list = ['ID1','ID2','Var_name','var_value1','var_value2']

I need to pivot the data above into the following format: [['ID1','ID2','ABC','ABC_2','GHI','GHI_2'], [1,01,'This is abc101','This is another abc101','',''], [1,02,'This is abc102','This is another abc102','This is ghi102','This is another ghi102']] and so on.

If the variable list = ['GHI','GHI_2','ABC','ABC_2'], the output would be: [['ID1','ID2','GHI','GHI_2','ABC','ABC_2'], [1,01,'','','This is abc101','This is another abc101'], [1,02,'This is ghi102','This is another ghi102','This is abc102','This is another abc102']] and so on.

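This means the list should:

Populate data for all IDs
Create empty strings for variables that do not exist in the dataset above.
The CSV file has no header row; we have a separate header list
Populate the nested list in the same order as the header list
Populate only the values named in the header list, i.e. if the header list only contains 'ABC' and 'GHI', the nested list should only contain values for 'ABC' and 'GHI', and the 'DEF' rows in the dataset above should be ignored
var_value2 goes in the _2 position, e.g. 'This is another abc101' goes under 'ABC_2'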

I want to do this in Python 2.7, possibly using Pandas.

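import pandas as pd

variable_list = ['ABC','DEF']
# The file has no header row, so read it with header=None and name the columns afterwards
df = pd.read_csv(csvfile,delimiter='#',engine='python',header=None)
df.columns = ['ID1','ID2','var_name','var_value']
# Pivot var_name into columns, keep only the requested variables, fill missing cells with ''
f=df.set_index(['ID1','ID2','var_name'])['var_value'].unstack(fill_value='').fillna('')[variable_list].reset_index()
L1 = [f.columns.tolist()] + f.values.tolist()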

I tried this code with a single var_value; now I have two (var_value1, var_value2).
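
For reference, the requirements listed above could be sketched without Pandas roughly as follows. This is only an illustration under some assumptions, not a confirmed solution: the function name pivot_rows and its suffix parameter are made up for this example, IDs are kept as strings so that '01' keeps its leading zero, ID pairs appear in first-seen order, and when the same (ID1, ID2, var_name) occurs more than once the last row wins.

```python
from collections import OrderedDict


def pivot_rows(rows, variable_list, suffix='_2'):
    """Pivot (ID1, ID2, var_name, var_value1, var_value2) rows into a nested list.

    pivot_rows and suffix are hypothetical names used for this sketch only.
    """
    # Map each requested column name to its position in the output row.
    col = dict((name, i) for i, name in enumerate(variable_list))
    table = OrderedDict()  # (ID1, ID2) -> list of cells, in first-seen order
    for row in rows:
        if not row:
            continue  # skip blank lines
        id1, id2, name, value1, value2 = row
        # Every ID pair gets a row, even if none of its variables are requested.
        cells = table.setdefault((id1, id2), [''] * len(variable_list))
        if name in col:
            cells[col[name]] = value1  # var_value1 goes under e.g. 'ABC'
        if name + suffix in col:
            cells[col[name + suffix]] = value2  # var_value2 goes under e.g. 'ABC_2'
    header = ['ID1', 'ID2'] + list(variable_list)
    return [header] + [[id1, id2] + cells for (id1, id2), cells in table.items()]


# A few rows copied from the data in the question.
sample = [
    ['1', '01', 'ABC', 'This is abc101', 'This is another abc101'],
    ['1', '01', 'DEF', 'This is def101', 'This is another def101'],
    ['1', '02', 'ABC', 'This is abc102', 'This is another abc102'],
    ['1', '02', 'DEF', 'This is def102', 'This is another def102'],
    ['1', '02', 'GHI', 'This is ghi102', 'This is another ghi102'],
]
result = pivot_rows(sample, ['ABC', 'ABC_2', 'GHI', 'GHI_2'])
```

Here rows can be any iterable of 5-item lists, so a csv.reader over the opened file could be passed in directly. Changing the order of variable_list changes the column order of every output row, and 'DEF' rows are ignored unless 'DEF' or 'DEF_2' is listed.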

Pivoting a CSV file in Python

0 Answers: