I have a piece of Python code, shown below:
# Main loop that takes the values attributed to each row, row by row, and sorts
# them into the corresponding columns by matching 'Name' against the newly
# generated column names.
listed_names = list(df_cv)  # list of column names to reference later
variable = listed_names[3:]  # columns from index 3 (the 4th column) onward; the first three are skipped
for i in df_cv.index:  # for each index in the DataFrame (DF)
    for m in variable:  # for each name in the list of variable column names
        if df_cv.loc[i, 'Name'] == m:  # if this row's 'Name' equals the column name...
            df_cv.loc[i, m] = df_cv.loc[i, 'Value']  # ...copy the row's 'Value' into that column
Basically, it takes a 3×n Time/Name/Value table and sorts the values into a pandas DataFrame with one column per unique Name, turning this:
Time Name Value
1 Color Red
2 Age 6
3 Temp 25
4 Age 1
into this:
Time Color Age Temp
1    Red
2          6
3              25
4          1
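Using the sample rows above, here is a minimal sketch of a middle ground that keeps the question's in-place approach but drops the row loop. It assumes df_cv holds the three input columns followed by one empty column per unique Name; that layout is inferred from listed_names[3:] and is not confirmed by the question. Each target column is then filled with a single boolean-mask assignment instead of one .loc call per (row, column) pair:

```python
import pandas as pd

# Hypothetical reconstruction of df_cv: the three input columns followed by
# one empty column per unique Name (layout inferred from listed_names[3:]).
df_cv = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", 6, 25, 1],
})
for name in df_cv["Name"].unique():
    df_cv[name] = ""

variable = list(df_cv)[3:]  # target columns, skipping the first three

# One vectorized assignment per target column: select every row whose Name
# matches the column and copy its Value across in one step.
for m in variable:
    mask = df_cv["Name"] == m
    df_cv.loc[mask, m] = df_cv.loc[mask, "Value"]

print(df_cv[["Time"] + variable])
```

This still loops, but only over the handful of target columns, so the work per column is done inside pandas rather than in Python.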
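My code takes a very long time to run, and I'm wondering whether there is a better way to set up my loops. I come from a MATLAB background, so the Python style (i.e., not looping over every row/column) still feels foreign to me.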
How can I make this code run faster?
Answer (score: 4)
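Instead of looping, think of this as a pivot operation. Assuming Time is a column rather than the index (if it is the index, just call reset_index first):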
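If Time is currently the index, a minimal sketch of the reset_index route looks like this (the sample frame is hypothetical, rebuilt from the rows in the question):

```python
import pandas as pd

# Hypothetical input where Time has already been set as the index.
df = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", 6, 25, 1],
}).set_index("Time")

# reset_index() turns the Time index back into an ordinary column,
# so pivot can use it as the row index of the reshaped frame.
wide = df.reset_index().pivot(index="Time", columns="Name", values="Value")
print(wide)
```

pivot also falls back to the existing index when index= is omitted, so df.pivot(columns="Name", values="Value") is an equivalent shortcut in this case.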
In [96]: df
Out[96]:
Time Name Value
0 1 Color Red
1 2 Age 6
2 3 Temp 25
3 4 Age 1
In [97]: df.pivot(index="Time", columns="Name", values="Value")
Out[97]:
Name Age Color Temp
Time
1 None Red None
2 6 None None
3 None None 25
4 1 None None
In [98]: df.pivot(index="Time", columns="Name", values="Value").fillna("")
Out[98]:
Name Age Color Temp
Time
1        Red
2    6
3              25
4    1
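One caveat: pivot only works when every (Time, Name) pair is unique; if a pair repeats, it raises a ValueError because it cannot decide which value to place in the cell. A sketch of the usual workaround, pivot_table with an explicit aggfunc, on a hypothetical frame with two Age readings at Time 4. Keeping the first reading via aggfunc="first" is an assumption; pick whatever aggregation suits the data:

```python
import pandas as pd

# Hypothetical input in which Time 4 has two Age readings.
df = pd.DataFrame({
    "Time": [1, 2, 3, 4, 4],
    "Name": ["Color", "Age", "Temp", "Age", "Age"],
    "Value": ["Red", 6, 25, 1, 2],
})

try:
    df.pivot(index="Time", columns="Name", values="Value")
except ValueError as err:
    # Raised because (4, "Age") appears twice.
    print("pivot failed:", err)

# pivot_table aggregates duplicate pairs; "first" keeps the earliest value.
wide = df.pivot_table(index="Time", columns="Name", values="Value",
                      aggfunc="first").fillna("")
print(wide)
```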
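This should be much faster on a real dataset, since pivot reshapes the whole frame in one vectorized call instead of doing a .loc lookup for every row/column pair, and it's simpler to boot.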
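To check the speed claim on your own machine, here is a rough benchmark sketch. The data is synthetic (2,000 time steps, each with one integer reading under a randomly chosen Name), so it is an assumption, not the asker's dataset; the loop version mirrors the question's nested .loc loop, and both versions produce the same wide frame:

```python
import time

import numpy as np
import pandas as pd

# Synthetic long-format data: one reading per time step for a random Name.
rng = np.random.default_rng(0)
n = 2_000
names = ["Color", "Age", "Temp"]
long_df = pd.DataFrame({
    "Time": np.arange(n),
    "Name": rng.choice(names, size=n),
    "Value": rng.integers(0, 100, size=n),
})


def loop_version(df):
    # The question's approach: compare Name against every target column
    # for every row, copying Value with .loc when they match.
    out = df.copy()
    for m in names:
        out[m] = np.nan
    for i in out.index:
        for m in names:
            if out.loc[i, "Name"] == m:
                out.loc[i, m] = out.loc[i, "Value"]
    return out.set_index("Time")[sorted(names)]


def pivot_version(df):
    # The answer's approach: a single reshape.
    return df.pivot(index="Time", columns="Name", values="Value")


for fn in (loop_version, pivot_version):
    start = time.perf_counter()
    result = fn(long_df)
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```

Absolute timings depend on the machine and pandas version, so none are quoted here; the loop's cost grows with rows × columns, so the gap widens as the data grows.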