Using the loc command

Date: 2016-06-22 16:33:02

Tags: python for-loop pandas optimization

I have a piece of Python code like this:

# Main loop: take the values attributed to each row and sort them into the
# corresponding columns by matching 'Name' against the newly generated
# column names.
listed_names = list(df_cv)  # list of column names to reference later
variable = listed_names[3:]  # column names from position 3 onward; the first three columns are skipped
for i in df_cv.index:  # for each index in the DataFrame (DF)
    for m in variable:  # for each name in the list of variable column names
        if df_cv.loc[i, 'Name'] == m:  # if this row's 'Name' equals the current column name...
            df_cv.loc[i, m] = df_cv.loc[i, 'Value']  # ...copy this row's 'Value' into that column
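To make the loop reproducible, here is a minimal sketch that builds a small df_cv from the sample data in this question. The layout (Time, Name, Value followed by one empty object column per unique Name) is an assumption about what df_cv looks like before the loop runs. For comparison, it also fills a copy column by column with a boolean mask, which removes the row loop entirely; the name df_fast is hypothetical.

```python
import pandas as pd

# Hypothetical df_cv: Time, Name, Value, then one empty column per unique
# Name (assumed layout before the loop runs).
df_cv = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", 6, 25, 1],
})
for name in df_cv["Name"].unique():
    # object dtype so both strings and numbers can be stored later
    df_cv[name] = pd.Series([None] * len(df_cv), index=df_cv.index, dtype=object)

# The loop from the question: one .loc lookup per (row, column) pair.
listed_names = list(df_cv)
variable = listed_names[3:]
for i in df_cv.index:
    for m in variable:
        if df_cv.loc[i, "Name"] == m:
            df_cv.loc[i, m] = df_cv.loc[i, "Value"]

# Same assignment with one vectorized step per column: Series.where keeps
# Value where the mask is True and leaves NaN elsewhere.
df_fast = df_cv[["Time", "Name", "Value"]].copy()
for m in df_fast["Name"].unique():
    df_fast[m] = df_fast["Value"].where(df_fast["Name"] == m)
```

The mask version still loops, but only over the unique names, so the per-cell Python work disappears.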

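Basically, the loop takes a 3×n list of Time/Name/Value entries and sorts the values into a pandas DataFrame with one column per unique Name. It turns this: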

Time   Name    Value
1      Color   Red
2      Age     6
3      Temp    25
4      Age     1

into this:

Time   Color   Age    Temp
1      Red     
2              6
3                     25
4              1

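My code takes a very long time to run, and I'd like to know whether there is a better way to set up my loop. I come from a MATLAB background, so the Python style (i.e., not looping over every row/column) is still foreign to me.

How can I make this code run faster?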

1 answer:

Answer 0 (score: 4)

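Instead of looping, think of this as a pivot operation. Assuming Time is a column rather than the index (if it is the index, just call reset_index first):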

In [96]: df
Out[96]: 
   Time   Name Value
0     1  Color   Red
1     2    Age     6
2     3   Temp    25
3     4    Age     1

In [97]: df.pivot(index="Time", columns="Name", values="Value")
Out[97]: 
Name   Age Color  Temp
Time                  
1     None   Red  None
2        6  None  None
3     None  None    25
4        1  None  None

In [98]: df.pivot(index="Time", columns="Name", values="Value").fillna("")
Out[98]: 
Name Age Color Temp
Time               
1          Red     
2      6           
3                25
4      1         
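A self-contained version of the session above, as a minimal sketch assuming the sample data from the question; the names wide, shown, df_indexed and wide_again are illustrative. It also shows the reset_index step for the case where Time is already the index.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", 6, 25, 1],
})

# One row per Time, one column per unique Name; cells with no entry are missing.
wide = df.pivot(index="Time", columns="Name", values="Value")

# Blank out the missing cells for display, as in the answer.
shown = wide.fillna("")

# If Time were the index instead of a column, reset_index() turns it back
# into a column so pivot can use it.
df_indexed = df.set_index("Time")
wide_again = df_indexed.reset_index().pivot(index="Time", columns="Name", values="Value")
```

Note that pivot orders the new columns alphabetically (Age, Color, Temp), as the session output also shows.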

This should be much faster on a real dataset, and it's simpler to boot.
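One caveat: pivot only works when each (Time, Name) pair occurs once; with duplicates it raises ValueError. A minimal sketch with hypothetical duplicate data, keeping the first Value per pair via groupby(...).first() followed by unstack():

```python
import pandas as pd

# Hypothetical data: Age is recorded twice at Time 2.
df = pd.DataFrame({
    "Time": [1, 2, 2, 3],
    "Name": ["Color", "Age", "Age", "Temp"],
    "Value": ["Red", 6, 7, 25],
})

try:
    df.pivot(index="Time", columns="Name", values="Value")
    pivot_failed = False
except ValueError:
    # pivot refuses to reshape when an (index, column) pair is duplicated.
    pivot_failed = True

# Keep the first Value for each (Time, Name) pair, then spread Names into columns.
wide = df.groupby(["Time", "Name"])["Value"].first().unstack()
```

Whether "first" is the right rule depends on the data; another aggregation such as last() may fit other cases.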