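Create a new column in a df based on an appended list of dictionaries, iterating over the list of dictionaries (pandas)

Asked: 2020-07-19 07:49:26

Tags: python-3.x pandas list dataframe

I have a df and a list of dictionaries, as shown below.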

df:

Date                t_factor     
2020-02-01             5             
2020-02-02             23              
2020-02-03             14           
2020-02-04             23
2020-02-05             23  
2020-02-06             23          
2020-02-07             30            
2020-02-08             29            
2020-02-09             100
2020-02-10             38
2020-02-11             38               
2020-02-12             38                    
2020-02-13             70           
2020-02-14             70 

REQUEST_OBJ = {
    "blue": {
        "best": [
 {'type': 'quadratic',
  'from': '2020-02-03T20:00:00.000Z',
  'to': '2020-02-06T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {'type': 'linear',
  'from': '2020-02-06T20:00:00.000Z',
  'to': '2020-02-10T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {'type': 'polynomial',
  'from': '2020-02-10T20:00:00.000Z',
  'to': '2020-02-14T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}]
    }
}

Step 1: I want to modify the "best" list in the dictionary as follows.

Step 1.1: Sort the list by the value of the "from" key in each dictionary.

[
 {"type": "quadratic",
      "from": "2020-02-03T20:00:00.000Z",
      "to": "2020-02-10T20:00:00.000Z",
      "days":3,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      },
{"type": "linear",
      "from": "2020-02-04T20:00:00.000Z",
      "to": "2020-02-03T20:00:00.000Z",
      "days":3,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      },
     {"type": "polynomial",
      "from": "2020-02-05T20:00:00.000Z",
      "to": "2020-02-03T20:00:00.000Z",
      "days":3,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      }]
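
A minimal sketch of Step 1.1, assuming every "from" value uses the `YYYY-MM-DDTHH:MM:SS.000Z` layout shown in the question. The helper name `parse_iso_z` and the shortened dictionaries are illustrative only; with pandas available, `pd.Timestamp(d['from'])` can serve as the sort key instead.

```python
from datetime import datetime

def parse_iso_z(stamp):
    """Parse a timestamp such as '2020-02-03T20:00:00.000Z' (hypothetical helper)."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")

# Shortened dictionaries: only the keys needed for sorting are kept here.
segments = [
    {"type": "linear", "from": "2020-02-06T20:00:00.000Z"},
    {"type": "polynomial", "from": "2020-02-10T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z"},
]

# sorted() returns a new list and leaves the input list untouched.
sorted_segments = sorted(segments, key=lambda d: parse_iso_z(d["from"]))
order = [d["type"] for d in sorted_segments]
```

Parsing to a datetime before comparing avoids relying on string ordering, which only works while every timestamp shares exactly the same format.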
Step 1.2: Add a dictionary whose "from" value is the minimum date of df and whose "to" value is the "from" date of the first dictionary in the sorted list, with "days" = 0 and "coef" = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1].

{"type": "df_first",
      "from": "2020-02-01T20:00:00.000Z",
      "to": "2020-02-03T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      }
Step 1.3: Add a dictionary whose "from" value is 7 days after the minimum date of df and whose "to" value is one day after its "from".

{"type": "df_mid",
      "from": "2020-02-08T20:00:00.000Z",
      "to": "2020-02-09T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      }
Step 1.4: Add a dictionary whose "from" value is the maximum date of df and whose "to" value is the same as its "from".

{"type": "df_last",
      "from": "2020-02-14T20:00:00.000Z",
      "to": "2020-02-14T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      }
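
The three boundary dictionaries from Steps 1.2 to 1.4 could be built along these lines. This is only a sketch: `make_segment` and `stamp` are hypothetical helpers, the minimum and maximum dates are hard-coded instead of being read from `df['Date']`, and the 20:00 time of day is assumed from the sample data.

```python
from datetime import date, timedelta

DEFAULT_COEF = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

def stamp(day):
    """Format a date as an ISO string at 20:00, matching the question's data."""
    return day.strftime("%Y-%m-%dT20:00:00.000Z")

def make_segment(kind, start, end):
    """Build one boundary dictionary; start/end may be dates or ready-made strings."""
    return {
        "type": kind,
        "from": start if isinstance(start, str) else stamp(start),
        "to": end if isinstance(end, str) else stamp(end),
        "days": 0,
        "coef": list(DEFAULT_COEF),  # copy, so segments do not share one list
    }

# Assumed stand-ins for df['Date'].min() and df['Date'].max().
dmin, dmax = date(2020, 2, 1), date(2020, 2, 14)
# "from" of the first dictionary in the sorted list.
first_from = "2020-02-03T20:00:00.000Z"

extra = [
    make_segment("df_first", dmin, first_from),
    make_segment("df_mid", dmin + timedelta(days=7), dmin + timedelta(days=8)),
    make_segment("df_last", dmax, dmax),
]
```

Returning a new dictionary, instead of appending inside the helper, lets the caller decide how to combine the results, for example with `sorted_list + extra`.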
Step 1.5: Sort all the dictionaries by "from" date.

Expected Output:

[{"type": "df_first",
      "from": "2020-02-01T20:00:00.000Z",
      "to": "2020-02-03T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      },
     {"type": "quadratic",
      "from": "2020-02-03T20:00:00.000Z",
      "to": "2020-02-10T20:00:00.000Z",
      "days":3,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      },
{"type": "linear",
      "from": "2020-02-04T20:00:00.000Z",
      "to": "2020-02-03T20:00:00.000Z",
      "days":3,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      },

     {"type": "polynomial",
      "from": "2020-02-05T20:00:00.000Z",
      "to": "2020-02-03T20:00:00.000Z",
      "days":3,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      },
{"type": "df_mid",
      "from": "2020-02-08T20:00:00.000Z",
      "to": "2020-02-09T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      },

{"type": "df_last",
      "from": "2020-02-14T20:00:00.000Z",
      "to": "2020-02-14T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      }
]

Step 1.6: Replace the "to" value of each dictionary with the "from" value of the next dictionary. The "to" value of the last dictionary stays as it is.

Expected  output:

    [{"type": "df_first",
              "from": "2020-02-01T20:00:00.000Z",
              "to": "2020-02-03T20:00:00.000Z",
              "days":0,
              "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
              },
             {"type": "quadratic",
              "from": "2020-02-03T20:00:00.000Z",
              "to": "2020-02-04T20:00:00.000Z",
              "days":3,
              "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
              },
        {"type": "linear",
              "from": "2020-02-04T20:00:00.000Z",
              "to": "2020-02-05T20:00:00.000Z",
              "days":3,
              "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
              },
        
             {"type": "polynomial",
              "from": "2020-02-05T20:00:00.000Z",
              "to": "2020-02-08T20:00:00.000Z",
              "days":3,
              "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
              },
        {"type": "df_mid",
              "from": "2020-02-08T20:00:00.000Z",
              "to": "2020-02-14T20:00:00.000Z",
              "days":0,
              "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
              },
        
        {"type": "df_last",
              "from": "2020-02-14T20:00:00.000Z",
              "to": "2020-02-14T20:00:00.000Z",
              "days":0,
              "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
              }
        ]
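
Step 1.6 can be sketched in plain Python by pairing each dictionary with its successor. The function name `chain_segments` is an assumption for illustration, and the input is assumed to be sorted by "from" already.

```python
def chain_segments(segments):
    """Return copies of the segments where each 'to' equals the next 'from'."""
    chained = [dict(seg) for seg in segments]  # shallow copies; input is not mutated
    # zip stops at the shorter sequence, so the last segment keeps its own 'to'.
    for current, following in zip(chained, chained[1:]):
        current["to"] = following["from"]
    return chained

segments = [
    {"type": "df_first", "from": "2020-02-01T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z", "to": "2020-02-10T20:00:00.000Z"},
    {"type": "df_last", "from": "2020-02-14T20:00:00.000Z", "to": "2020-02-14T20:00:00.000Z"},
]
chained = chain_segments(segments)
```

The same effect can be had in pandas by shifting the "from" column up by one row and filling the final gap with the existing "to" value.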

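Step 2: Create a new column in df based on the updated list of dictionaries. I want to create the new column according to the "type" and the date range specified in each dictionary.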

Explanation:

if "type" == "df_first":
    df['new_col'] = df['t_factor'] (only for dates between the "from" and "to" dates of that dictionary)


elif "type" == "df_mid":
    df['new_col'] = df['t_factor'] (only for dates between the "from" and "to" dates of that dictionary)


elif "type" == "quadratic":
     df['new_col'] = a0 + a1*(T) + a2*(T)**2 + previous value of df['new_col']
     where T = 1 for one day after the "from" date of that dictionary, and T is counted in days from the Date column

elif "type" == "linear":
     df['new_col'] = a0 + a1*(T) + previous value of df['new_col']
     where T = 1 for one day after the "from" date of that dictionary.

elif "type" == "polynomial":
     df['new_col'] = a0 + a1*(T) + a2*(T)**2  + a3*(T)**3  + a4*(T)**4  + a5*(T)**5 + previous value of df['new_col']
     where T = 1 for start_date of that dictionary.

elif "type" == "df_last":
    df['new_col'] = df['t_factor'] (only for dates between the "from" and "to" dates of that dictionary)
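
The three formulas above differ only in how many coefficients they use, so the increment can be sketched with a single helper. The names `DEGREE` and `increment`, and the sample "previous value" of 14.0, are assumptions for illustration; how the previous value is carried from row to row is left to the surrounding code.

```python
# Highest power of T used by each "type" in the explanation above.
DEGREE = {"linear": 1, "quadratic": 2, "polynomial": 5}

def increment(kind, coef, t):
    """Evaluate a0 + a1*t + a2*t**2 + ... up to the degree implied by `kind`."""
    used = coef[: DEGREE[kind] + 1]
    return sum(a * t ** k for k, a in enumerate(used))

coef = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
previous = 14.0  # hypothetical previous value of new_col
value = previous + increment("quadratic", coef, 1)  # T = 1
```

With all coefficients equal to 0.1 and T = 1, the quadratic increment is about 0.3, the linear one about 0.2, and the degree-5 polynomial about 0.6, up to floating-point rounding.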

I tried the following code:

    df = pd.read_csv(StringIO("""Date                t_factor     
2020-02-01             5             
2020-02-02             23              
2020-02-03             14           
2020-02-04             23
2020-02-05             23  
2020-02-06             23          
2020-02-07             30            
2020-02-08             29            
2020-02-09             100
2020-02-10             38
2020-02-11             38               
2020-02-12             38                    
2020-02-13             70           
2020-02-14             70"""), sep="\s+", parse_dates=[0])

REQUEST_OBJ = {
    "blue": {
        "best": [
 {'type': 'quadratic',
  'from': '2020-02-03T20:00:00.000Z',
  'to': '2020-02-06T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {'type': 'linear',
  'from': '2020-02-06T20:00:00.000Z',
  'to': '2020-02-10T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {'type': 'polynomial',
  'from': '2020-02-10T20:00:00.000Z',
  'to': '2020-02-14T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}]
    }
}

def add_dct(lst, _type, _from, _to):
    lst.append({
        'type': _type,
        'from': _from if isinstance(_from, str) else _from.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'to': _to if isinstance(_to, str) else _to.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'days': 0,
        "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
    })

def fn_graph(df, REQUEST_OBJ):
    

    REQUIRED_KEYS = ["blue"]

    for bluewhite_category in REQUIRED_KEYS:
        print(bluewhite_category)
        if bluewhite_category in REQUEST_OBJ.keys():
            for bestworst_category in REQUEST_OBJ[bluewhite_category].keys():
                print(bestworst_category)
                param_obj_list = REQUEST_OBJ[bluewhite_category][bestworst_category]
                dmin, dmax = df['Date'].min(), df['Date'].max()
                #sort input list based on d['from']
                param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
                # add a dictionary with d['from'] = dmin
                param_obj_list = add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
                # add a dictionary with d['from'] as data_end
                param_obj_list = add_dct(param_obj_list, 'df_mid', dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))
                # add dictionary with d['from'] as projection end
                param_obj_list = add_dct(param_obj_list, 'df_last', dmax, dmax)
                # sort the final list of dictionary based on d['from']
                param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
                # Replace the 'to' date as from of previous dictionary
                df1ist = pd.DataFrame(param_obj_list)
                df1ist['to'] = df1ist['from'].shift(-1).fillna(df1ist['to'])
                param_obj_list = df1ist.to_dict('r')
                print(param_obj_list)
                kind = bluewhite_category + '_' + bestworst_category
                df['time_function'] = np.nan
                for d in param_obj_list:
                    a0, a1, a2, a3, a4, a5 = d['coef']

                    start = pd.Timestamp(d['from']).strftime('%Y-%m-%d')
                    end = pd.Timestamp(d['to']).strftime('%Y-%m-%d')

                    T = df['Date'].sub(pd.Timestamp(start)).dt.days
                    mask = df['Date'].between(start, end, inclusive=True)

                    if d['type'] == 'df_first':
                        df.loc[mask, 'time_function'] = df['t_factor']
                        
                    elif d['type'] == 'quadratic':
                        df.loc[mask, 'time_function'] = a0 + a1 * T + a2 * (T)**2 + df['new_col'].ffill()
        
                    elif d['type'] == 'linear':
                        df.loc[mask, 'time_function'] = a0 + a1 * T + df['new_col'].ffill()
        
                    elif d['type'] == 'polynomial':
                        df.loc[mask, 'time_function'] = a0 + a1*(T) + a2*(T)**2 + a3 * \
                                    (T)**3 + a4*(T)**4 + a5*(T)**5 + df['new_col'].ffill()
                    
                    elif d['type'] == 'df_mid':
                        df.loc[mask, 'time_function'] = df['t_factor']
                        
                    elif d['type'] == 'df_mid':
                        df.loc[mask, 'time_function'] = df['t_factor']
                        
                    elif d['type'] == 'df_last':
                        df.loc[mask, 'time_function'] = df['t_factor']
                        
                
        else:
            return df
    return df

fn_graph(df, REQUEST_OBJ)

And I got the following error.

AttributeError: 'NoneType' object has no attribute 'append'

1 Answer:

Answer 0: (score: 0)

Here is how I fixed the error.

I just changed

param_obj_list = add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])

to

add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])

and did the same for the 'df_mid' and 'df_last' calls.
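
The reason this works can be shown with a small stand-alone sketch. A function that only calls `lst.append(...)` and has no `return` statement returns `None`, so assigning its result back to the list variable throws the list away. The name `add_item` is a hypothetical stand-in for `add_dct`.

```python
def add_item(lst, item):
    """Append in place; with no return statement, the call evaluates to None."""
    lst.append(item)

items = [1, 2]
result = add_item(items, 3)   # items is mutated, result is None

# Passing that None back in reproduces the error from the question.
try:
    add_item(result, 4)
except AttributeError as exc:
    message = str(exc)
```

An alternative design is to have the helper end with `return lst`, in which case the original assignment style would also work.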

Here is the complete code:

from io import StringIO

import numpy as np
import pandas as pd

df = pd.read_csv(StringIO("""Date                t_factor
2020-02-01             5
2020-02-02             23
2020-02-03             14
2020-02-04             23
2020-02-05             23
2020-02-06             23
2020-02-07             30
2020-02-08             29
2020-02-09             100
2020-02-10             38
2020-02-11             38
2020-02-12             38
2020-02-13             70
2020-02-14             70"""), sep=r"\s+", parse_dates=[0])

REQUEST_OBJ = {
    "blue": {
        "best": [
 {'type': 'quadratic',
  'from': '2020-02-03T20:00:00.000Z',
  'to': '2020-02-06T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {'type': 'linear',
  'from': '2020-02-06T20:00:00.000Z',
  'to': '2020-02-10T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
 {'type': 'polynomial',
  'from': '2020-02-10T20:00:00.000Z',
  'to': '2020-02-14T20:00:00.000Z',
  'days': 3,
  'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}]
    }
}

def add_dct(lst, _type, _from, _to):
    lst.append({
        'type': _type,
        'from': _from if isinstance(_from, str) else _from.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'to': _to if isinstance(_to, str) else _to.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'days': 0,
        "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
    })

def fn_graph(df, REQUEST_OBJ):


    REQUIRED_KEYS = ["blue"]

    for bluewhite_category in REQUIRED_KEYS:
        print(bluewhite_category)
        if bluewhite_category in REQUEST_OBJ.keys():
            for bestworst_category in REQUEST_OBJ[bluewhite_category].keys():
                print(bestworst_category)
                param_obj_list = REQUEST_OBJ[bluewhite_category][bestworst_category]
                dmin, dmax = df['Date'].min(), df['Date'].max()
                #sort input list based on d['from']
                param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
                # add a dictionary with d['from'] = dmin
                add_dct(param_obj_list, 'df_first', dmin, param_obj_list[0]['from'])
                # add a dictionary with d['from'] as data_end
                add_dct(param_obj_list, 'df_mid', dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))
                # add dictionary with d['from'] as projection end
                add_dct(param_obj_list, 'df_last', dmax, dmax)
                # sort the final list of dictionary based on d['from']
                param_obj_list = sorted(param_obj_list, key=lambda d: pd.Timestamp(d['from']))
                # Replace each 'to' date with the 'from' date of the next dictionary
                df1ist = pd.DataFrame(param_obj_list)
                df1ist['to'] = df1ist['from'].shift(-1).fillna(df1ist['to'])
                param_obj_list = df1ist.to_dict('records')
                print(param_obj_list)
                kind = bluewhite_category + '_' + bestworst_category
                df['time_function'] = np.nan
                for d in param_obj_list:
                    a0, a1, a2, a3, a4, a5 = d['coef']

                    start = pd.Timestamp(d['from']).strftime('%Y-%m-%d')
                    end = pd.Timestamp(d['to']).strftime('%Y-%m-%d')

                    # T: whole days elapsed since the segment's "from" date
                    T = df['Date'].sub(pd.Timestamp(start)).dt.days
                    # between() includes both endpoints by default
                    mask = df['Date'].between(start, end)

                    # The column being built is 'time_function'; df has no 'new_col',
                    # so the "previous value" is read from 'time_function' instead.
                    if d['type'] == 'df_first':
                        df.loc[mask, 'time_function'] = df['t_factor']

                    elif d['type'] == 'quadratic':
                        df.loc[mask, 'time_function'] = a0 + a1 * T + a2 * (T)**2 + df['time_function'].ffill()

                    elif d['type'] == 'linear':
                        df.loc[mask, 'time_function'] = a0 + a1 * T + df['time_function'].ffill()

                    elif d['type'] == 'polynomial':
                        df.loc[mask, 'time_function'] = a0 + a1*(T) + a2*(T)**2 + a3 * \
                                    (T)**3 + a4*(T)**4 + a5*(T)**5 + df['time_function'].ffill()

                    elif d['type'] == 'df_mid':
                        df.loc[mask, 'time_function'] = df['t_factor']

                    elif d['type'] == 'df_last':
                        df.loc[mask, 'time_function'] = df['t_factor']

                
        else:
            return df
    return df

fn_graph(df, REQUEST_OBJ)