Question

我正在尝试在整个项目期间按月线性分摊项目成本。

为此，我对具有定义功能的项目数据框进行迭代，该功能会将每行（或项目）转换为成本明细表的新数据框。

然后，我希望将函数的返回数据框合并在一起，以创建最终数据集，作为初始数据框中所有项目的帐单明细表。

这是我定义的功能：

    def amortizeProject(name, start, end, cost):
      """ Create an amortized cost schedule by month for a given project where:
        name = project Name
        start = start date of project
        end = end date of project
        cost = total cost of project
      """
      # Create an index of the payment dates
      rng = pd.date_range(start, end, freq='MS')
      rng.name = "Cost_Date"

      # Build up the Amortization schedule as a DataFrame
      df = pd.DataFrame(index=rng,columns=['Name','Period_Cost'], dtype='float')

      # Add index by period
      df.reset_index(inplace=True)
      df.index += 1
      df.index.name = "Period"
      df["Name"] = name
      df["Period_Cost"] = np.pmt(0, rng.size, cost)

      # Return the new dataframe
      df = df.round(2)
      return df

我正在尝试遍历initial_dataframe，即：

            Name       Start         End     Cost
    0  Project 1  2019-07-01  2020-07-01  1000000
    1  Project 2  2020-01-01  2021-03-31   350000

使用如下功能：

    new_dataframe = initial_dataframe.apply(lambda x: amortizeProject(x['Name'], x['Start'], x['End'], x['Cost']), axis=1)

理想情况下，new_dataframe将是所有结果迭代的串联，但是我不确定采用正确的方式来格式化.apply函数的输出来执行此操作。我确实知道该函数会在一次迭代中产生预期的结果。

此外，我对Pandas来说还很陌生，所以如果有更好/更优化的方法可以做到这一点，我很想听听。

Answer 1

我认为最干净的选项可能是apply和stack的组合。因此，请沿行使用.apply返回pd.Series（其中索引是时间表中的每个日期，并且这些值是摊销后的值），然后使用.stack将这些值折叠到正确的位置，例如

def amortize(sers):
    values = #get the values
    dates = #get the dates
    return pd.Series(values, index=dates)

new_df = initial_dataframe.apply(amortize, axis=1).stack()

Answer 2

代替格式化.apply()，我认为您可以通过以下方式实现：

初始化一个空列表以存储所有df df_list = []。在迭代过程中将其填充到函数df_list.append(df)中。迭代后，将存储在该列表中的所有df连接到df df = pd.concat(df_list)。

因此您发布的代码应为：

def amortizeProject(name, start, end, cost):
  """ Create an amortized cost schedule by month for a given project where:
    name = project Name
    start = start date of project
    end = end date of project
    cost = total cost of project
  """
  # Create an index of the payment dates
  rng = pd.date_range(start, end, freq='MS')
  rng.name = "Cost_Date"

  # Build up the Amortization schedule as a DataFrame
  df = pd.DataFrame(index=rng,columns=['Name','Period_Cost'], dtype='float')

  # Add index by period
  df.reset_index(inplace=True)
  df.index += 1
  df.index.name = "Period"
  df["Name"] = name
  df["Period_Cost"] = np.pmt(0, rng.size, cost)

  # Return the new dataframe
  df = df.round(2)
  df_list.append(df)
  return df_list


df_list = []
new_dataframe = initial_dataframe.apply(lambda x: amortizeProject(x['Name'], x['Start'], x['End'], x['Cost']), axis=1)
df = pd.concat(df_list)
print(df)

输出应如下所示：

enter image description here

Answer 3

我结束了使用全局数据框满足我的需求的以下解决方案：

globalDF = pd.DataFrame(columns=['Cost_Date','Name','Period_Cost'])

然后在函数迭代期间，我使用concat函数来构建全局函数：

globalDF = pd.concat([globalDF,df])

这与提供的列表附加方法非常相似。

熊猫-将应用功能的结果数据框合并到新数据框

3 个答案: