Question

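I have a function that creates and names a new data frame based on the field name passed to it.

Suppose the data frame df contains the fields "date", "sales", and "orders". After running the function, I'd like the resulting data frame to be named sales_trend, which would be the result of trend(df, "sales").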

def trend(df, field_name):
    df_name = df.groupby('date')[field_name].mean().reset_index().sort_values(by='date', ascending=True)
    return (field_name + '_trend') = df_name

I'm clearly not doing this right. Any suggestions would be greatly appreciated.

Answer 1

In general, a function does not return a name; it returns an object. You can refer to the following posts.

How to write a function to return the variable name in Python

http://effbot.org/pyfaq/how-can-my-code-discover-the-name-of-an-object.htm

I believe you are trying to implement the following code

def trend(df, field_name):
    df_name = df.groupby('date')[field_name].mean().reset_index().sort_values(by='date', ascending=True)
    return df_name


mydic = {}
field_name = 'Sample'

# store the result under a dynamically built key instead of a variable name
mydic[field_name + 'Trend'] = trend(df, field_name)
print(mydic['SampleTrend'])
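
For a version that runs on its own, here is a minimal sketch of the same dictionary approach. The sample values and the name `trends` are made up for illustration; only `trend` and the column names come from the question.

```python
import pandas as pd


def trend(df, field_name):
    """Return the mean of `field_name` per date, sorted by date."""
    return (
        df.groupby('date')[field_name]
        .mean()
        .reset_index()
        .sort_values(by='date', ascending=True)
    )


# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-01', '2017-01-02']),
    'sales': [10.0, 20.0, 30.0, 40.0],
    'orders': [1.0, 2.0, 3.0, 4.0],
})

# One dictionary entry per field, keyed by the name you wanted for the variable
trends = {name + '_trend': trend(df, name) for name in ['sales', 'orders']}

print(trends['sales_trend'])
```

Looking up `trends['sales_trend']` then plays the role that a variable named `sales_trend` would have played.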

Answer 2

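You can add names to the global namespace dynamically by modifying globals(), but this is strongly discouraged. Use a dictionary instead (as Shijo described).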
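
To make the contrast concrete, here is a small sketch of both options side by side. The sample data and the names `sales_mean` and `trends` are assumptions for illustration, and the `globals()` line is shown only to explain what is being discouraged.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'date': ['2017-01-01', '2017-01-01'], 'sales': [1.0, 3.0]})
sales_mean = df.groupby('date', as_index=False)['sales'].mean()

# Discouraged: creates a module-level variable whose name exists only as a string,
# so linters and readers cannot see where `sales_trend` comes from
globals()['sales_trend'] = sales_mean

# Preferred: the dynamic name is an ordinary dictionary key
trends = {'sales_trend': sales_mean}

print(trends['sales_trend'])
```

The dictionary keeps all dynamically named results in one place, which also makes it easy to loop over them later.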

Another approach is to aggregate all the columns on the same GroupBy object. For example, given the following data frame

import numpy as np
import pandas as pd

np.random.seed(0)

# generate fake data
date_range = pd.Series(pd.date_range('2017-01-01', periods=3))
df = pd.DataFrame({
    'date': pd.concat([date_range] * 2),
    'sales': np.random.normal(0, 1, 6),
    'orders': np.random.normal(0, 1, 6)
}).reset_index(drop=True)
print(df)

        date    orders     sales
0 2017-01-01  0.950088  1.764052
1 2017-01-02 -0.151357  0.400157
2 2017-01-03 -0.103219  0.978738
3 2017-01-01  0.410599  2.240893
4 2017-01-02  0.144044  1.867558
5 2017-01-03  1.454274 -0.977278

you can do

# the fields for which you want to compute trends
field_names = ['sales', 'orders']

# compute trends using a single GroupBy
trend = df.groupby('date', as_index=False)[field_names].mean().sort_values('date')
print(trend)

        date     sales    orders
0 2017-01-01  2.002473  0.680343
1 2017-01-02  1.133858 -0.003657
2 2017-01-03  0.000730  0.675527

Now you can use the resulting trend data frame much like a namespace. Wherever you would have used the name sales_trend, use trend['sales'] instead.

print(trend['sales'])

0    2.002473
1    1.133858
2    0.000730
Name: sales, dtype: float64
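
If you would still like the literal name sales_trend to appear, one option is to rename the columns rather than create variables. The sketch below assumes a small hand-written frame shaped like the `trend` result above (the values are made up) and uses `DataFrame.add_suffix`; the name `named` is hypothetical.

```python
import pandas as pd

# Hypothetical aggregated result, shaped like the `trend` frame above
trend = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-01', '2017-01-02']),
    'sales': [2.0, 1.0],
    'orders': [0.5, 0.25],
})

# Index by date first so only the value columns receive the suffix
named = trend.set_index('date').add_suffix('_trend')

print(named['sales_trend'])
```

This keeps everything in a single data frame while still exposing each series under the `<field>_trend` name.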

Create and name a data frame based on two strings
