Question

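I have some code (most of it not originally mine) that runs in an Anaconda Jupyter Notebook environment on my local PC. I need to scale up the processing, so I am looking into Azure Databricks. One piece of the code runs a Python loop but uses an R library (stats) and then passes the data through an R model (tbats). So a single Jupyter Notebook cell runs both Python and R code. Can this also be done in an Azure Databricks notebook? I have only found documentation that lets you change the language from cell to cell.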

In a previous cell I have:

%r library(stats)

So the stats library (along with other R libraries) is imported. However, when I run the code below, I get

NameError: name 'stats' is not defined

I am wondering whether this happens because Databricks wants you to tell each cell which language you are using (e.g. %r, %python, etc.).

My Python code:

for customerid, dataForCustomer in original.groupby(by=['customer_id']):
    startYear = dataForCustomer.head(1).iloc[0].yr
    startMonth = dataForCustomer.head(1).iloc[0].mnth
    endYear = dataForCustomer.tail(1).iloc[0].yr
    endMonth = dataForCustomer.tail(1).iloc[0].mnth

    #Creating a time series object
    customerTS = stats.ts(dataForCustomer.usage.astype(int),
                      start=base.c(startYear,startMonth),
                      end=base.c(endYear, endMonth), 
                      frequency=12)
    r.assign('customerTS', customerTS)

    ##Here comes the R code piece
    try:
        seasonal = r('''
                    fit<-tbats(customerTS, seasonal.periods = 12, 
                                    use.parallel = TRUE)
                    fit$seasonal
                 ''')
    except: 
        seasonal = 1

    # APPEND DICTIONARY TO LIST (NOT DATA FRAME)
    df_list.append({'customer_id': customerid, 'seasonal': seasonal})
    print(f' {customerid} | {seasonal} ')

seasonal_output = pa.DataFrame(df_list)
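
For context, the names `stats`, `base`, and `r` in the loop above look like rpy2 objects, which would have to exist in the Python namespace; a `library(stats)` call in a separate %r cell would not define them for Python. A minimal sketch of the setup the loop seems to expect, assuming rpy2 and an R runtime are installed wherever the Python cell runs (an assumption, not something confirmed for a Databricks cluster). The sample values and the `rpy2_available` flag are illustrative only:

```python
# Sketch only: assumes the rpy2 package and an R runtime are available.
# Neither is confirmed for the Databricks cluster in question.
try:
    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr
    rpy2_available = True
except Exception:  # rpy2 missing, or R could not be located
    rpy2_available = False

if rpy2_available:
    r = robjects.r            # entry point for evaluating R code from Python
    base = importr("base")    # exposes R's base package, e.g. base.c(...)
    stats = importr("stats")  # exposes R's stats package, e.g. stats.ts(...)

    # Build a small monthly time series and hand it to the R session,
    # mirroring what the loop does per customer (made-up sample values).
    sample_ts = stats.ts(robjects.IntVector([3, 5, 4, 6]),
                         start=base.c(2020, 1),
                         frequency=12)
    r.assign("customerTS", sample_ts)
```

With these definitions in the same Python session, `stats.ts(...)` resolves to a Python-side wrapper rather than raising NameError. The `tbats` function itself comes from the R forecast package, which would also need to be installed and loaded in that R session.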

Python and R in one cell in an Azure Databricks Jupyter Notebook

0 answers: