Tableau error "All fields must be aggregate or constant" when calling TabPy SCRIPT_REAL

Asked: 2019-05-31 19:20:39

Tags: python python-2.7 tableau hypothesis-test tabpy

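I am calling a TabPy server from a calculated field in a Tableau worksheet to run a hypothesis test: does the booking rate differ significantly between groups?

I have a table such as: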

     Group  Bookings
0        A         1
1        A         0
3998     B         1
3999     B         0

In Python, on the same server (using the Python 2.7 Docker image), the test I want is simple:

import pandas as pd
from scipy.stats import fisher_exact
# df is the Group / Bookings table shown above
df_cont_tbl = pd.crosstab(df['Group'], df['Bookings'])
prop_test = fisher_exact(df_cont_tbl)
print 'Fisher exact test: Odds ratio = {:.2f}, p-value = {:.3f}'.format(*prop_test)

This returns: Fisher exact test: Odds ratio = 1.21, p-value = 0.102

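I have connected Tableau to the TabPy server and can run a Hello World calculated field. For example, this calculated field returns 42: SCRIPT_REAL("return 42", ATTR([Group]),ATTR([Bookings]) )

However, when I try to use a calculated field to call the stats function above and extract the p-value: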

SCRIPT_REAL("
import pandas as pd
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(_arg1, _arg2)
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
", [Group], [Bookings] )

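I get the notice "The calculation contains errors", with a drop-down that reads: "All fields must be aggregate or constant when using table calculation functions or fields from multiple data sources."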

error box

I tried wrapping the inputs in ATTR(), as in:

SCRIPT_REAL("
import pandas as pd
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(_arg1, _arg2)
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
",ATTR([Group]), ATTR([Bookings])
)

This changes the notice to "The calculation is valid", but the server then returns a pandas ValueError:

An error occurred while communicating with the External Service.
Error processing script
Error when POST /evaluate: Traceback
Traceback (most recent call last):
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tabpy_server/tabpy.py", line 467, in post
result = yield self.call_subprocess(function_to_evaluate, arguments)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tabpy_server/tabpy.py", line 488, in call_subprocess
ret = yield future
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_base.py", line 400, in result
return self.__get_result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_base.py", line 359, in __get_result
reraise(self._exception, self._traceback)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_compat.py", line 107, in reraise
exec('raise exc_type, exc_value, traceback', {}, locals_)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/thread.py", line 61, in run
result = self.fn(*self.args, **self.kwargs)
File "<string>", line 5, in _user_script
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/tools/pivot.py", line 479, in crosstab
df = DataFrame(data)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 266, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 402, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 5398, in _arrays_to_mgr
index = extract_index(arrays)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 5437, in extract_index
raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
Error type : ValueError
Error message : If using all scalar values, you must pass an index

Sample dataset:

To generate the CSV I am connecting to:

import os
import pandas as pd
import numpy as np
from collections import namedtuple

OUTPUT_LOC = os.path.expanduser('~/TabPy_demo/ab_test_demo_results.csv')

GroupObs = namedtuple('GroupObs', ['name','n','p'])

obs = [GroupObs('A',3000,.10),GroupObs('B',1000,.13)] 
# note true odds ratio = (13/87)/(10/90) = 1.345

np.random.seed(2019)

df = pd.concat( [ pd.DataFrame({'Group': grp.name,
                                'Bookings':  pd.Series(np.random.binomial(n=1, 
                                                            p=grp.p, size=grp.n))
                              }) for grp in obs
                  ],ignore_index=True )

df.to_csv(OUTPUT_LOC,index=False)

1 Answer:

Answer 0 (score: 0)

This is an old question, but maybe this will help someone else. There are a couple of issues. The first has to do with the way the data is passed to pd.crosstab. Tableau passes lists of values to the TabPy server, so wrap them in arrays to resolve the error you are getting.
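The wrapping step might look roughly like the sketch below. It runs outside Tableau: `arg1` and `arg2` are hypothetical stand-ins for the `_arg1` and `_arg2` lists that TabPy receives, and the sample values are made up for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact

# Hypothetical stand-ins for the _arg1 / _arg2 lists Tableau sends to TabPy.
arg1 = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
arg2 = [1, 0, 0, 0, 1, 1, 0, 1]

# Wrap each list in a NumPy array before building the contingency table,
# so that crosstab treats it as one column of values.
df_cont_tbl = pd.crosstab(np.array(arg1), np.array(arg2))

# fisher_exact expects a 2x2 table and returns (odds ratio, p-value).
odds_ratio, p_value = fisher_exact(df_cont_tbl)
print('p-value = {:.3f}'.format(p_value))
```

Inside the SCRIPT_REAL string, the equivalent change would presumably be to import numpy and call pd.crosstab(np.array(_arg1), np.array(_arg2)).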

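The other issue is how the table calculation is executed. You want to send TabPy two lists, each as long as your table. By default, Tableau wants to compute at the row level, and that will not work.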
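To see why a row-level call cannot work, here is a small sketch that imitates both situations outside Tableau. The helper `fisher_p_value` and the sample data are hypothetical and only illustrate the point: a single row can never form the 2x2 table that Fisher's exact test needs.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact

def fisher_p_value(groups, bookings):
    """Hypothetical helper: return the Fisher exact p-value, or None when
    the values received cannot form a 2x2 contingency table."""
    table = pd.crosstab(np.array(groups), np.array(bookings))
    if table.shape != (2, 2):
        return None
    return fisher_exact(table)[1]

# Made-up sample data standing in for the Group and Bookings columns.
groups = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
bookings = [1, 0, 0, 0, 1, 1, 0, 1]

# Whole table sent in one call: the test can be computed.
whole_table = fisher_p_value(groups, bookings)

# One row per call: each call only ever sees a 1x1 table.
per_row = [fisher_p_value([g], [b]) for g, b in zip(groups, bookings)]

print(whole_table)
print(per_row)
```

This is why the table calculation has to be computed along a field that spans every row, so that a single call receives the full lists.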

I included the row number (F1) in the CSV used to build the workbook, and made sure the Python calculation is computed along it.

Tableau Call to tabpy

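Now, when you put F1 on the worksheet, the calculation returns the same p-value once for every row. One workaround is to wrap your calculation in another calculation that returns the value only for the first row, and put that one on your worksheet instead.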

Wrapper Calculation

Now you can put it on the worksheet.

Final Desired Output