JavaPackage object is not callable error: Pyspark

Date: 2016-05-11 05:37:27

Tags: apache-spark pyspark python-3.4 apache-zeppelin py4j

Operations like dataframe.show() and sQLContext.read.json work fine, but most functions raise a "'JavaPackage' object is not callable" error. For example, when I do

dataFrame.withColumn(field_name, monotonically_increasing_id())

I get the error

File "/tmp/spark-cd423f35-9572-45ee-b159-1b2732afa2a6/userFiles-3a6e1729-95f4-468b-914c-c706369bf2a6/Transformations.py", line 64, in add_id_column
    self.dataFrame = self.dataFrame.withColumn(field_name, monotonically_increasing_id())
  File "/home/himaprasoon/apps/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/functions.py", line 347, in monotonically_increasing_id
    return Column(sc._jvm.functions.monotonically_increasing_id())
TypeError: 'JavaPackage' object is not callable

I am using the apache-zeppelin interpreter and have added py4j to the python path.
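The failure mode in the traceback can be reproduced without Spark or a JVM: in py4j, looking up a dotted name on the jvm view that has not been resolved to a class returns a JavaPackage placeholder instead of raising, and only the eventual call fails. A minimal pure-Python sketch of that mechanism (FakeJavaPackage is an illustrative stand-in, not py4j's real class):

```python
# Illustrative stand-in for py4j.java_gateway.JavaPackage: attribute access
# on an unresolved name silently nests into deeper "packages"; only calling
# the placeholder fails.
class FakeJavaPackage:
    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Unknown attributes never raise -- they just extend the dotted path.
        return FakeJavaPackage(self._name + "." + attr)

jvm = FakeJavaPackage("jvm")
functions = jvm.functions  # lookup "succeeds" even though nothing is imported
try:
    functions.monotonically_increasing_id()
except TypeError as err:
    print(err)  # 'FakeJavaPackage' object is not callable
```

This is also why print(sc._jvm.functions) alone never errors: the placeholder only blows up when it is called.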

When I do

import py4j
print(dir(py4j))

the import succeeds, printing

['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'compat', 'finalizer', 'java_collections', 'java_gateway', 'protocol', 'version']

When I try

print(sc._jvm.functions)

in the pyspark shell, it prints

<py4j.java_gateway.JavaClass object at 0x7fdaf9727ba8>

but when I try this in my interpreter, sc._jvm.functions is a JavaPackage instead, hence the TypeError.

1 answer:

Answer 0 (score: 1)

In the zeppelin interpreter code,

java_import(gateway.jvm, "org.apache.spark.sql.*")

was not being executed. Adding this import fixed the issue.
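For completeness, the fix amounts to running py4j's java_import against the interpreter's gateway before any PySpark SQL functions are used. A sketch of the missing line, assuming gateway is the py4j JavaGateway that the zeppelin interpreter builds (this is interpreter setup code and needs a live JVM gateway, so it is not runnable standalone):

```python
from py4j.java_gateway import java_import

# Make the org.apache.spark.sql classes -- including the 'functions' object
# that pyspark.sql.functions reaches via sc._jvm.functions -- visible on the
# gateway's jvm view. Without this, sc._jvm.functions stays a JavaPackage.
java_import(gateway.jvm, "org.apache.spark.sql.*")
```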