I am trying to call two of my own modules from Pig.
Here is module_one.py:
import sys
print sys.path
def foo():
    pass
Here is module_two.py:
from module_one import foo
def bar():
    foo()
I uploaded both of them to S3.
This is what I get when I try to import them into Pig:
2015-06-14 12:12:10,578 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-amzn-2 (rexported) compiled May 05 2015, 19:03:23
2015-06-14 12:12:10,579 [main] INFO org.apache.pig.Main - Logging error messages to: /mnt/var/log/apps/pig.log
2015-06-14 12:12:10,620 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2015-06-14 12:12:11,277 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-06-14 12:12:11,279 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-14 12:12:11,279 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://1.1.1.1:9000
2015-06-14 12:12:12,794 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> REGISTER 's3://mybucket/pig/module_one.py' USING jython AS m1;
2015-06-14 12:12:15,177 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-14 12:12:17,457 [main] INFO com.amazon.ws.emr.hadoop.fs.EmrFileSystem - Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2015-06-14 12:12:17,889 [main] INFO amazon.emr.metrics.MetricsSaver - MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500
2015-06-14 12:12:17,889 [main] INFO amazon.emr.metrics.MetricsSaver - Created MetricsSaver j-5G45FR7N987G:i-a95a5379:RunJar:03073 period:60 /mnt/var/em/raw/i-a95a5379_20150614_RunJar_03073_raw.bin
2015-06-14 12:12:18,633 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://mybucket/pig/module_one.py' for reading
2015-06-14 12:12:18,661 [main] INFO amazon.emr.metrics.MetricsSaver - Thread 1 created MetricsLockFreeSaver 1
2015-06-14 12:12:18,743 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4599752347759040376
2015-06-14 12:12:21,060 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
['/home/hadoop/.versions/pig-0.12.0-amzn-2/lib/Lib', '/home/hadoop/.versions/pig-0.12.0-amzn-2/lib/jython-standalone-2.5.3.jar/Lib', 'classpath', 'pyclasspath/', '/home/hadoop']
2015-06-14 12:12:21,142 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: m1.foo
grunt> REGISTER 's3://mybucket/pig/module_two.py' USING jython AS m2;
2015-06-14 12:12:33,870 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-14 12:12:33,918 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-06-14 12:12:34,020 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://mybucket/pig/module_two.py' for reading
2015-06-14 12:12:34,064 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
2015-06-14 12:12:34,621 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
  File "/tmp/pig1436120267849453375tmp/module_two.py", line 1, in <module>
    from module_one import foo
ImportError: No module named module_one
Details at logfile: /mnt/var/log/apps/pig.log
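I have tried:
- The usual sys.path.append('./Lib') and sys.path.append('.'), which did not help.
- Hacking the folder location with sys.path.append(os.path.dirname(__file__)), but that raised NameError: name '__file__' is not defined.
- Creating an __init__.py and loading it with REGISTER.
- sys.path.append('s3://mybucket/pig/'), which did not work either.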
I am using Apache Pig version 0.12.0-amzn-2, since that is the only version I can currently select.
Answer 0 (score: 0)
You are importing the first Python UDF as m1, so you should access its namespace as m1.foo() rather than through module_one.
EDIT: the second Python file should therefore refer to m1 instead of module_one.
I just tested this on Amazon EMR and it works.
Answer 1 (score: 0)
Based on what I found here: How do I get the path and name of the file that is currently executing?, I managed to register the path containing the custom modules I wanted to load in my Pig UDF by doing the following:
import inspect, os, sys
sys.path.append(os.path.dirname(os.path.abspath(inspect.stack()[0][1])))
import myModule
So if your module_two is in the same folder as the module_one it imports, this should work in Pig.
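As a minimal sketch of how this could look at the top of module_two.py: the helper name add_script_dir_to_path is hypothetical and not part of Pig or Jython. It wraps the same inspect.stack() call in a function and avoids adding the directory twice. This assumes module_one.py actually sits next to the executing copy of module_two.py; the thread does not confirm where Pig places scripts fetched from S3.

```python
import inspect
import os
import sys


def add_script_dir_to_path():
    """Append the directory of the currently executing file to sys.path.

    Hypothetical helper for illustration; returns the directory so the
    caller can log or inspect it.
    """
    # inspect.stack()[0][1] is the filename of the frame running this line.
    # Unlike __file__, it does not depend on a module-level name being set,
    # which is what raised the NameError in the question.
    script_path = os.path.abspath(inspect.stack()[0][1])
    script_dir = os.path.dirname(script_path)
    if script_dir not in sys.path:
        sys.path.append(script_dir)
    return script_dir


script_dir = add_script_dir_to_path()

# With the directory on sys.path, the sibling import from the question can
# be attempted. It is commented out because module_one.py is not part of
# this sketch.
# from module_one import foo
```

Printing script_dir before the import is a quick way to check which folder Pig is really executing the script from.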