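For example, if a sentence contains "John" and "drives", it means that John has a car and drives to work. I have attached the code I used to do this. However, the code does not work properly and is overly complicated. I would greatly appreciate your help.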
Answer 0 (score: 0)

This is how I do it:

    import os


    class SparkUtil(object):
        @staticmethod
        def get_spark_context(host, venv, framework_name, parts):
            # Point PySpark at the Python interpreter inside the virtualenv.
            os.environ['PYSPARK_PYTHON'] = "{0}/bin/python".format(venv)
            from pyspark import SparkConf, SparkContext
            # `parts` is accepted but not used in this snippet.
            sparkConf = (SparkConf()
                         .setMaster(host)
                         .setAppName(framework_name))
            return SparkContext(conf=sparkConf)


    # Each row is [name, free text about that person].
    input_txt = [
        ["John", "John usually drives to work. He usually gets up early and drinks coffee. Mary usually joining him."],
        ["Sam", "As opposed to John, Sam doesn't like to drive. Sam usually walks there."],
        ["Mary", "Mary doesn't have driving license. Mary usually coming with John which picks her up from home."]
    ]


    def has_car(text):
        # Plain substring test: "drives" implies the person has a car.
        return "drives" in text


    def get_method(text):
        # Return the first phrase found in the text, or None if none match.
        method = None
        for m in ["drives", "walks", "coming with"]:
            if m in text:
                method = m
                break
        return method


    def process_row(row):
        # Produces [name, has_car, method] for one input row.
        return [row[0], has_car(row[1]), get_method(row[1])]


    sc = SparkUtil.get_spark_context(host="local[2]",
                                     venv="../starshome/venv",
                                     framework_name="app",
                                     parts=2)
    print(sc.parallelize(input_txt).map(process_row).collect())

You can ignore the SparkUtil class. I am not using a notebook; this is just a plain Spark application.
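
If you want to check the keyword logic before involving Spark at all, the same row function can be run over the list with an ordinary list comprehension. The following is only a sketch under that assumption: the function names mirror the ones in the answer, `METHODS` and `results` are names introduced here for illustration, and the matching is still a plain substring test, so it will not catch other word forms such as "drive" or "driving".

```python
# Spark-free check of the per-row logic; no pyspark import is needed.
# METHODS and results are illustrative names, not part of the original answer.
METHODS = ["drives", "walks", "coming with"]

input_txt = [
    ["John", "John usually drives to work. He usually gets up early and drinks coffee. Mary usually joining him."],
    ["Sam", "As opposed to John, Sam doesn't like to drive. Sam usually walks there."],
    ["Mary", "Mary doesn't have driving license. Mary usually coming with John which picks her up from home."],
]


def has_car(text):
    # Substring test only: "drives" is taken to mean the person has a car.
    return "drives" in text


def get_method(text):
    # First phrase from METHODS found in the text, or None if none match.
    return next((m for m in METHODS if m in text), None)


def process_row(row):
    name, text = row
    return [name, has_car(text), get_method(text)]


results = [process_row(row) for row in input_txt]
print(results)
```

Once this gives the rows you expect, the same `process_row` can be passed to `sc.parallelize(input_txt).map(...)` unchanged, since it only depends on its argument.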