I am trying to use parseApacheLogLine, but the code below fails with an error:
logFile = "NASAlog.txt"

def parseLogs():
    global parseApacheLogLine
    parsed_logs = (sc
                   .textFile(logFile)
                   .map(parseApacheLogLine)
                   .cache())
    # Each element is (value, flag): flag 1 marks a parsed line, 0 a failure.
    access_logs = (parsed_logs
                   .filter(lambda s: s[1] == 1)
                   .map(lambda s: s[0])
                   .cache())
    failed_logs = (parsed_logs
                   .filter(lambda s: s[1] == 0)
                   .map(lambda s: s[0]))
    failed_logs_count = failed_logs.count()
    if failed_logs_count > 0:
        print 'Number of invalid logline: %d' % failed_logs.count()
        for line in failed_logs.take(20):
            print 'Invalid logline: %s' % line
    print 'Read %d lines, successfully parsed %d lines, failed to parse %d lines' % (parsed_logs.count(), access_logs.count(), failed_logs.count())
    return parsed_logs, access_logs, failed_logs

parsed_logs, access_logs, failed_logs = parseLogs()
Expected result: configuration and initial RDD creation.
NameError                                 Traceback (most recent call last)
in <module>()
     26     return parsed_logs, access_logs, failed_logs
     27
---> 28 parsed_logs, access_logs, failed_logs = parseLogs()

in parseLogs()
      6     parsed_logs = (sc
      7                    .textFile(logFile)
----> 8                    .map(parseApacheLogLine)
      9                    .cache())
     10

NameError: global name 'parseApacheLogLine' is not defined
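
For reference, here is a minimal sketch of what the traceback seems to point at, written without Spark so it runs on its own. A `global` statement only declares which scope a name belongs to; it does not create the name, so the function still has to be defined (or imported) before parseLogs() is called. Everything below is an assumption for illustration: the regular expression is a simplified stand-in for the real parser (which is not shown in the post), and `missingParser`, `parse_lines`, and the sample lines are hypothetical.

```python
import re

# Hypothetical, simplified pattern for Apache Common Log Format lines.
# The real parseApacheLogLine is not shown in the post; this is a stand-in.
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)$'
)


def parseApacheLogLine(logline):
    """Return (fields, 1) on success or (raw_line, 0) on failure."""
    match = LOG_PATTERN.match(logline)
    if match is None:
        return (logline, 0)
    return (match.groups(), 1)


def parse_lines(lines):
    # Reading a module-level function needs no `global` statement.
    # In Spark the same function object would be passed to .map().
    return [parseApacheLogLine(line) for line in lines]


def broken():
    # `global` declares scope only; it does not define missingParser.
    global missingParser
    return missingParser("some line")


# Made-up sample input, not taken from the NASA log file.
sample = [
    'host.example.com - - [01/Aug/1995:00:00:01 -0400] '
    '"GET /index.html HTTP/1.0" 200 1839',
    'not a log line',
]
results = parse_lines(sample)

try:
    broken()
    error_message = None
except NameError as err:
    # Same class of error as in the traceback above.
    error_message = str(err)
```

If this reading is right, defining parseApacheLogLine in an earlier cell (and actually running that cell) before calling parseLogs() should make the NameError go away; the `global` line inside parseLogs() can then be dropped.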