minutes    count of tweets
1          100
2          34
3          56
4          234
5          2310
6          345
7          56
8          55
9          12
10         245
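The match lasts 130 minutes. How can I find the number of tweets in each minute using the tweet IDs?
The expected result is a per-minute table like the one above.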
Answer 0 (score: 0)
Assuming the tweet IDs are unique, and using PySpark with a raw RDD:
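from pyspark import SparkContext

# Reuse the active SparkContext (e.g. in the pyspark shell) or create one.
sc = SparkContext.getOrCreate()

# Each record: (tweet_id, tweet_time, match_start, match_end).
# The times appear to be expressed in minutes.
rdd = sc.parallelize([
    (1001, 145678, 145600, 145730),
    (1002, 145678, 145600, 145730),
    (1005, 145680, 145600, 145730),
    (12278, 145687, 145600, 145730),
    (765558, 145688, 145600, 145730),
    (724323, 145689, 145600, 145730),
    (875857, 145688, 145600, 145730),
    (79375, 145685, 145600, 145730),
    (84666, 145686, 145600, 145730),
    (335556, 145687, 145600, 145730),
    (29990, 145688, 145600, 145730),
    (56, 145689, 145600, 145730),
    (968867, 145690, 145600, 145730),
    (8452, 145691, 145600, 145730),
    (1334, 145679, 145600, 145730),
])

# Keep tweets posted during the match, key each one by its minute offset
# from the start of the match, then count the records per key.
result_dict = (
    rdd.filter(lambda x: x[2] <= x[1] <= x[3])
       .map(lambda x: (x[1] - x[2], 0))
       .countByKey()
)

print("minutes count of tweets")
for minute, count in sorted(result_dict.items()):
    print("{0}\t{1}".format(minute, count))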
Result:
minutes count of tweets
78 2
79 1
80 1
85 1
86 1
87 2
88 3
89 2
90 1
91 1
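
The output above lists only minutes that contain at least one tweet, while the question asks for a count for every minute of the match. As a rough sketch of one way to fill the gaps, here is the same filter-and-count logic in plain Python (no Spark), padded with zeros. The function name `tweets_per_minute` is made up for illustration, the rows are copied from the answer's sample data, and the code assumes that all rows share the same match window and that the times are in minutes.

```python
from collections import Counter

# Sample rows copied from the answer above:
# (tweet_id, tweet_time, match_start, match_end), times assumed to be minutes.
rows = [
    (1001, 145678, 145600, 145730),
    (1002, 145678, 145600, 145730),
    (1005, 145680, 145600, 145730),
    (12278, 145687, 145600, 145730),
    (765558, 145688, 145600, 145730),
    (724323, 145689, 145600, 145730),
    (875857, 145688, 145600, 145730),
    (79375, 145685, 145600, 145730),
    (84666, 145686, 145600, 145730),
    (335556, 145687, 145600, 145730),
    (29990, 145688, 145600, 145730),
    (56, 145689, 145600, 145730),
    (968867, 145690, 145600, 145730),
    (8452, 145691, 145600, 145730),
    (1334, 145679, 145600, 145730),
]


def tweets_per_minute(rows):
    """Hypothetical helper: count tweets per minute offset, zero-filled.

    Returns a list of (minute_offset, count) pairs covering every minute
    from the match start to the match end, inclusive.
    """
    if not rows:
        return []
    # Same filter and key as the RDD version: keep tweets inside the match
    # window and key them by their offset from the match start.
    counts = Counter(
        time - start
        for _tweet_id, time, start, end in rows
        if start <= time <= end
    )
    # Assumption: every row carries the same match window, so the first
    # row is enough to determine the range of minutes.
    _, _, start, end = rows[0]
    return [(minute, counts.get(minute, 0)) for minute in range(end - start + 1)]


per_minute = tweets_per_minute(rows)

print("minutes count of tweets")
for minute, count in per_minute:
    print("{0}\t{1}".format(minute, count))
```

With real data the same zero-filling step could be applied to the dictionary returned by `countByKey()`, since that result is already collected on the driver.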