I have input data that looks something like this and want to process it using a Pig script.
USER_ID CLICK_NO PAGE_NAME CLICK_TIME
1 1 PAGE1 <time from epoch as long>
1 2 PAGE2 <time from epoch as long>
1 3 PAGE3 <time from epoch as long>
Here I have the user ID and the time at which he/she clicked each link on a website. I want to find the total time he/she spent on the website. In short, I want to group by USER_ID and sort by CLICK_NO, which is easy, but I do not know whether I can access the next row and find the difference between two consecutive clicks. If I can do that, I can sum all the time differences to get the total time spent on the site. Can someone help?
I can post a code snippet, but grouping by USER_ID and ordering by CLICK_NO is pretty straightforward.
Answer 0 (score: 0)
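After grouping by user_id, the sum of the differences equals MAX(click_time) - MIN(click_time), because the intermediate click times cancel out when consecutive differences are added together. Pig has built-in functions for this.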
https://pig.apache.org/docs/r0.15.0/func.html#max https://pig.apache.org/docs/r0.15.0/func.html#min
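A minimal sketch of that approach might look like the following. The input path `clicks.tsv`, the tab delimiter, and the relation names are assumptions made for illustration; the field names follow the sample data in the question. It also assumes CLICK_TIME never decreases as CLICK_NO increases, which is what makes the sum of consecutive differences equal to the maximum minus the minimum.

```pig
-- Hypothetical input: a tab-separated file with no header row.
clicks = LOAD 'clicks.tsv' USING PigStorage('\t')
         AS (user_id:int, click_no:int, page_name:chararray, click_time:long);

-- One group (bag of click rows) per user.
by_user = GROUP clicks BY user_id;

-- Last click time minus first click time for each user.
time_spent = FOREACH by_user GENERATE
    group AS user_id,
    MAX(clicks.click_time) - MIN(clicks.click_time) AS total_time;

DUMP time_spent;
```

No ordering by CLICK_NO is needed here, since MAX and MIN do not depend on the order of rows within the bag. The result is in the same unit as CLICK_TIME and does not include any time spent on the page after the last click.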