R: data frame, group by column

Time: 2017-01-18 16:08:25

Tags: r dataframe data-mining

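We created a MOOC course in which a logging system recorded everything (clicks, attitudes, video views, etc.). Between 100 and 150 students enrolled in the course.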

As a result of this study, we got a log file (JSON). Using R, I prepared this data frame:

log_data <- ndjson::stream_in("log-export-20160721_1030.json")
dplyr::glimpse(log_data)

Observations: 1,443,817
 Variables: 22
 $ _id.$oid          <chr> "5707a89dcbbb4d92129ee44c", "5707a89...
 $ data              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ page              <chr> "http://elearning.szte.hu/mod/szte/f...
 $ pid               <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2,...
 $ time              <chr> "2016.04.08. 14:48:24.691", "2016.04...
 $ type              <chr> "load", "mousemove", "mousemove", "m...
 $ user              <chr> "3", "3", "3", "3", "3", "3", "3", "...
 $ data.realDistance <dbl> NA, 0.00000, 366.87055, 241.45600, N...
 $ data.x            <dbl> NA, 139, 176, 261, NA, 245, 1905, 21...
 $ data.xDistance    <dbl> NA, 0, 37, 85, NA, 16, NA, 111, NA, ...
 $ data.y            <dbl> NA, 29, 394, 620, NA, 761, 553, 451,...
 $ data.yDistance    <dbl> NA, 0, 365, 226, NA, 141, NA, 310, N...
 $ data.text         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.top          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.target       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.filename     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.length       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.actualTime   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.src          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.totalTime    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.videoId      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.seekTime     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...

My questions are:

How can I count the number of log entries per user?

  • Example: user 352 created 1000 log entries, but user 152 created only 2.
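
To make the desired result concrete, here is a minimal sketch in base R. It assumes a small made-up data frame (`toy_log` and its values are hypothetical; only the `user` and `type` column names come from the glimpse output above) and tabulates the `user` column with `table()`:

```r
# Toy stand-in for log_data; the values are invented for illustration.
toy_log <- data.frame(
  user = c("3", "3", "3", "152", "152", "352"),
  type = c("load", "mousemove", "mousemove", "load", "click", "load"),
  stringsAsFactors = FALSE
)

# Count rows (log entries) per user and turn the table into a data frame.
user_counts <- as.data.frame(table(user = toy_log$user),
                             responseName = "n_logs",
                             stringsAsFactors = FALSE)

# Sort so the most active users come first.
user_counts <- user_counts[order(-user_counts$n_logs), ]
print(user_counts)
```

If dplyr is available, `dplyr::count(log_data, user, sort = TRUE)` should produce the same kind of per-user table on the real data.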

How can I group, split, or separate the data frame by user?
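
As an illustration of what such a split could look like, here is a minimal sketch using base R's `split()`, again on a hypothetical `toy_log` data frame with invented values. It returns a named list holding one data frame per user, which can then be summarised user by user:

```r
# Toy stand-in for log_data; the values are invented for illustration.
toy_log <- data.frame(
  user = c("3", "3", "3", "152", "152", "352"),
  type = c("load", "mousemove", "mousemove", "load", "click", "load"),
  stringsAsFactors = FALSE
)

# split() returns a named list: one data frame per distinct user.
by_user <- split(toy_log, toy_log$user)

names(by_user)      # user ids present in the log
by_user[["152"]]    # all rows belonging to user 152

# Apply a per-user summary, here the number of distinct event types.
n_types <- sapply(by_user, function(d) length(unique(d$type)))
print(n_types)
```

Whether a list of data frames or a grouped summary (for example via `dplyr::group_by(log_data, user)`) is more convenient probably depends on the analysis that follows.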

0 answers:

No answers yet