Summarizing a data frame in R

Time: 2017-06-11 16:06:48

Tags: r group-by mapping dplyr summarize

I need your best advice. I am trying to map bike routes in New York.

library(tidyverse)
bikes <- read.csv("August.csv", header = TRUE)
str(bikes) # 1557663 obs. of  15 variables
summary(bikes)
names(bikes)

This is what a single route looks like:

# Sample route (example)
library(ggmap) # provides route() and qmap()
rt <- route(from = "Clark St & Henry St, New York, NY",
            to = "Queens Plaza North & Crescent St, New York, NY")
rt # inspect the legs of the route
nyc <- qmap("New York, NY", color = 'bw', zoom = 12)
nyc + geom_path(aes(x = startLon, y = startLat),
            colour = "red", data = rt, alpha = 1, size = 0.2)

# How many stations are unique?
start.station <- bikes$start.station.name
length(unique(start.station)) # 574 stations
end.station <- bikes$end.station.name
length(unique(end.station)) # 582 stations

names(bikes)
# [1] "tripduration"            "starttime"               "stoptime"
# [4] "start.station.id"        "start.station.name"      "start.station.latitude"
# [7] "start.station.longitude" "end.station.id"          "end.station.name"
# [10] "end.station.latitude"    "end.station.longitude"   "bikeid"
# [13] "usertype"                "birth.year"              "gender"

I assume I only need two columns: the start and end station names.

# keep only the two columns we need - start and end stations
only.stations <- bikes %>% as_tibble() %>%
  select(start.station.name, end.station.name)

only.stations # A tibble: 1,557,663, so, we have 1,557,663 rides
# start.station.name          end.station.name
# <fctr>                    <fctr>
#1               Avenue D & E 3 St            E 3 St & 1 Ave
#2              Broadway & E 14 St         E 7 St & Avenue A
#3  Metropolitan Ave & Bedford Ave       Union Ave & N 12 St
#4                 E 10 St & 5 Ave           E 10 St & 5 Ave
#5           LaGuardia Pl & W 3 St            E 3 St & 1 Ave
#6         Grand St & Havemeyer St Graham Ave & Conselyea St
#7           N 12 St & Bedford Ave  Bedford Ave & Nassau Ave
#8                 9 Ave & W 18 St     Pershing Square North
#9                  E 2 St & 2 Ave         E 2 St & Avenue C
#10   MacDougal St & Washington Sq        E 10 St & Avenue A
# ... with 1,557,653 more rows
unique(only.stations) # A tibble: 129,839 × 2, so 129,839 unique start/end pairs
View(only.stations)

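My question: how do I group and summarize the 129,839 unique rows to find out how often each route is used? I believe this is done with dplyr's group_by() and summarize(), but I tried several options and none of them worked. :(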

Regards, Olek

1 answer:

Answer 0: (score: 1)

It looks like your question is about counting the frequency of each unique row in only.stations. The piece you are missing is the n() function, used inside summarise in dplyr: group by both station columns, then call summarise with n() to count the rows in each group.
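A minimal sketch of that counting step, assuming only.stations has the two columns shown in the question. The toy tibble and the name route.counts are illustrative assumptions, not part of the original data:

```r
library(dplyr)

# Toy stand-in for only.stations (hypothetical rides, not the real August.csv data)
only.stations <- tibble(
  start.station.name = c("Avenue D & E 3 St", "Avenue D & E 3 St", "Broadway & E 14 St"),
  end.station.name   = c("E 3 St & 1 Ave",    "E 3 St & 1 Ave",    "E 7 St & Avenue A")
)

# Count how many rides share each start/end pair, most frequent first
route.counts <- only.stations %>%
  group_by(start.station.name, end.station.name) %>%
  summarise(n = n()) %>%   # n() returns the number of rows in the current group
  ungroup() %>%
  arrange(desc(n))

route.counts
```

If only the counts are needed, count(only.stations, start.station.name, end.station.name, sort = TRUE) should give the same result in one call.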