Summing one variable while collapsing on another

Time: 2014-09-22 04:52:48

Tags: r csv sum collapse

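Here is my dataset: https://wiki.csc.calpoly.edu/datasets/attachment/wiki/HighwayAccidents/ACCIDENT2007-FullDataSet.csv

I cleaned up some data on all 2007 car accidents in the US states and territories (coded 1:56) and now have a large CSV file with 9 variables, such as state, vehicles, pedestrians, persons, drunk drivers, fatalities, date, and time. The CSV lists each accident as a separate row. States are identified by number. I want to sum individual columns within each state, using the state code only as the grouping variable and not summing it.

I would like results like the following: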

State        Drunk_Dr
1               345
2              1023

State       Fatalities   Drunk_Dr
34              123        134
35               56         64

and so on, for states 1:56.

1 answer:

Answer 0: (score: 0)

library(data.table)

file.in <- "path/to/your/file.csv"
DT.accidents <- fread(file.in)

## Have a look at the different DRUNK_DR values
DT.accidents[, table(DRUNK_DR)]
## Nine?? Really?  

DT.accidents[DRUNK_DR == 9]


## Anyway, to sum up by state and drunk drivers, assuming one row of data is one accident, you can simply use: 

DT.accidents[, .N, by=list(STATE, DRUNK_DR)]


## If you want to ignore cases with zero drunk drivers, filter those out
DT.drunks <- DT.accidents[DRUNK_DR > 0, .N, by=list(STATE, DRUNK_DR)]

## You can reshape it too, if you'd like

library(reshape2)
DT.drunks <- as.data.table(dcast(DT.drunks, STATE ~ DRUNK_DR, value.var="N"))
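
The counts above tally accidents per state and per number of drunk drivers. If what you want is the per-state totals shown in the question, a grouped sum is one way to get them. The following is a minimal sketch on toy data, not the real file: `FATALS` is an assumed column name (check `names(DT.accidents)` for the actual one), and `toy` and `sum_cols` are illustrative names only.

```r
library(data.table)

# Toy stand-in for the accident file: one row per accident.
# STATE and DRUNK_DR are the names used in the answer; FATALS is an assumption.
toy <- data.table(
  STATE    = c(1L, 1L, 2L, 2L, 2L),
  FATALS   = c(1L, 2L, 1L, 1L, 3L),
  DRUNK_DR = c(0L, 1L, 1L, 0L, 2L)
)

# Columns to total; STATE is only the grouping key and is never summed
sum_cols <- c("FATALS", "DRUNK_DR")

# One row per state, with each chosen column summed within that state
by_state <- toy[, lapply(.SD, sum, na.rm = TRUE), by = STATE, .SDcols = sum_cols]
print(by_state)
```

On the real data the same call would be applied to `DT.accidents`. Note that unusual values, such as the `DRUNK_DR == 9` row discussed in this answer, are added in as-is, so it may be worth inspecting them first.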

Adding state names

State names, according to
ftp://ftp.nhtsa.dot.gov/FARS/FARS-DOC/USERGUIDE-2007.pdf


## start with the built in variable 'state.name' (no "s")
state_names <- state.name[1:50]
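## Add in DC and the territories, then sort alphabetically
state_names <- sort(c(state_names, "District of Columbia", "Puerto Rico", "Virgin Islands"))
## Create index numbers that match what is shown in the file (3, 7 and 14 are skipped).
## The pairing relies on alphabetical order lining up with the codes, and sort() is
## locale-dependent (e.g. "Virginia" vs "Virgin Islands"), so check it against the user guide
state_number <- setdiff(1:56, c(3, 7, 14))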
## Create a data.table for joining
DT.states <- data.table(state_number=state_number, state_names=state_names)

## Join in the info
setkey(DT.states, "state_number")
setkey(DT.accidents, "STATE")
DT.accidents[DT.states, STATE_NAMES := state_names]

## Now you can reshape, but include the names
DT.drunks <- DT.accidents[DRUNK_DR > 0, .N, by=list(STATE, STATE_NAMES, DRUNK_DR)]

## You can reshape it too, if you'd like
DT.drunks <- as.data.table(dcast(DT.drunks, STATE + STATE_NAMES ~ DRUNK_DR, value.var="N"))
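
To make the reshaped layout concrete, here is a small sketch on made-up counts. It uses the `dcast` method that ships with data.table itself rather than reshape2, which is a substitution on my part; `counts` and `wide` are illustrative names, and the numbers are invented.

```r
library(data.table)

# Made-up long-format counts: one row per (STATE, DRUNK_DR) combination
counts <- data.table(
  STATE    = c(1L, 1L, 2L),
  DRUNK_DR = c(1L, 2L, 1L),
  N        = c(10L, 3L, 7L)
)

# One row per state, one column per DRUNK_DR value, cells filled from N.
# fill = 0L puts a zero where a state has no accidents for that value.
wide <- dcast(counts, STATE ~ DRUNK_DR, value.var = "N", fill = 0L)
print(wide)
```

The resulting columns are named after the `DRUNK_DR` values (`1`, `2`, ...), so they need backticks when referenced, e.g. `` wide[, `1`] ``.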

Now... as for that accident with nine drunk drivers.

DT.accidents[DRUNK_DR == 9]
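Google: "Montana traffic fatalities May 19, 2007". The first result leads to http://www.city-data.com/accidents/acc-Browning-Montana.html

which has this piece of information:

Browning fatal car crashes and road traffic accidents (wrecks) list for 2007: May 19, 2007 05:55 PM, Us-2, Sr-464, Lat: 48.555692, Lon: -113.010247, Vehicles: 1, Fatalities: 1, Drunk drivers: data inconsistent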
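
If you conclude that this value is a coding artifact and not a real count, one option is to blank it out before summing so it does not inflate a state's total. This is only a sketch under that assumption; the source does not confirm what 9 means here, and `toy`, `suspect`, and `n_missing` are hypothetical names.

```r
library(data.table)

# Toy rows for a single state; the 9 mimics the suspicious record
toy <- data.table(STATE = c(30L, 30L, 30L), DRUNK_DR = c(1L, 9L, 0L))

# Assumption: 9 is not a genuine count. Adjust after checking the user guide.
suspect <- 9L
toy[DRUNK_DR == suspect, DRUNK_DR := NA_integer_]

# Sum while skipping the blanked values, and report how many were skipped
totals <- toy[, .(Drunk_Dr  = sum(DRUNK_DR, na.rm = TRUE),
                  n_missing = sum(is.na(DRUNK_DR))),
              by = STATE]
print(totals)
```

Keeping the `n_missing` column alongside the total makes it visible how many records were excluded per state.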