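I am trying to create a two-column result containing the strings below, keeping a row only when the conditions on BOTH columns are true. I tried this, but it did not work: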
CrimesAndLocation <- table(c(Crimes_Data$Primary.Type=="ARSON","ASSAULT","BATTERY","BURGLARY","HOMICIDE","HUMAN TRAFFICKING","KIDNAPPING","ROBBERY",Crimes_Data$Location.Description=="RESIDENCE")))
The output I am after:
Primary.Type: one of the 8 specific felonies listed above. It should not show all 32 possible types, only those 8.
Location.Description: RESIDENCE
This is the result I am aiming for:
COLUMN 1 COLUMN 2
"ARSON" "RESIDENCE"
"KIDNAPPING" "RESIDENCE"
"BATTERY" "RESIDENCE"
"HOMICIDE" "RESIDENCE"
"ASSAULT" "RESIDENCE"
...
Update: output of str(Crimes_Data):
'data.frame': 293036 obs. of 22 variables:
$ ID : int 10248194 10251162 10248198 10248242 10248228 10248223 10248192 10248157 10249529 10252453 ...
$ Case.Number : Factor w/ 293015 levels "F218264","HA168845",..: 292354 292350 292363 292359 292368 292366 292351 292348 292364 292816 ...
$ Date : Factor w/ 124573 levels "01/01/2015 01:00:00 AM",..: 94544 94542 94539 94536 94535 94535 94535 94535 94529 94528 ...
$ Block : Factor w/ 27983 levels "0000X E 100TH PL",..: 13541 7650 22635 1317 13262 9623 12854 8232 24201 14279 ...
$ IUCR : Factor w/ 334 levels "0110","0130",..: 49 139 321 33 251 82 38 282 97 38 ...
$ Primary.Type : Factor w/ 32 levels "ARSON","ASSAULT",..: 3 7 24 3 18 31 3 13 17 3 ...
$ Description : Factor w/ 313 levels "$500 AND UNDER",..: 111 281 119 35 131 1 260 193 274 260 ...
$ Location.Description: Factor w/ 121 levels "","ABANDONED BUILDING",..: 95 19 110 48 97 110 106 110 110 99 ...
$ Arrest : Factor w/ 2 levels "false","true": 1 1 2 1 2 2 1 2 2 1 ...
$ Domestic : Factor w/ 2 levels "false","true": 2 1 1 1 1 1 1 1 1 1 ...
$ Beat : int 835 333 733 634 1121 1432 1024 735 414 2535 ...
$ District : int 8 3 7 6 11 14 10 7 4 25 ...
$ Ward : int 18 5 6 21 27 1 22 17 7 26 ...
$ Community.Area : int 70 43 68 49 23 22 30 67 46 23 ...
$ FBI.Code : Factor w/ 26 levels "01A","01B","02",..: 11 17 26 6 21 8 11 25 9 11 ...
$ X.Coordinate : int 1154209 1190610 1172166 1176493 1153156 1159961 1154332 1163770 1193570 NA ...
$ Y.Coordinate : int 1852321 1856955 1858813 1841948 1904451 1915955 1887190 1857568 1852889 NA ...
$ Year : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
$ Updated.On : Factor w/ 442 levels "01/01/2015 12:39:07 PM",..: 288 288 288 288 288 288 288 288 288 288 ...
$ Latitude : num 41.8 41.8 41.8 41.7 41.9 ...
$ Longitude : num -87.7 -87.6 -87.6 -87.6 -87.7 ...
$ Location : Factor w/ 173646 levels "","(41.644604096, -87.610728247)",..: 31318 40835 45858 15601 116871 140063 84837 42961 32176 1 ...
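Answer 0 (score: 1)
This is a good fit for the dplyr package. Its filter function filters a data frame by any number of logical expressions you give it, keeping only the rows where all of them are true. The following should work for you: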
library(dplyr)
filter(
Crimes_Data,
Primary.Type %in% c("ARSON", "ASSAULT", "BATTERY",
"BURGLARY", "HOMICIDE", "HUMAN TRAFFICKING",
"KIDNAPPING", "ROBBERY"),
Location.Description == "RESIDENCE"
)
If you would rather not use dplyr, you can do it the old-fashioned way in base R, like this:
type.bool <- Crimes_Data$Primary.Type %in% c("ARSON", "ASSAULT", "BATTERY",
"BURGLARY", "HOMICIDE",
"HUMAN TRAFFICKING", "KIDNAPPING",
"ROBBERY")
location.bool <- Crimes_Data$Location.Description == "RESIDENCE"
Crimes_Data[type.bool & location.bool, ]
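The [ subset operator can take a boolean vector instead of an integer vector of indices. In that case it returns only the rows of the data frame for which the corresponding element of the boolean vector is TRUE.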
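To make the boolean indexing concrete, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are made up for illustration and stand in for Crimes_Data; the second index, a character vector of column names, is an addition that limits the result to the two columns the question asks for.

```r
# Toy stand-in for Crimes_Data (hypothetical values, not the real data set)
toy <- data.frame(
  Primary.Type = c("ARSON", "THEFT", "BATTERY", "ARSON"),
  Location.Description = c("RESIDENCE", "RESIDENCE", "STREET", "STREET"),
  stringsAsFactors = TRUE
)

# One TRUE/FALSE value per row for each condition
type.bool <- toy$Primary.Type %in% c("ARSON", "BATTERY")
location.bool <- toy$Location.Description == "RESIDENCE"

# & combines the two vectors element by element
keep <- type.bool & location.bool
print(keep)

# Keep rows where keep is TRUE, and only the two columns of interest
result <- toy[keep, c("Primary.Type", "Location.Description")]
print(result)
```

Only the first row satisfies both conditions here, so `result` has a single row.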
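Answer 1 (score: 0)
Thanks for the str() (aka "structure") output update; it makes it possible to help you more clearly.
To get the list of observations at RESIDENCE, try breaking the task into smaller parts:
Step 1: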
ViolentCrimes = subset(Crimes_Data, Primary.Type == "ARSON" | Primary.Type == "ASSAULT" | Primary.Type == "BATTERY" | Primary.Type == "BURGLARY" | Primary.Type == "HOMICIDE" | Primary.Type == "HUMAN TRAFFICKING" | Primary.Type == "KIDNAPPING" | Primary.Type == "ROBBERY")
Step 2:
ViolentCrimesResidence = subset(ViolentCrimes, Location.Description == "RESIDENCE", select = c(Primary.Type, Location.Description))
Result:
ViolentCrimesResidence contains two columns: column 1 is Primary.Type and column 2 is Location.Description. Column 1 holds only the 8 felonies of interest, and column 2 holds only the value "RESIDENCE".
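As a hedged aside, the two steps can also be written as a single subset() call, with the chain of == and | replaced by %in%. The sketch below uses a made-up data frame `toy` and a helper vector `felonies`; both names are hypothetical and not part of the original data.

```r
# Toy stand-in for Crimes_Data (hypothetical values)
toy <- data.frame(
  Primary.Type = c("ARSON", "THEFT", "BATTERY", "ROBBERY"),
  Location.Description = c("RESIDENCE", "RESIDENCE", "STREET", "RESIDENCE"),
  Arrest = c("true", "false", "false", "true"),
  stringsAsFactors = TRUE
)

# The 8 felonies from the question, stored once (the name is an assumption)
felonies <- c("ARSON", "ASSAULT", "BATTERY", "BURGLARY", "HOMICIDE",
              "HUMAN TRAFFICKING", "KIDNAPPING", "ROBBERY")

# On data without missing values, a chain of == joined by | gives the
# same logical vector as %in%
chained <- toy$Primary.Type == "ARSON" | toy$Primary.Type == "BATTERY" |
  toy$Primary.Type == "ROBBERY"
compact <- toy$Primary.Type %in% c("ARSON", "BATTERY", "ROBBERY")
print(identical(chained, compact))

# Both conditions and the column selection in one call
one_call <- subset(
  toy,
  Primary.Type %in% felonies & Location.Description == "RESIDENCE",
  select = c(Primary.Type, Location.Description)
)
print(one_call)
```

Splitting the work into two steps, as above, is easier to follow; the single call is just shorter to type.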
Step 1:
An example of subset with an OR condition, from an R website:
PineTreeGrade3Data<-subset(StudentData, SchoolName=="Pine Tree Elementary" | Grade==3)
We have:
ViolentCrimes = subset(Crimes_Data, Primary.Type == "ARSON" |
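subset() is the function.
Crimes_Data is the existing data frame used as input.
Primary.Type == "ARSON" is the condition, in this case.
OR is written with the | symbol, so we use it repeatedly to include each of the other felonies of interest.
= is synonymous with <- for assignment at the top level, and saves this subset result to a new data frame that we call ViolentCrimes. I use = because it takes fewer keystrokes than <-, rightly or wrongly.
Step 2: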
ViolentCrimesResidence = subset(ViolentCrimes, Location.Description == "RESIDENCE", select = c(Primary.Type, Location.Description))
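The ViolentCrimes data frame now contains only the 8 violent felonies: "ARSON", "ASSAULT", ...
Location.Description == "RESIDENCE" is the condition, which keeps only the rows recorded at a residence.
Another option of subset() is the select = ... option.
select = c(Variable1, Variable2) keeps only the named columns, here the Primary.Type and Location.Description vectors.
The result of subset() with the select option is saved to ViolentCrimesResidence.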
So now, in R, type:
ViolentCrimesResidence
You will see the two-column output you wanted: the eight felonies of interest that occurred at RESIDENCE.
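One caveat follows from the str() output in the question: Primary.Type is a factor with 32 levels, and subsetting rows does not remove unused levels. If the result is later passed to table(), as the original attempt did, the unused types would still show up with a count of 0. A minimal sketch of one way to handle this, assuming counts are the eventual goal and using a made-up data frame `toy`:

```r
# Toy stand-in for Crimes_Data (hypothetical values)
toy <- data.frame(
  Primary.Type = c("ARSON", "THEFT", "BATTERY", "ARSON"),
  Location.Description = c("RESIDENCE", "RESIDENCE", "RESIDENCE", "STREET"),
  stringsAsFactors = TRUE
)

res <- subset(
  toy,
  Primary.Type %in% c("ARSON", "BATTERY") & Location.Description == "RESIDENCE",
  select = c(Primary.Type, Location.Description)
)

# The filtered-out type is still listed as a factor level
levels_before <- levels(res$Primary.Type)
print(levels_before)

# droplevels() removes factor levels that no longer occur in the data
res <- droplevels(res)
print(levels(res$Primary.Type))

# Cross-tabulate the remaining types against the location
counts <- table(res$Primary.Type, res$Location.Description)
print(counts)
```

Whether to drop the levels depends on what you do next; for simply printing the two columns it makes no difference.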