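I am working with LAPD arrest data obtained from the Mayor's Office website. For 2017 to 2018, I am trying to see which charges were made in City Council District 5 and how many arrests there were for each specific charge. CHARGE and CITY_COUNCIL_DIST are the two variables/columns I am looking at.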
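I used table(ArrestData$CHARGE) to count how often each distinct value occurs.
I realized there are more than 2,400 unique entries, so most of them are omitted from the printed output. I would like to know whether there is code to view the 5 charges that the LAPD gives most often.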
In addition, I am trying to find the top 5 charges within one specific Council District (which is another variable/column). Is there code for that as well?
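Aside:
How do I add sample data to my post? What steps do I need to take in RStudio?
Someone asked me to do this on a previous post, but I am not sure how. They told me to use dput(head(df, n)), but my data is too large even with 10 rows. They also told me to do it through an R script, but I am not sure what they meant.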
Answer 0 (score: 0)
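I think the aggregate function might help. If your data contains just CHARGE and CITY_COUNCIL_DIST, the code could look something like this:
# add a helper column of ones, then sum it within each district/charge pair
agg.data <- aggregate(n ~ CITY_COUNCIL_DIST + CHARGE, data = transform(ArrestData, n = 1), FUN = sum)
I am not very advanced in R yet, so the code may need some adjusting for your actual data. Once you have the aggregate, you can order it by count, largest first:
agg.data[order(agg.data$n, decreasing = TRUE), ]
I really can't help with dput, sorry!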
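If only the top five are needed, sorting the output of table() may be enough, without aggregating first. The following is a minimal base-R sketch; the toy data frame is made up and merely stands in for the real ArrestData.

```r
# Toy data standing in for the real ArrestData (values are made up)
ArrestData <- data.frame(
  CHARGE = c(rep("A", 6), rep("B", 5), rep("C", 4), rep("D", 3),
             rep("E", 2), rep("F", 1)),
  CITY_COUNCIL_DIST = rep(c(5, 0, 5), length.out = 21),
  stringsAsFactors = FALSE
)

# Five most frequent charges overall
top5_all <- head(sort(table(ArrestData$CHARGE), decreasing = TRUE), 5)
print(top5_all)

# Five most frequent charges within council district 5 only
in_dist5 <- ArrestData$CITY_COUNCIL_DIST == 5
top5_d5 <- head(sort(table(ArrestData$CHARGE[in_dist5]), decreasing = TRUE), 5)
print(top5_d5)
```

Note that head() cuts off at exactly five entries, so charges tied with the fifth one are dropped silently.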
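Answer 1 (score: 0)
Posting a link to the actual dataset, or some sample data, would make it easier to work out a solution. It would also help the post meet the reproducibility standard that others have mentioned. For the sake of this example, we will create a dataset explicitly.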
ArrestData <- data.frame(
CHARGE=c("CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA",
"CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA","CHARGEA",
"CHARGEB","CHARGEB","CHARGEB","CHARGEB","CHARGEB","CHARGEB","CHARGEB","CHARGEB",
"CHARGEB","CHARGEB","CHARGEB","CHARGEB","CHARGEB","CHARGEB","CHARGEB","CHARGEB",
"CHARGEC","CHARGEC","CHARGEC","CHARGEC","CHARGEC","CHARGEC","CHARGEC",
"CHARGEC","CHARGEC","CHARGEC","CHARGEC","CHARGEC","CHARGEC","CHARGEC",
"CHARGED","CHARGED","CHARGED","CHARGED","CHARGED","CHARGED",
"CHARGED","CHARGED","CHARGED","CHARGED","CHARGED","CHARGED",
"CHARGEE","CHARGEE","CHARGEE","CHARGEE","CHARGEE",
"CHARGEE","CHARGEE","CHARGEE","CHARGEE","CHARGEE",
"CHARGEF","CHARGEF","CHARGEF","CHARGEF",
"CHARGEF","CHARGEF","CHARGEF","CHARGEF",
"CHARGEG","CHARGEG","CHARGEG",
"CHARGEG","CHARGEG","CHARGEG",
"CHARGEH","CHARGEH",
"CHARGEH","CHARGEH",
"CHARGEI",
"CHARGEI"
),
CITY_COUNCIL_DIST=c(0,5)
)
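On the question of sharing sample data: one option that may keep the dput() output small is to select only the two relevant columns before taking the first rows. This is a sketch under assumptions: RealArrestData and EXTRA_COLUMN are hypothetical names used for illustration, and the values are made up.

```r
# Hypothetical stand-in for the full dataset; EXTRA_COLUMN represents
# the many other columns that make the dput() output large
RealArrestData <- data.frame(
  CHARGE = c("CHARGEA", "CHARGEA", "CHARGEB", "CHARGEC"),
  CITY_COUNCIL_DIST = c(5, 0, 5, 5),
  EXTRA_COLUMN = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# Keep only the two relevant columns and the first few rows
sample_rows <- head(RealArrestData[, c("CHARGE", "CITY_COUNCIL_DIST")], 3)

# dput() prints R code that recreates the object; paste that output into the post
dput(sample_rows)

# Round trip: evaluating the printed code rebuilds the same data frame
rebuilt <- eval(parse(text = paste(capture.output(dput(sample_rows)), collapse = "\n")))
```

If the columns are factors with thousands of levels, converting them with as.character() first should shrink the output further, since dput() writes out every factor level.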
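Assuming your dataset is named ArrestData and your CHARGE / CITY_COUNCIL_DIST columns are named as described, this code should work. The following code returns the top 5 CHARGE values for every CITY_COUNCIL_DIST. Because ties share a rank, a district can return more than 5 rows when several charges are tied at the cutoff.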
#install these packages if you do not have them
install.packages("magrittr")
install.packages("dplyr")
#make sure these libraries are present
library(magrittr)
library(dplyr)
ArrestData %>%
group_by(CHARGE, CITY_COUNCIL_DIST) %>%
summarize(count=n()) %>%
arrange(CITY_COUNCIL_DIST, desc(count)) %>%
group_by(CITY_COUNCIL_DIST) %>%
mutate(rank = rank(desc(count), ties.method="min")) %>%
filter(rank<=5)
To keep only the results for CITY_COUNCIL_DIST 5, change the filter statement as follows (depending on what the CITY_COUNCIL_DIST values actually are in your data):
filter(rank<=5, CITY_COUNCIL_DIST==5)
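For a single district, a shorter pipeline is also possible. The sketch below assumes dplyr 1.0.0 or later (for slice_max()) and uses a made-up toy_arrests data frame in place of the real data; it filters first, then counts and keeps the five largest counts.

```r
library(dplyr)

# Toy data (made up): charges A-F occur in district 5, charge G only in district 0
toy_arrests <- data.frame(
  CHARGE = c(rep("A", 6), rep("B", 5), rep("C", 4), rep("D", 3),
             rep("E", 2), "F", rep("G", 10)),
  CITY_COUNCIL_DIST = c(rep(5, 21), rep(0, 10)),
  stringsAsFactors = FALSE
)

top5 <- toy_arrests %>%
  filter(CITY_COUNCIL_DIST == 5) %>%   # restrict to one district first
  count(CHARGE, name = "count") %>%    # one row per charge with its frequency
  slice_max(count, n = 5)              # keep the 5 largest counts (ties are kept)

print(top5)
```

Filtering before counting avoids ranking every district when only one is of interest; the result should match the rank-based version for that district.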