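I am trying to implement Benford's Law in R. So far everything works as expected, except that when some first digits have zero occurrences, an error is thrown: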
Error in data.frame(digit = 1:9, actual.count = first_digit_counts, actual.fraction = first_digit_counts/nrow(fraudDetection), :
arguments imply differing number of rows: 9, 5
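This happens because, in my current dataset, the only first digits that occur are 1, 2, 7, 8 and 9. How can I get a count of 0 for 3, 4, 5 and 6 instead of having them missing from the table altogether?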
This is the line that causes the error:
first_digit_counts <- as.vector(table(fraudDetection$first.digit))
The full code that this line belongs to is as follows:
# load the required packages
require(reshape)
require(stringr)
require(plyr)
require(ggplot2)
require(scales)
# load in data from CSV file
fraudDetection <- read.csv("Fraud Case in Arizona 1993.csv")
names(fraudDetection)
# take only the columns containing the counts and manipulate the data into a "long" format with only one value per row
# let's try to compare the amount of the fraudulent transactions against the Benford's Law
fraudDetection <- melt(fraudDetection["Amount"])
# add columns containing the first and last digits, extracted using regular expressions
fraudDetection <- ddply(fraudDetection, .(variable), transform, first.digit = str_extract(value, "[123456789]"), last.digit = str_extract(value, "[[:digit:]]$"))
# compare counts of each actual first digit against the counts predicted by Benford’s Law
first_digit_counts <- as.vector(table(fraudDetection$first.digit))
first_digit_actual_vs_expected <- data.frame(
digit = 1:9,
actual.count = first_digit_counts,
actual.fraction = first_digit_counts / nrow(fraudDetection),
benford.fraction = log10(1 + 1 / (1:9))
)
Answer 0 (score: 6)
To make sure every digit appears in first_digit_counts, you can convert first.digit to a factor and set its levels explicitly so that they include all digits from 1 to 9:
first_digit = c(1, 1, 3, 5, 5, 5, 7, 7, 7, 7, 9)
first_digit_factor = factor(first_digit, levels=1:9) # Explicitly set the levels
This makes your table call behave as expected:
> table(first_digit)
first_digit
1 3 5 7 9
2 1 3 4 1
> table(first_digit_factor)
first_digit_factor
1 2 3 4 5 6 7 8 9
2 0 1 0 3 0 4 0 1
> as.vector(table(first_digit_factor))
[1] 2 0 1 0 3 0 4 0 1
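Applied to the code in the question, the same idea might look like the sketch below. The amounts vector is made-up data standing in for fraudDetection$value, and the first digit is extracted with base R's sub() rather than str_extract so that the example runs without extra packages; both are assumptions for illustration, not part of the original code.

```r
# Hypothetical amounts standing in for fraudDetection$value (not the real data)
amounts <- c(1250.40, 1800, 27.5, 7120, 8033.10, 91, 905.75, 112)

# First non-zero digit of each amount, extracted with base R
first_digit <- as.integer(
  sub("^[^1-9]*([1-9]).*$", "\\1", as.character(amounts))
)

# Fix the levels at 1:9 so that table() reports 0 for digits that never occur
first_digit_counts <- as.vector(table(factor(first_digit, levels = 1:9)))

# All columns now have 9 rows, so data.frame() no longer fails
first_digit_actual_vs_expected <- data.frame(
  digit = 1:9,
  actual.count = first_digit_counts,
  actual.fraction = first_digit_counts / length(amounts),
  benford.fraction = log10(1 + 1 / (1:9))
)
print(first_digit_actual_vs_expected)
```

In the original script, the equivalent change would be to wrap fraudDetection$first.digit in factor(..., levels = 1:9) before calling table.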
Answer 1 (score: 2)
This can be done with the rattle package:
library(rattle)
dummy <- rnorm(100)
calcInitialDigitDistr(dummy, split = "none")
Answer 2 (score: 1)