R chart on Wikipedia: numbers wrongly interpreted as factors

Time: 2018-06-19 15:16:08

Tags: r charts statistics wikimedia-commons

I have been trying to run this Wikipedia chart script, which plots unemployment in the United States.

The data comes from http://download.bls.gov/pub/time.series/ln/ln.data.1.AllData and http://download.bls.gov/pub/time.series/ln/ln.series

    cat("Loading table -- might take some time\n");
    u <- read.table("ln.data.1.AllData", header=T, fill=T)
    u$time <- u$year + (as.numeric(u$period) - 1) / 12

    cat("Processing -- might take some time\n");
    u1 = subset(u, series_id == "LNS13025670")
    u2 = subset(u, series_id == "LNS14023621")
    u3 = subset(u, series_id == "LNS14000000")
    u4 = subset(u, series_id == "LNS13327707")
    u5 = subset(u, series_id == "LNS13327708")
    u6 = subset(u, series_id == "LNS13327709")

    par(family="Times")
    par(bty = "n")
    plot(
        0,
        main = "Measurement of unemployment",
        ylim = c(0,18),
        xlim = c(1950, 2010),
        xlab = "Year",
        ylab = "Percentage",
        las = 1
    );

    grid()

    pal = rainbow(8)
    lines(value ~ time, u6, col=pal[6])
    lines(value ~ time, u5, col=pal[5])
    lines(value ~ time, u4, col=pal[4])
    lines(value ~ time, u3, col=pal[3])
    lines(value ~ time, u2, col=pal[2])
    lines(value ~ time, u1, col=pal[1])

    legend(
        "topleft",
        rev(c(
            "U1: Percent Of Civilian Labor Force Unemployed 15 Weeks and over",
            "U2: Unemployment Rate - Job Losers",
            "U3: Unemployment Rate",
            "U4: All of U3, plus discouraged workers",
            "U5: All of U4, plus marginally attached workers",
            "U6: All of U5, plus total employed part time for economic reasons"
        )),
        col = rev(pal[1:6]),
        bty = 'n',
        lty = 1
    )

    dev.copy(svg, "US Unemployment measures.svg", width=8, height=6)
    dev.off()

Although this is the unmodified source code from Wikimedia Commons, the plotted lines come out bogus:

[Image: PNG of the resulting chart]

What is wrong with the R script?

Is it because u1 through u6 are being wrongly interpreted as factors?

1 answer:

Answer 0: (score: 1)

Just from glancing at the raw data, this line of your code has a problem:

    u$time <- u$year + (as.numeric(u$period) - 1) / 12

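The values in the period column look like 'M01', 'M02', 'Q01', 'Q02'. Because the column contains characters, read.table converts it to a factor by default (in R versions before 4.0.0; this can be turned off with stringsAsFactors = FALSE). Calling as.numeric on a factor value such as "M01" does not return the month number; it returns the factor's internal integer code, that is, the position of that value among the factor's levels.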
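A minimal sketch of the difference, using a small invented data frame rather than the real BLS file. The rows, the `monthly` and `codes` names, and the decision to keep only the monthly `M01` to `M12` periods are assumptions made for illustration, not part of the original script:

```r
# Toy stand-in for the BLS table; these rows are invented for illustration.
u <- data.frame(
  year   = c(2009, 2009, 2009, 2009),
  period = c("M03", "M07", "Q01", "M12"),
  stringsAsFactors = TRUE  # mimics the read.table default before R 4.0.0
)

# What the script does: this yields the factor's internal level codes
# (the levels sort as M03, M07, M12, Q01), not the month numbers.
codes <- as.numeric(u$period)

# Assumed fix: keep the monthly rows only, go through as.character,
# strip the leading "M", and only then convert to a number.
monthly <- u[grepl("^M(0[1-9]|1[0-2])$", u$period), ]
monthly$month <- as.numeric(sub("^M", "", as.character(monthly$period)))
monthly$time <- monthly$year + (monthly$month - 1) / 12

print(codes)
print(monthly)
```

With the real file, passing `stringsAsFactors = FALSE` to `read.table` should avoid the factor conversion in the first place, but the "M" prefix still has to be stripped before `as.numeric` can parse the month. Whether this alone fixes the chart has not been verified against the actual data.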