Time series plot in R shows incorrect values

Time: 2016-06-02 23:11:10

Tags: r time-series

I am trying to plot a simple time series of the dataset found here: https://datamarket.com/data/set/22qf/monthly-champagne-sales-in-1000s-p273-montgomery-fore-ts#!ds=22qf&display=line

Here is the code I am using:

setwd("~/Desktop")
sales <- read.csv("~/Desktop/monthly-champagne-sales-in-1000s.csv", header=FALSE)
attach(sales)
# attempt 1: the whole data frame as a monthly series
msale <- ts(sales, frequency=12, start=c(1950,1))
plot(msale)
# attempt 2
plot <- ts(V1, V2)

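Both of my attempts to plot the time series failed. The sales column contains sales figures in the 200-5000 range, but when I plot the series as above, R plots values between 5 and 80. I figured something had gone wrong with the sales column of the dataset, so I printed the following in the console: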

View(sales$V2)

This produced:

structure(c(23L, 20L, 22L, 21L, 27L, 31L, 16L, 15L, 25L, 60L, 81L, 88L, 18L, 17L, 30L, 37L, 46L, 35L, 29L, 13L, 41L, 61L, 84L, 91L, 33L, 28L, 53L, 39L, 48L, 51L, 36L, 8L, 40L, 76L, 89L, 92L, 79L, 32L, 44L, 63L, 64L, 65L, 43L, 9L, 70L, 80L, 90L, 2L, 42L, 59L, 55L, 54L, 67L, 71L, 50L, 11L, 75L, 86L, 95L, 4L, 52L, 49L, 62L, 57L, 73L, 69L, 39L, 14L, 78L, 85L, 3L, 7L, 19L, 24L, 38L, 45L, 26L, 51L, 56L, 12L, 77L, 83L, 93L, 6L, 47L, 34L, 58L, 68L, 74L, 72L, 66L, 10L, 82L, 87L, 94L, 5L), .Label = c("", "10651", "10803", "11331", "12670", "13076", "13916", "1573", "1643", "1659", "1723", "1738", "1759", "1821", "2212", "2282", "2475", "2541", "2639", "2672", "2721", "2755", "2851", "2899", "2922", "2927", "2946", "3006", "3028", "3031", "3036", "3088", "3113", "3162", "3230", "3260", "3266", "3370", "3523", "3528", "3595", "3633", "3663", "3718", "3740", "3776", "3934", "3937", "3957", "3965", "3986", "4016", "4047", "4121", "4154", "4217", "4276", "4286", "4292", "4301", "4474", "4510", "4514", "4520", "4539", "4633", "4647", "4676", "4677", "4739", "4753", "4874", "4968", "5010", "5048", "5211", "5221", "5222", "5375", "5428", "5764", "5951", "6424", "6838", "6873", "6922", "6981", "7132", "7614", "8314", "8357", "9254", "9842", "9851", "9858", "Monthly champagne sales (in 1000's) (p.273: Montgomery: Fore. & T.S.)"), class = "factor")

Can someone explain what structure(c(...)) is and why it distorts the data so badly?

1 answer:

Answer 0 (score: 0)

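If you look at the CSV file, there is an extra row at the bottom containing a label. That row causes your data to be read in as character (and hence as a factor) rather than as integers. Set nrows in read.csv so that it stops before that row.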
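As for the structure(c(...)) part of the question: that text is R's deparsed (dput()-style) representation of the column, and it shows a factor. A factor stores small integer codes plus a levels vector of the distinct strings, and ts() on a data frame goes through data.matrix(), which keeps only the codes. The sketch below rebuilds a tiny factor the same way; the three values are taken from the question's level list, and the sales_df data frame is a hypothetical stand-in for the real one.

```r
# Rebuild a small factor exactly as the dput() output does. structure()
# renames the historical ".Label" attribute to "levels".
f <- structure(c(3L, 1L, 2L),
               .Label = c("10651", "1573", "2851"),
               class = "factor")
sales_df <- data.frame(V2 = f)  # hypothetical stand-in for the read-in data

# The factor prints its labels, but internally holds codes into levels(f)
print(levels(f))               # the distinct values, stored as strings
print(as.integer(f))           # 3 1 2

# ts() converts a data frame with data.matrix(), which replaces each
# factor by its integer codes, so a plot shows codes rather than sales
bad <- ts(sales_df, frequency = 12, start = c(1950, 1))
print(as.vector(bad))          # 3 1 2

# Converting through character recovers the real numbers
# (as.numeric(f) alone would also return the codes)
good <- as.numeric(as.character(f))
print(good)                    # 2851 10651 1573
```

This is why the plotted values looked like small integers: they were positions in the sorted list of distinct sales strings, not the sales themselves.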

# read only the 96 data rows (header = TRUE by default), skipping the trailing label row
df <- read.csv('monthly-champagne-sales-in-1000s.csv', nrows = 96)

# clean up names
names(df) <- c('month', 'sales')

# plot with something like
plot(ts(df$sales, frequency=12, start=c(1950,1)), ylab = 'sales')
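If you would rather not hardcode nrows = 96, a minimal sketch of an alternative (not the approach above) is to read every column as text and keep only the rows whose sales value parses as a number. The inline csv_text is a hypothetical stand-in for the downloaded file: its layout (a header, data rows, then a label row in the sales column) is an assumption, and the three values are simply taken from the question's level list.

```r
# Hypothetical stand-in for the CSV file: header, three data rows and a
# trailing label row whose text lands in the sales column (assumed layout)
csv_text <- paste(
  "Month,Sales",
  "1950-01,2851",
  "1950-02,2672",
  "1950-03,2755",
  ",Monthly champagne sales (in 1000s)",
  sep = "\n"
)

# A naive read: the label row makes the Sales column non-numeric
# (a factor in R < 4.0, character in R >= 4.0)
naive <- read.csv(text = csv_text)

# Read every column as text so no factor conversion can happen
raw <- read.csv(text = csv_text, colClasses = "character")
names(raw) <- c("month", "sales")

# Labels and blanks become NA; keep only rows with a real number
sales_num <- suppressWarnings(as.numeric(raw$sales))
keep <- !is.na(sales_num)
clean <- data.frame(month = raw$month[keep], sales = sales_num[keep],
                    stringsAsFactors = FALSE)

sales_ts <- ts(clean$sales, frequency = 12, start = c(1950, 1))
# then plot(sales_ts, ylab = "sales") as in the answer
```

With the real file you would pass the file name instead of text = csv_text. The nrows approach is simpler when the row count is known; filtering on parseable numbers also survives blank or label rows anywhere in the file.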

[Image: plot of champagne sales]