How to manipulate data with R

Time: 2014-07-29 19:17:06

Tags: sql r join dplyr sqldf

I am trying to do a predictive analysis in which the metrics from year x predict year x + 1.

I want to use R to do the equivalent of an UPDATE query in SQL. If I have data like this:

x <- c("Randy Watson", "Cleo McDowell", "Darryl Jenks", "Jaffe Joffer",
       "Randy Watson", "Cleo McDowell", "Darryl Jenks", "Jaffe Joffer",
       "Randy Watson", "Cleo McDowell", "Darryl Jenks", "Jaffe Joffer")
y <- c("2012", "2012", "2012", "2012", 
       "2013", "2013", "2013", "2013", 
       "2014", "2014", "2014", "2014")
z <- c(100, 50, 75, 0, 
       110, 75, 0, 25, 
       125, 25, 10, 50)

df <- data.frame(x, y, z)

colnames(df) <- c("Name", "Year", "Sales")

print(df)

            Name Year Sales
1   Randy Watson 2012   100
2  Cleo McDowell 2012    50
3   Darryl Jenks 2012    75
4   Jaffe Joffer 2012     0
5   Randy Watson 2013   110
6  Cleo McDowell 2013    75
7   Darryl Jenks 2013     0
8   Jaffe Joffer 2013    25
9   Randy Watson 2014   125
10 Cleo McDowell 2014    25
11  Darryl Jenks 2014    10
12  Jaffe Joffer 2014    50

I would like the final output to look like this:

print(df)
           Name YearX YearX1
1  Randy Watson   100   110
2 Cleo McDowell    50    75
3  Darryl Jenks    75     0
4  Jaffe Joffer     0    25

...

How can I do this in R? I know how to do it in SQL (though I would rather not use sqldf unless it is the best approach).

Thanks.

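Edit: The solutions below are not quite what I am looking for. They would work if there were only two years, but my data covers 10 years. I do not want Name, Year 1, Year 2, Year 3, etc. I only want Name, Year X, Year X + 1. Sorry if I was unclear about this.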

1 Answer:

Answer 0: (score: 2)

Like this:

# I took the liberty of rearranging your working example a bit   
df <- data.frame(
           Name = c("Randy Watson", "Cleo McDowell", "Darryl Jenks", "Jaffe Joffer",
                    "Randy Watson", "Cleo McDowell", "Darryl Jenks", "Jaffe Joffer"),
           Year = c("2013", "2013", "2013", "2013", "2014", "2014", "2014", "2014"),
           Sales = c(100, 50, 75, 0, 110, 75, 0, 25))

reshape(df, idvar = "Name", timevar = "Year", direction = "wide")

           Name Sales.2013 Sales.2014
1  Randy Watson        100        110
2 Cleo McDowell         50         75
3  Darryl Jenks         75          0
4  Jaffe Joffer          0         25

Or, to follow your question more closely:

df_wide <- reshape(df, idvar = "Name", timevar = "Year", direction = "wide")

colnames(df_wide) <- c("Name", "Year0", "Year1") 

print(df_wide)
           Name Year0 Year1
1  Randy Watson   100   110
2 Cleo McDowell    50    75
3  Darryl Jenks    75     0
4  Jaffe Joffer     0    25

Some alternatives that give you the same result:

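library(reshape)  # provides cast()
# Name the value column explicitly instead of letting cast() guess it
cast(df, Name ~ Year, value = "Sales")

# Base R; note that this returns a contingency table, not a data frame
xtabs(Sales ~ Name + Year, data = df)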
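
The edit to the question asks for Name, Year X, Year X + 1 across many years, rather than one column per year. A minimal sketch of one way to get that shape in base R is a self-join with merge(): shift a copy of the data back by one year, then join it to the original on Name and Year. This is only an illustration under stated assumptions, not necessarily what the asker ended up using. It assumes Year can be converted to an integer and that each Name has at most one row per year; the helper names nxt and pairs are made up for this example.

```r
# Data from the question (three years, long format).
df <- data.frame(
  Name  = rep(c("Randy Watson", "Cleo McDowell", "Darryl Jenks", "Jaffe Joffer"),
              times = 3),
  Year  = rep(c("2012", "2013", "2014"), each = 4),
  Sales = c(100, 50, 75, 0,
            110, 75, 0, 25,
            125, 25, 10, 50)
)

# Assumption: Year is stored as text (or a factor) and holds plain integers.
df$Year <- as.integer(as.character(df$Year))

# Hypothetical helper: a copy of the data with Year shifted back by one,
# so each year X + 1 row carries the key of year X.
nxt <- data.frame(Name = df$Name, Year = df$Year - 1L, YearX1 = df$Sales)

# Inner join on Name and Year pairs every year X with its year X + 1.
# Rows of the final year have no successor and drop out of the result.
pairs <- merge(df, nxt, by = c("Name", "Year"))
names(pairs)[names(pairs) == "Sales"] <- "YearX"

# Sort for readability.
pairs <- pairs[order(pairs$Year, pairs$Name), ]
print(pairs)
```

With the three years of sample data this gives one row per Name for each of the pairs 2012/2013 and 2013/2014, and the same code should extend to 10 years without changes. If some Name is missing a year, the inner join silently drops the affected pair; passing all.x = TRUE to merge() would keep it with NA in YearX1 instead.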