How to split one column into two based on the values of other columns

Time: 2017-05-31 18:27:30

Tags: r dplyr data-cleaning

I have a data frame like this:

Category <-c("Agriculture","Education","Education","Energy","Environment","Finance","Governance","Governance","Economics","Economics","Equality","Society" , "Protection","Trade","Trade","Trade", "Transport","Transport","Water")
Value <- c(0.00e+00, 8.75e+08, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 8.30e+08, 0.00e+00, 5.00e+08, 0.00e+00, 0.00e+00, 3.50e+08, 0.00e+00, 2.20e+08, 3.00e+08, 0.00e+00, 5.06e+08,0.00e+00, 3.50e+08)
Prod_A <- c(NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA)
Prod_B <- c(NA, 3, NA, NA, NA, NA, 2, NA, NA, NA, NA, 1, NA, 3, NA, NA, 2, NA, 1)
df <- data.frame(Category, Value, Prod_A, Prod_B) 
df
      Category    Value Prod_A Prod_B
1  Agriculture 0.00e+00     NA     NA
2    Education 8.75e+08     NA      3
3    Education 0.00e+00     NA     NA
4       Energy 0.00e+00     NA     NA
5  Environment 0.00e+00     NA     NA
6      Finance 0.00e+00     NA     NA
7   Governance 8.30e+08     NA      2
8   Governance 0.00e+00     NA     NA
9    Economics 5.00e+08      1     NA
10   Economics 0.00e+00     NA     NA
11    Equality 0.00e+00     NA     NA
12     Society 3.50e+08     NA      1
13  Protection 0.00e+00     NA     NA
14       Trade 2.20e+08     NA      3
15       Trade 3.00e+08      2     NA
16       Trade 0.00e+00     NA     NA
17   Transport 5.06e+08     NA      2
18   Transport 0.00e+00     NA     NA
19       Water 3.50e+08     NA      1

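The column 'Value' holds the sum_value of either product A or product B.

'Prod_A' and 'Prod_B' are the quantities of each product.

What I want to do is separate product B's values from the 'Value' column and put them in a new column, so that the sum_value of the two products is no longer mixed together in one column. How can I do this?

I tried spread(df, Value, Prod_B), but that is obviously wrong...
Any help would be appreciated! Thanks!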

1 answer:

Answer 0: (score: 0)

Using data.table, this should do the job:

# convert df to data.table if it is necessary:
library(data.table)
df <- data.table(df)

head(df)

      Category    Value Prod_A Prod_B
1: Agriculture 0.00e+00     NA     NA
2:   Education 8.75e+08     NA      3
3:  Governance 8.30e+08     NA      2
4:   Economics 5.00e+08      1     NA
5:       Trade 2.20e+08     NA      3
6:       Trade 3.00e+08      2     NA
# generate value_A and value_B as needed:

# := adds the columns by reference, so df does not need to be reassigned
df[is.na(Prod_A) & !is.na(Prod_B), value_B := Value]
df[is.na(Prod_B) & !is.na(Prod_A), value_A := Value]

head(df)

       Category    Value Prod_A Prod_B  value_B value_A
1: Agriculture 0.00e+00     NA     NA       NA      NA
2:   Education 8.75e+08     NA      3 8.75e+08      NA
3:  Governance 8.30e+08     NA      2 8.30e+08      NA
4:   Economics 5.00e+08      1     NA       NA   5e+08
5:       Trade 2.20e+08     NA      3 2.20e+08      NA
6:       Trade 3.00e+08      2     NA       NA   3e+08

Note that for rows where both Prod_A and Prod_B are NA, the script leaves value_A and value_B as NA.
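
If you would rather not depend on data.table, the same split can probably be done in base R. The following is only a minimal sketch, not the answer's method: it assumes that a row's Value belongs to whichever of Prod_A / Prod_B is non-NA, and it reuses the column names value_A and value_B from the output above. Unlike the data.table version, it does not check that the other product column is NA, which makes no difference for this data because no row has both set.

```r
# Rebuild the example data from the question
Category <- c("Agriculture", "Education", "Education", "Energy", "Environment",
              "Finance", "Governance", "Governance", "Economics", "Economics",
              "Equality", "Society", "Protection", "Trade", "Trade", "Trade",
              "Transport", "Transport", "Water")
Value <- c(0.00e+00, 8.75e+08, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 8.30e+08,
           0.00e+00, 5.00e+08, 0.00e+00, 0.00e+00, 3.50e+08, 0.00e+00, 2.20e+08,
           3.00e+08, 0.00e+00, 5.06e+08, 0.00e+00, 3.50e+08)
Prod_A <- c(NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA)
Prod_B <- c(NA, 3, NA, NA, NA, NA, 2, NA, NA, NA, NA, 1, NA, 3, NA, NA, 2, NA, 1)
df <- data.frame(Category, Value, Prod_A, Prod_B)

# Copy Value into value_A where Prod_A is present, otherwise NA
df$value_A <- ifelse(!is.na(df$Prod_A), df$Value, NA_real_)
# Copy Value into value_B where Prod_B is present, otherwise NA
df$value_B <- ifelse(!is.na(df$Prod_B), df$Value, NA_real_)

# Show only the rows that belong to one of the products
df[!is.na(df$value_A) | !is.na(df$value_B), ]
```

Since the question is tagged dplyr: the same two ifelse() expressions should also work inside dplyr::mutate(), with the df$ prefixes dropped.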