Convert row entries to columns in R

Posted: 2016-08-03 04:30:34

Tags: r dplyr

I'm new here. I just started learning R.

I have this problem:

Suppose I have a data frame:

name = c("John", "John","John","John","Mark","Mark","Mark","Mark","Dave", "Dave","Dave","Dave")
color = c("red", "blue", "green", "yellow","red", "blue", "green", "yellow","red", "blue", "green", "yellow") 
value = c( 1,2,1,3,5,5,3,2,4,6,7,8)
df = data.frame(name, color, value)
#View(df)
df
#     name  color value
#  1  John    red     1
#  2  John   blue     2
#  3  John  green     1
#  4  John yellow     3
#  5  Mark    red     5
#  6  Mark   blue     5
#  7  Mark  green     3
#  8  Mark yellow     2
#  9  Dave    red     4
#  10 Dave   blue     6
#  11 Dave  green     7
#  12 Dave yellow     8

I want it to look like this:

#   names red blue green yellow
#1   John   1    2     1      3
#2   Mark   5    5     3      2
#3   Dave   4    6     7      8

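That is, the entries in the first column (name) become unique, the levels of the second column (color) become new columns, and the entries in those new columns come from the third column (value) of the corresponding rows in the original data frame.

I can accomplish this with the following: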

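library(dplyr)
df = df %>%
  group_by(name) %>%
  # one indicator column per color: value where the color matches, 0 otherwise
  mutate(red = ifelse(color == "red", value, 0.0),
         blue = ifelse(color == "blue", value, 0.0),
         green = ifelse(color == "green", value, 0.0),
         yellow = ifelse(color == "yellow", value, 0.0)) %>%
  group_by(name) %>%
  # collapse to one row per name by summing each indicator column
  # (summarise_each() and funs() are deprecated in current dplyr releases)
  summarise_each(funs(sum), red, blue, green, yellow)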
df
    name   red  blue green yellow
1   Dave     4     6     7      8
2   John     1     2     1      3
3   Mark     5     5     3      2

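But this is not ideal if the color column has many levels, since every level needs its own hard-coded line. How would I go about doing this in general?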
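The hard-coded ifelse() calls above can be generalized by looping over whatever colors are present. A minimal base R sketch of that idea, not taken from the post itself; it assumes summing duplicated name/color pairs is acceptable and, like the ifelse(..., 0.0) calls, it fills absent pairs with 0:

```r
# Data from the question (stringsAsFactors = FALSE keeps name/color as character)
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# For each color: keep value where the color matches (0 elsewhere), then sum per name.
# tapply() groups by name (sorted), and sapply() names the columns after the colors.
wide <- sapply(unique(df$color), function(clr) {
  tapply(ifelse(df$color == clr, df$value, 0), df$name, sum)
})
wide
```

The result is a matrix with names as row names; adding a fifth color to the data needs no code change.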

Thanks!

3 Answers:

Answer 0 (score: 4)

Since the OP is using the dplyr family of packages, tidyr is a good option:

library(tidyr)
spread(df, color, value)
#    name blue green red yellow
#1 Dave    6     7   4      8
#2 John    2     1   1      3
#3 Mark    5     3   5      2

If we need to use %>%:

library(dplyr)
df %>% 
    spread(color, value)

To keep the original column order, we can convert 'color' to a factor whose levels are the unique values of 'color' (in order of appearance), and then do the spread:

df %>%
    mutate(color = factor(color, levels = unique(color))) %>%
    spread(color, value)
#  name red blue green yellow
#1 Dave   4    6     7      8
#2 John   1    2     1      3
#3 Mark   5    5     3      2
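In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal equivalent sketch, assuming a current tidyr is installed: by default pivot_wider() orders the new columns by first appearance, so with character data the factor step above is not needed to get red, blue, green, yellow.

```r
library(tidyr)

# Data from the question, with character columns
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# One row per name; new columns named after the colors, filled from value
wide <- pivot_wider(df, names_from = color, values_from = value)
wide
```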

data.table

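Or we can use dcast from data.table. Converting df to a data.table with setDT (which modifies df in place) and using data.table's dcast has an advantage: it is much faster than dcast from reshape2.

library(data.table)
dcast(setDT(df), name~color, value.var="value")
#   name blue green red yellow
#1: Dave    6     7   4      8
#2: John    2     1   1      3
#3: Mark    5     3   5      2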
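The call above assumes each name/color pair occurs once. If a pair can repeat, data.table's dcast() defaults to aggregating with length(); passing fun.aggregate makes the intended aggregation explicit. A small sketch with hypothetical duplicated data:

```r
library(data.table)

# Hypothetical data: John has two "red" rows, Mark has no "blue" row
dt <- data.table(
  name  = c("John", "John", "John", "Mark"),
  color = c("red", "red", "blue", "red"),
  value = c(1, 2, 5, 4)
)

# Sum duplicated name/color pairs; empty cells are filled with sum() of a
# zero-length vector, i.e. 0
wide <- dcast(dt, name ~ color, value.var = "value", fun.aggregate = sum)
wide
```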

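Note: In both of these solutions, we get the column names as in the expected output, without any ugly suffixes or prefixes attached (those can be changed, by the way, but it takes another line of code).

base R

If we need base R, one option is tapply (FUN = I simply returns each cell's single value, so this assumes every name/color pair occurs at most once):

with(df, tapply(value, list(name, color), FUN = I))
#     blue green red yellow
#Dave    6     7   4      8
#John    2     1   1      3
#Mark    5     3   5      2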
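Another base R option, offered as a sketch rather than part of the original answer, is to allocate the result matrix up front and fill it through a two-column character index (one column per dimension, matched against the dimnames). It keeps names and colors in order of first appearance and leaves NA for any missing pair.

```r
# Character columns are required here: factors inside cbind() would become integer codes
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Result skeleton: one row per name, one column per color, in first-appearance order
wide <- matrix(NA_real_,
               nrow = length(unique(df$name)),
               ncol = length(unique(df$color)),
               dimnames = list(unique(df$name), unique(df$color)))

# Each row of cbind(name, color) addresses one cell by its row and column names
wide[cbind(df$name, df$color)] <- df$value
wide
```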


Answer 1 (score: 3)

So you want a crosstab?
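A cross-tabulation in base R can be built with xtabs(), which sums value over every name/color combination. This is a sketch, not code from the original answer; note that absent combinations come out as 0 rather than NA, and that names and colors are sorted alphabetically.

```r
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Sum of value for every name x color combination
tab <- xtabs(value ~ name + color, data = df)
tab

# Plain data frame: names as row names, one column per color
wide <- as.data.frame.matrix(tab)
```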


Answer 2 (score: 3)

You can use dcast from reshape2:

library(reshape2)
dcast(df, name~color)


#  name blue green red yellow
#1 Dave    6     7   4      8
#2 John    2     1   1      3
#3 Mark    5     3   5      2
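Called as above, reshape2's dcast() guesses which column holds the values (here value) and reports that guess in a message. Naming the column with value.var makes the call explicit. A minimal sketch, assuming reshape2 is installed:

```r
library(reshape2)

df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# value.var names the column whose entries fill the new color columns
wide <- dcast(df, name ~ color, value.var = "value")
wide
```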

Or you can use reshape from base R:

reshape(df, idvar="name", timevar="color", direction="wide")


#  name value.red value.blue value.green value.yellow
#1 John         1          2           1            3
#5 Mark         5          5           3            2
#9 Dave         4          6           7            8
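reshape() names the new columns value.red, value.blue, and so on (the value column name, then the default separator "."). If that prefix is unwanted, one more line removes it; a sketch using sub():

```r
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

wide <- reshape(df, idvar = "name", timevar = "color", direction = "wide")

# Drop the "value." prefix that reshape() adds to every new column
names(wide) <- sub("^value\\.", "", names(wide))
# Optional: replace the leftover row names (1, 5, 9) with 1, 2, 3
rownames(wide) <- NULL
wide
```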