I'm new here and have just started learning R.
I have the following problem:
Suppose I have a data frame:
name = c("John", "John","John","John","Mark","Mark","Mark","Mark","Dave", "Dave","Dave","Dave")
color = c("red", "blue", "green", "yellow","red", "blue", "green", "yellow","red", "blue", "green", "yellow")
value = c( 1,2,1,3,5,5,3,2,4,6,7,8)
df = data.frame(name, color, value)
#View(df)
df
# name color value
# 1 John red 1
# 2 John blue 2
# 3 John green 1
# 4 John yellow 3
# 5 Mark red 5
# 6 Mark blue 5
# 7 Mark green 3
# 8 Mark yellow 2
# 9 Dave red 4
# 10 Dave blue 6
# 11 Dave green 7
# 12 Dave yellow 8
I would like it to look like this:
# names red blue green yellow
#1 John 1 2 1 3
#2 Mark 5 5 3 2
#3 Dave 4 6 7 8
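That is, the entries in the first column (name) become unique, the levels of the second column (color) become the new columns, and the entries in those new columns come from the corresponding rows of the third column (value) in the original data frame.
I can accomplish this with the following: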
library(dplyr)
df = df %>%
group_by(name) %>%
mutate(red = ifelse(color == "red", value, 0.0),
blue = ifelse(color == "blue", value, 0.0),
green = ifelse(color == "green", value, 0.0),
yellow = ifelse(color == "yellow", value, 0.0)) %>%
group_by(name) %>%
summarise_each(funs(sum), red, blue, green, yellow)
df
name red blue green yellow
1 Dave 4 6 7 8
2 John 1 2 1 3
3 Mark 5 5 3 2
But this becomes impractical if the color column has many levels. How should I go about doing this?
Thanks!
Answer 0 (score: 4)
Since the OP is using the dplyr family of packages, an option with tidyr would be spread:
library(tidyr)
spread(df, color, value)
# name blue green red yellow
#1 Dave 6 7 4 8
#2 John 2 1 1 3
#3 Mark 5 3 5 2
If we need to use %>%:
library(dplyr)
df %>%
spread(color, value)
To keep the original order, we can convert 'color' to a factor with its levels specified as the unique values of 'color', and then do the spread:
df %>%
mutate(color = factor(color, levels = unique(color))) %>%
spread(color, value)
# name red blue green yellow
#1 Dave 4 6 7 8
#2 John 1 2 1 3
#3 Mark 5 5 3 2
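A side note for readers on newer tidyr versions: spread() has since been superseded by pivot_wider(). The following is only a minimal sketch, assuming tidyr 1.0.0 or later is installed. It rebuilds the question's df so the snippet is self-contained, and `wide` is just an illustrative name.

```r
library(tidyr)

# Rebuild the data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# One new column per distinct color, filled from value
wide <- pivot_wider(df, names_from = color, values_from = value)
wide
```

With the default settings, the new columns follow the order in which the colors first appear in the data, so the factor conversion shown above should not be needed here.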
Or we can use data.table for a faster dcast. Converting to a data.table and using the dcast from data.table is advantageous: it is much faster than the dcast from reshape2.
library(data.table)
dcast(setDT(df), name~color, value.var="value")
# name blue green red yellow
#1: Dave 6 7 4 8
#2: John 2 1 1 3
#3: Mark 5 3 5 2
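NOTE: In both of the solutions above, we get the column names as in the expected output, with no ugly suffix or prefix attached (those can be changed, by the way, but it takes another line of code).
If we need base R, one option is tapply:
with(df, tapply(value, list(name, color), FUN = I))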
# blue green red yellow
#Dave 6 7 4 8
#John 2 1 1 3
#Mark 5 3 5 2
Answer 1 (score: 3)
So you want a crosstab, then?
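The R code that went with this answer did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of one common base R way to build a crosstab, xtabs(). This is an assumption about what a crosstab approach could look like, not a reconstruction of the answerer's code, and `tab` is an illustrative name.

```r
# Rebuild the data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Sum value within each name/color cell (one value per cell in this data)
tab <- xtabs(value ~ name + color, data = df)
tab

# If a plain data frame is preferred over a table object
as.data.frame.matrix(tab)
```

Note that xtabs() orders rows and columns by factor level (alphabetically here), and the names end up as row names rather than as a column.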
Answer 2 (score: 3)
You can use dcast from the reshape2 package:
library(reshape2)
dcast(df, name~color)
# name blue green red yellow
#1 Dave 6 7 4 8
#2 John 2 1 1 3
#3 Mark 5 3 5 2
Or you can use reshape from base R:
reshape(df, idvar="name", timevar="color", direction="wide")
# name value.red value.blue value.green value.yellow
#1 John 1 2 1 3
#5 Mark 5 5 3 2
#9 Dave 4 6 7 8