R: tabulate data and create an additional column

Time: 2016-10-10 03:23:35

Tags: r multiple-columns

My data is as follows.

A=c(rep("x",3),rep("Y",2),rep("Z",3))
B=c(0,1,0,1,1,0,0,0)
new=data.frame(A,B)

I want to create a column, modified_zip, as follows:

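  1. Look at the count of each unique value in column A. Example: x = 3, Y = 2, Z = 3
  2. For each unique value in column A, find the sum of column B. For example, when column A is x, the sum of column B is 1; when column A is Z, the sum of column B is 0
  3. Divide calculation 2 by calculation 1 to get a percentage: x = 1/3, Y = 2/2, Z = 0/3
  4. Create a new column, modified_zip, with the following values:

    • If calculation 1 is above 100 and calculation 3 is above 65%, then modified_zip will have the value 65%above100

    • If calculation 1 is above 100 and calculation 3 is below 35%, then modified_zip will have the value 35%above100

    • If calculation 1 is above 100 and calculation 3 is between 35% and 65%, then modified_zip will have the value otherabove100

    • If calculation 1 is between 50 and 100 and calculation 3 is above 65%, then modified_zip will have the value 65%between50and100

    • If calculation 1 is between 50 and 100 and calculation 3 is below 35%, then modified_zip will have the value 35%between50and100

    • If calculation 1 is between 50 and 100 and calculation 3 is between 35% and 65%, then modified_zip will have the value otherbetween50and100

    • If calculation 1 is between 10 and 50 and calculation 3 is above 65%, then modified_zip will have the value 65%between10and50

    • If calculation 1 is between 10 and 50 and calculation 3 is below 35%, then modified_zip will have the value 35%between10and50

    • If calculation 1 is between 10 and 50 and calculation 3 is between 35% and 65%, then modified_zip will have the value otherbetween10and50

    • If calculation 1 is below 10, then modified_zip will have the value smallnumber

  5. I tried using a command for this, but I don't know how to handle the percentage and the count together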

2 answers:

Answer 0: (score: 3)

Another approach, using `data.table`:

library( data.table )
setDT(new)

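The by argument tells data.table to do the calculation separately for each unique value of "A":

new[ , calc2 := sum( B ), by = A ]  # calculation 2: sum of B per value of A

.N is a predefined value holding the number of rows/observations/... in the table (or in the given by group):

new[ , calc1 := .N, by = A ]        # calculation 1: count per value of A
new[ , calc3 := calc2 / calc1 ]     # calculation 3: share of B within the group

Now add the desired character column and start filling it in, one subset at a time. Here is a single example because, as @Hack-R said, once you know how to do one, you know how to do them all:

new[ , modified_zip := as.character( NA ) ]
new[ calc1 > 100 & calc3 > 0.65, modified_zip := "65%above100" ]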

Answer 1: (score: 1)

#Look at count of unique values in column 
library(sqldf)
sqldf("select A, count(A) from new group by A")

#for each unique value in column A, find sum of column B. 
sqldf("select A, count(A), sum(B) as sumB from new group by A")

# Divide calculation 2 (sum of B) by calculation 1 (count) and express it as a percentage
new1      <- sqldf("select A, count(A), sum(B) as sumB from new group by A")
new1$calc <- new1$sumB/new1$`count(A)`
new1$calc <- new1$calc*100
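
As a cross-check that does not need sqldf, the same per-group table can be sketched in base R with tapply(). This is only an illustration on the question's sample data; the names counts, sums, pct and summary_tbl are made up here and do not come from the answer.

```r
# Sample data from the question
A <- c(rep("x", 3), rep("Y", 2), rep("Z", 3))
B <- c(0, 1, 0, 1, 1, 0, 0, 0)
new <- data.frame(A, B)

counts <- tapply(new$B, new$A, length)  # calculation 1: rows per value of A
sums   <- tapply(new$B, new$A, sum)     # calculation 2: sum of B per value of A
pct    <- 100 * sums / counts           # calculation 3, as a percentage

# One row per unique value of A (hypothetical summary table).
# Group order can depend on the locale, so look values up by name.
summary_tbl <- data.frame(A     = names(counts),
                          count = as.vector(counts),
                          sumB  = as.vector(sums),
                          calc  = as.vector(pct))
print(summary_tbl)
```

Because every group has at least one row, the count is never zero and the division cannot produce Inf.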

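You have a long list of rules, almost none of which apply to your example because your highest calculation 1 is 3, but once you know how to do one you know how to do them all, so I will give you a single example: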

#create a new column that will have below values
new1$modified_zip <- NA
new1$modified_zip[new1$`count(A)` > 100 & new1$calc > 65] <- "65%above100"
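
For completeness, here is one possible way to cover the whole rule list at once in base R. It is a rough sketch, not part of either answer: label_zip is a hypothetical helper, and the handling of boundary values (a count of exactly 10, 50 or 100, or a share of exactly 35% or 65%) is an assumption, since the question does not say which side they fall on.

```r
# Hypothetical helper: turns a group count and a share (0..1) into a label.
# Assumed boundaries: >100, (50, 100], [10, 50], <10; share >0.65, <0.35, else "other".
label_zip <- function(count, share) {
  size <- ifelse(count > 100, "above100",
          ifelse(count > 50,  "between50and100",
          ifelse(count >= 10, "between10and50", NA_character_)))
  band <- ifelse(share > 0.65, "65%",
          ifelse(share < 0.35, "35%", "other"))
  # Counts below 10 ignore the share entirely
  ifelse(is.na(size), "smallnumber", paste0(band, size))
}

# Sample data from the question
A <- c(rep("x", 3), rep("Y", 2), rep("Z", 3))
B <- c(0, 1, 0, 1, 1, 0, 0, 0)
new <- data.frame(A, B)

# ave() repeats the per-group result on every row of that group
new$calc1 <- ave(new$B, new$A, FUN = length)  # count per value of A
new$calc2 <- ave(new$B, new$A, FUN = sum)     # sum of B per value of A
new$calc3 <- new$calc2 / new$calc1            # share of B within the group
new$modified_zip <- label_zip(new$calc1, new$calc3)
print(new)
```

On the sample data every group has fewer than 10 rows, so every row ends up in the smallnumber case; the other labels only appear with larger groups.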