Multiple operations on column combinations in R

Time: 2016-02-03 12:07:47

Tags: r vector combinations series

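Using the commands below, I built a sample data frame of named series, and then created a second frame containing every possible pair of column names.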

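# randwalk() is the asker's own helper; its definition is not shown in the post
dataset <- data.frame(randwalk(10), randwalk(10), randwalk(10), randwalk(10), randwalk(10))
colnames(dataset) <- c( "one", "two", "three", "four", "five")
datasetpairs = data.frame(t(combn(colnames(dataset), 2)))
colnames(datasetpairs) <- c("numerator", "denominator")

They look like this:

head(dataset)
        one       two    three     four     five
1 1.0000000 1.0000000 1.000000 1.000000 1.000000
2 1.0055678 0.9866026 1.004089 1.007859 1.004886
3 1.0137884 0.9794308 1.013057 1.011453 1.003129
4 1.0043928 0.9838919 1.026479 1.025951 1.005845
5 0.9942291 0.9839125 1.026769 1.030824 1.007177
6 0.9993814 0.9618307 1.035784 1.037156 1.026317
head(datasetpairs)
  numerator denominator
1       one         two
2       one       three
3       one        four
4       one        five
5       two       three
6       two        four

What I want to do is add several columns to "datasetpairs" that store the mean, maximum, and minimum of the ratio for each column pair. I can get a single number by piping in the values for one row, so I could write a for loop, but my attempt to do it in a vectorized style gives me an error.

Also, what I really want is to compute the ratio of two columns only once and keep a few summary values from it without storing the ratio itself, because my real dataset is too large to precompute the ratios for every possible combination. What is an elegant way to do this without resorting to loops? Thanks to anyone who can help!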

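1 Answer:

Answer 0: (score: 1)

Here is a solution that uses data.table (because it performs grouped operations quickly) together with a custom function that does the analysis. This keeps your code readable, and each ratio is computed only once before moving on.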

library(data.table)

#create data
set.seed(123)
dataset <- data.frame(matrix(runif(50),ncol=5))
colnames(dataset) <- c( "one", "two", "three", "four", "five")

#custom function to process two vectors:
process_data <- function(v1,v2){
  ratio <- v1/v2
  res <- list(mean=mean(ratio),min=min(ratio),max=max(ratio))
  return(res)
}

datasetpairs = data.table(t(combn(colnames(dataset), 2)))
colnames(datasetpairs) <- c("numerator", "denominator")

#run the analysis
datasetpairs[,process_data(dataset[[numerator]],dataset[[denominator]]),by=.(numerator,denominator)]
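
For comparison, the same idea can be sketched in base R without data.table. This is an illustrative alternative, not part of the answer above: it relies on the `FUN` argument of `combn()`, which is applied to each combination as it is generated, so each ratio vector exists only inside the function call and only its three summaries are kept. The names `summarise_pair`, `pairs`, `stats`, and `result` are made up for this sketch, and the data is the same `runif()` stand-in used in the answer.

```r
# Same stand-in data as in the answer above
set.seed(123)
dataset <- data.frame(matrix(runif(50), ncol = 5))
colnames(dataset) <- c("one", "two", "three", "four", "five")

# Hypothetical helper: takes a length-2 character vector of column names,
# computes the ratio once, and returns only its summaries.
summarise_pair <- function(pair) {
  ratio <- dataset[[pair[1]]] / dataset[[pair[2]]]
  c(mean(ratio), min(ratio), max(ratio))
}

# All column-name pairs: a 2 x 10 character matrix
pairs <- combn(colnames(dataset), 2)

# Apply the helper to each pair: a 3 x 10 numeric matrix
# (rows are mean, min, max, in the order returned by summarise_pair)
stats <- combn(colnames(dataset), 2, FUN = summarise_pair)

# Assemble one row per pair; rows of stats are picked by position
result <- data.frame(
  numerator   = pairs[1, ],
  denominator = pairs[2, ],
  mean        = stats[1, ],
  min         = stats[2, ],
  max         = stats[3, ]
)

head(result)
```

`combn()` still iterates internally, so this avoids writing the loop rather than avoiding iteration altogether; for a very large number of columns the data.table version above is likely the better fit.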