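I am trying to find the standard deviation (σ = √[Σ(Xi − MEAN)² ÷ n]) of a single column extracted from a CSV file. The file contains about 45,000 instances and 17 attributes, separated by ';'. To compute the standard deviation, the MEAN value is needed in every iteration of the while loop, where it is used together with Xi. So I think MEAN has to be computed before the loop that finds the standard deviation, but I don't know how to do that, or whether there is any way to do it at all. I am stuck here. After that, my code replaces the old Xi with the new Xi and then writes (generates) a new CSV file.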
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Main {
    public static void main(String[] args) throws IOException {
        String filename = "ly.csv";
        File file = new File(filename);
        // try-with-resources closes the writer and the scanner even if parsing fails
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("bank-full_updated.csv"));
             Scanner inputStream = new Scanner(file)) {
            double MEAN = 0;   // placeholder: the real mean is not known before the loop (this is where I am stuck)
            double temp1 = 0;
            double stddev = 0;
            inputStream.next(); // skip the header (first whitespace-delimited token)
            while (inputStream.hasNext()) {
                String data1 = inputStream.next();
                String[] values = data1.split(";");
                double Xi = Double.parseDouble(values[1]);
                // now finding the standard deviation
                temp1 += (Xi - MEAN);
                // temp2 = (temp1 * temp1);
                // temp3 = (temp2 / count);
                // stddev = Math.sqrt(temp3);
                Xi = stddev * Xi;
                // now replace the original values[1] with the new Xi
                values[1] = String.valueOf(Xi);
                // iterate through the values and build a string out of them to write to the new file
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < values.length; i++) {
                    sb.append(values[i]);
                    if (i < values.length - 1) {
                        sb.append(";");
                    }
                }
                // get the new string
                System.out.println(sb.toString());
                writer.write(sb.toString() + "\n");
            }
        } catch (FileNotFoundException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
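Answer 0 (score: 2)
The standard deviation can be computed in a single pass. Professor Donald Knuth has an algorithm that does this using the Kahan summation algorithm. Here is the paper: http://researcher.ibm.com/files/us-ytian/stability.pdf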
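As a rough illustration of the single-pass idea, here is a minimal Java sketch of the running update usually attributed to Welford, which Knuth presents in The Art of Computer Programming. It is not taken from the linked paper and it does not use Kahan summation. The class name `RunningStats` and its method names are hypothetical, chosen only for this example.

```java
/** Hypothetical helper: running mean and variance via Welford's one-pass update. */
final class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    /** Folds one more value into the running statistics. */
    void add(double x) {
        count++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / count;     // update the mean
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    /** Population standard deviation (divides by n), as in the question's formula. */
    double populationStdDev() {
        return count == 0 ? 0.0 : Math.sqrt(m2 / count);
    }
}
```

In the question's loop, one would call `add(Xi)` for every row and read `populationStdDev()` after the loop ends. Note that the final value is only available once all rows have been seen, so scaling each Xi by it still needs a second pass over the data (or the rows kept in memory).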
Here is another way, but it suffers from rounding errors:
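double std_dev2(double a[], int n) {
    if (n == 0)
        return 0.0;
    double sum = 0;     // running sum of the values
    double sq_sum = 0;  // running sum of the squared values
    for (int i = 0; i < n; ++i) {
        sum += a[i];
        sq_sum += a[i] * a[i];
    }
    double mean = sum / n;
    // E[x^2] - (E[x])^2: subtracting two large, nearly equal numbers loses precision
    // when the mean is large relative to the spread of the data
    double variance = sq_sum / n - mean * mean;
    return sqrt(variance);
}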
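For the question's actual goal (knowing MEAN before the loop that uses it), the plain two-pass approach may be the simplest: read the column once to get the mean and standard deviation, then go over the rows again to rewrite them. The sketch below assumes the rows fit in memory as a list of strings, that the first row is a header, and that the new value is `stddev * Xi` as in the question's code. The names `TwoPassColumn`, `populationStdDev`, and `scaleColumn` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: two-pass standard deviation applied to one CSV column. */
final class TwoPassColumn {

    /** Population standard deviation: sqrt(sum((x - mean)^2) / n). */
    static double populationStdDev(double[] xs) {
        if (xs.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        double mean = sum / xs.length; // the mean is fully known before the second loop
        double sqDiff = 0.0;
        for (double x : xs) {
            double d = x - mean;
            sqDiff += d * d;
        }
        return Math.sqrt(sqDiff / xs.length);
    }

    /**
     * Returns the lines with column col replaced by stddev * Xi.
     * Assumes ';' as separator and that line 0 is a header, which is copied unchanged.
     */
    static List<String> scaleColumn(List<String> lines, int col) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        // Pass 1: collect the numeric column from every data row.
        double[] xs = new double[lines.size() - 1];
        for (int i = 1; i < lines.size(); i++) {
            xs[i - 1] = Double.parseDouble(lines.get(i).split(";", -1)[col]);
        }
        double stdDev = populationStdDev(xs);
        // Pass 2: rewrite each data row using the now-known standard deviation.
        out.add(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            String[] values = lines.get(i).split(";", -1);
            values[col] = String.valueOf(stdDev * Double.parseDouble(values[col]));
            out.add(String.join(";", values));
        }
        return out;
    }
}
```

With a file on disk, the input list could come from `java.nio.file.Files.readAllLines` and the result could be written back with `Files.write`. Around 45,000 rows should fit in memory comfortably on a typical machine, though that is an assumption about the environment.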