I have a text file:
DATE 20090105
1 2.25 1.5
3 3.6 0.099
4 3.6 0.150
6 3.6 0.099
8 3.65 0.0499
DATE 20090105
DATE 20090106
1 2.4 1.40
2 3.0 0.5
5 3.3 0.19
7 2.75 0.5
10 2.75 0.25
DATE 20090106
DATE 20090107
2 3.0 0.5
2 3.3 0.19
9 2.75 0.5
DATE 20100107
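For each day, I have:
Time Rating Variance
I want to work out the average variance at a specific time, over the largest time scale.
The file is massive, and this is only a small, edited sample. That means I don't know the earliest and latest times in advance: the earliest is around 2600, and the latest could be around 50000.
So, for example, across all days I only have 1 value at time t = 1, so that is the average variance at that time.
At time t = 2, on the first day the variance takes the value 1.5, because that value lasts until t = 3; on the second day it takes the value 0.5; and on the third day it takes the value ((0.5 + 0.19) / 2). So the average variance at time t = 2 over all days is the sum of all the variances at that time divided by the number of different variances at that time.
For the last time of the day, the time scale it takes is t = 1.
I'm just wondering how I would go about doing this.
As a complete beginner I'm finding this quite complicated. I'm a university student, but term has finished and I'm learning Java so I can help my dad out over the summer. Any help with a solution would be greatly appreciated.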
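Answer 0: (score: 0)
You have to follow these steps.
EDIT: You have edited your question, and it now looks completely different. I think you need help with parsing the file. Correct me if I'm wrong.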
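If parsing is indeed the sticking point, here is a minimal sketch of that step. It assumes every day is wrapped in an opening and a closing DATE line, as in the sample, and that fields are separated by whitespace. The names DayParser, Reading and parse are made up for illustration; they are not from any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DayParser {
    /** One data line: time, rating, variance (hypothetical holder type). */
    static final class Reading {
        final int time;
        final double rating;
        final double variance;

        Reading(int time, double rating, double variance) {
            this.time = time;
            this.rating = rating;
            this.variance = variance;
        }
    }

    /** Groups the data lines into days; a day starts and ends with a DATE line. */
    static List<List<Reading>> parse(List<String> lines) {
        List<List<Reading>> days = new ArrayList<>();
        List<Reading> current = null; // null while we are outside a DATE ... DATE block
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = line.split("\\s+");
            if (parts[0].equals("DATE")) {
                if (current == null) {
                    current = new ArrayList<>(); // opening DATE line
                } else {
                    days.add(current);           // closing DATE line
                    current = null;
                }
            } else if (current != null) {
                current.add(new Reading(Integer.parseInt(parts[0]),
                        Double.parseDouble(parts[1]),
                        Double.parseDouble(parts[2])));
            }
        }
        return days;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "DATE 20090105", "1 2.25 1.5", "3 3.6 0.099", "DATE 20090105",
                "DATE 20090106", "1 2.4 1.40", "2 3.0 0.5", "DATE 20090106");
        List<List<Reading>> days = parse(sample);
        System.out.println("days parsed: " + days.size());
        System.out.println("first variance of day 1: " + days.get(0).get(0).variance);
    }
}
```

For the real file you would pass in the lines read with a BufferedReader instead of the hard-coded sample.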
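Answer 1: (score: 0)
If I understand correctly, you are after a moving average computed from a stream of data. The following class, which I wrote, provides some statistics of that kind.
Hope it helps.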
/**
 * omry
 * Jul 2, 2006
 *
 * Calculates:
 * 1. running average
 * 2. running standard deviation.
 * 3. minimum
 * 4. maximum
 */
public class Statistics
{
    private double m_lastValue;
    private double m_average = 0;
    // running sum of squared deviations from the mean
    private double m_stdDevSqr = 0;
    private int m_n = 0;
    private double m_max = Double.NEGATIVE_INFINITY;
    private double m_min = Double.POSITIVE_INFINITY;
    private double m_total;
    // decay factor.
    private double m_d;
    private double m_decayingAverage;
    private double m_decayingStdDevSqr;

    public Statistics()
    {
        this(2);
    }

    public Statistics(float d)
    {
        m_d = d;
    }

    public void addValue(double value)
    {
        m_lastValue = value;
        m_total += value;
        // see http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
        m_n++;
        double delta = value - m_average;
        m_average = m_average + delta / m_n;
        double md = (1/m_d);
        if (m_n == 1)
        {
            // seed the decaying average with the first value
            m_decayingAverage = value;
        }
        m_decayingAverage = (md * m_decayingAverage + (1-md)*value);
        // This expression uses the new value of mean
        m_stdDevSqr = m_stdDevSqr + delta*(value - m_average);
        // note: delta here is still measured against the plain running average
        m_decayingStdDevSqr = m_decayingStdDevSqr + delta*(value - m_decayingAverage);
        m_max = Math.max(m_max, value);
        m_min = Math.min(m_min, value);
    }

    public double getAverage()
    {
        return round(m_average);
    }

    public double getDAverage()
    {
        return round(m_decayingAverage);
    }

    public double getMin()
    {
        return m_min;
    }

    public double getMax()
    {
        return m_max;
    }

    // Despite the name, this returns the sample standard deviation
    // (the square root of the sample variance), rounded to two decimals.
    public double getVariance()
    {
        if (m_n > 1)
        {
            return round(Math.sqrt(m_stdDevSqr/(m_n - 1)));
        }
        else
        {
            return 0;
        }
    }

    // Same as getVariance(), but based on the decaying accumulator.
    public double getDVariance()
    {
        if (m_n > 1)
        {
            return round(Math.sqrt(m_decayingStdDevSqr/(m_n - 1)));
        }
        else
        {
            return 0;
        }
    }

    public int getN()
    {
        return m_n;
    }

    public double getLastValue()
    {
        return m_lastValue;
    }

    public void reset()
    {
        m_lastValue = 0;
        m_average = 0;
        m_stdDevSqr = 0;
        m_n = 0;
        m_max = Double.NEGATIVE_INFINITY;
        m_min = Double.POSITIVE_INFINITY;
        m_decayingAverage = 0;
        m_decayingStdDevSqr = 0;
        m_total = 0;
    }

    public double getTotal()
    {
        return round(m_total);
    }

    // rounds to two decimal places
    private double round(double d)
    {
        return Math.round((d * 100))/100.0;
    }
}
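To make the core of addValue easier to follow, here is a stripped-down sketch of the same one-pass update of the mean and of the sum of squared deviations, as described on the Wikipedia page linked in the code. The class and method names (WelfordSketch, meanAndVariance) are invented for this example, and the three input values are simply the per-day variances at t = 2 from the question.

```java
class WelfordSketch {
    /** Returns {mean, sample variance} of the values, computed in a single pass. */
    static double[] meanAndVariance(double[] values) {
        int n = 0;
        double mean = 0;
        double m2 = 0; // running sum of squared deviations from the current mean
        for (double value : values) {
            n++;
            double delta = value - mean;  // distance from the old mean
            mean += delta / n;            // update the mean first
            m2 += delta * (value - mean); // then use the updated mean
        }
        double variance = n > 1 ? m2 / (n - 1) : 0;
        return new double[] { mean, variance };
    }

    public static void main(String[] args) {
        double[] result = meanAndVariance(new double[] { 1.5, 0.5, 0.345 });
        System.out.println("mean = " + result[0]);
        System.out.println("sample variance = " + result[1]);
        System.out.println("standard deviation = " + Math.sqrt(result[1]));
    }
}
```

Taking the square root of the sample variance gives the standard deviation, which is the quantity the getVariance method of the class actually returns.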
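Answer 2: (score: 0)
I think I understand. You want the average variance at a given time t, where each day's value is given by the entry with the highest timestamp that day that is less than t.
So I'd suggest that, once you have parsed the data as @Manjoor suggested, you do something like this (pseudocode!):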
function getAverageAt(int t)
    float lastvariance = 0; // what value to start on,
                            // if no variance is specified at t=1 on day 1
                            // also acts as accumulator if several values at one
                            // timestamp
    float allDaysTotal = 0; // cumulative sum of the variance at time t for all days
    for each day {
        float time[], rating[], variance[];
        // read these from table
        int found = 0; // how many values found at time t today
        for (int i = 0; i < time.length; i++) {
            if (time[i] < t) lastvariance = variance[i]; // find the most recent value
                                                         // before t.
                                                         // This relies on your data being in order!
            else if (time[i] == t) { // a reading exactly at time t
                found++;
                if (found == 1) lastvariance = variance[i]; // no previous occurrences today
                else lastvariance += variance[i];
            }
            else if (time[i] > t) break;
        }
        if (found > 1) lastvariance /= found; // calculate average of several simultaneous
        // readings, if more than one value found today at time t.
        // Note that: if found==0, this means you're using a previous
        // timestamp's value.
        // Also note that, if an earlier timestamp (say t=1) has 2 values of variance,
        // their average will not carry over to time t; only the last of them is used.
        // You could easily reimplement that if that's the behaviour you desire,
        // the code is similar, but putting the time<t condition along with the
        // time==t condition
        allDaysTotal += lastvariance;
    }
    return allDaysTotal / nDays; // nDays = number of days looped over
Your problem is not a simple one, as the issues I've pointed out show.
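For what it's worth, here is one way the pseudocode above could be turned into runnable Java. This is only a sketch under a few assumptions of mine: each day's readings are already parsed into arrays sorted by time, readings that share the latest timestamp at or before t are averaged, and a day with no reading at or before t is left out of the mean rather than reusing a value from an earlier day. The class name AverageAt and the array layout are made up for the example.

```java
class AverageAt {
    /**
     * times[d] and variances[d] hold the readings of day d, sorted by time.
     * Returns the mean over all days of the variance in effect at time t,
     * or NaN if no day has a reading at or before t.
     */
    static double getAverageAt(int t, int[][] times, double[][] variances) {
        double allDaysTotal = 0;
        int daysCounted = 0;
        for (int d = 0; d < times.length; d++) {
            int latest = 0;  // latest timestamp <= t seen today (valid once found > 0)
            double sum = 0;  // sum of the variances recorded at that timestamp
            int found = 0;   // how many readings share that timestamp
            for (int i = 0; i < times[d].length; i++) {
                int time = times[d][i];
                if (time > t) {
                    break; // relies on the data being in time order
                }
                if (found == 0 || time != latest) {
                    // a newer timestamp replaces whatever we had before
                    latest = time;
                    sum = 0;
                    found = 0;
                }
                sum += variances[d][i];
                found++;
            }
            if (found > 0) {
                allDaysTotal += sum / found;
                daysCounted++;
            }
        }
        return daysCounted == 0 ? Double.NaN : allDaysTotal / daysCounted;
    }

    public static void main(String[] args) {
        // The three sample days from the question (time and variance columns only).
        int[][] times = { { 1, 3, 4, 6, 8 }, { 1, 2, 5, 7, 10 }, { 2, 2, 9 } };
        double[][] variances = {
            { 1.5, 0.099, 0.150, 0.099, 0.0499 },
            { 1.40, 0.5, 0.19, 0.5, 0.25 },
            { 0.5, 0.19, 0.5 }
        };
        System.out.println("average variance at t=2: " + getAverageAt(2, times, variances));
    }
}
```

At t = 2 this combines 1.5 from the first day, 0.5 from the second and (0.5 + 0.19) / 2 from the third, which matches the worked example in the question.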
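Answer 3: (score: 0)
OK, I have code that works. But it takes a very long time (for about 7 months of days, with about 30,000 variances per day), because it has to loop so many times. Are there any better suggestions?
I mean, this code looks simple, yet it would take about 24-28 hours...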
package VarPackage;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;

public class ReadText {
    public static void main(String[] args) throws Exception {
        String inputFileName = "C:\\MFile";
        ArrayList<String> fileLines = new ArrayList<String>();

        // Read the whole file into memory, one entry per line;
        // try-with-resources closes the reader when done
        try (BufferedReader br = new BufferedReader(new FileReader(inputFileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                fileLines.add(line);
            }
        }

        AvgVar myVar = new AvgVar(fileLines);
        // Time
        for (int t = 1; t < 10; t++) {
            System.out.print("Average Var at Time t=" + t + " = " + myVar.avgVar(t) + "\n");
        }
    }
}
===================================
package VarPackage;

import java.util.ArrayList;

public class AvgVar {
    // Class variable
    private ArrayList<String> inputData = new ArrayList<String>();

    // Constructor
    AvgVar(ArrayList<String> fileData) {
        inputData = fileData;
    }

    public double avgVar(int time) {
        double avgVar = 0;
        // Note: this rebuilds the per-day arrays from the raw lines on every call
        ArrayList<double[]> avgData = avgDuplicateVars(inputData);
        for (double[] arrVar : avgData) {
            avgVar += arrVar[time-1];
            //System.out.print(arrVar[time-1] + "," + arrVar[time] + "," + arrVar[time+1] + "\n");
            //System.out.print(avgVar + "\n");
        }
        avgVar /= numDays(inputData);
        return avgVar;
    }

    // Counts the days: each day is an opening DATE line followed by a closing one
    private int numDays(ArrayList<String> varData) {
        int n = 0;
        int flag = 0;
        for (String line : varData) {
            String[] myData = line.split(" ");
            if (myData[0].equals("DATE") && flag == 0) {
                flag = 1;
            }
            else if (myData[0].equals("DATE") && flag == 1) {
                n = n + 1;
                flag = 0;
            }
        }
        return n;
    }

    // Builds one array per day, indexed by time-1, averaging readings that share a time
    private ArrayList<double[]> avgDuplicateVars(ArrayList<String> varData) {
        ArrayList<double[]> avgData = new ArrayList<double[]>();
        double[] varValue = new double[86400];
        double[] varCount = new double[86400];
        int n = 0;
        int flag = 0;
        for (String iLine : varData) {
            String[] nLine = iLine.split(" ");
            if (nLine[0].equals("DATE") && flag == 0) {
                // opening DATE line: reset the accumulators
                for (int i=0; i<86400; i++) {
                    varCount[i] = 0;
                    varValue[i] = 0;
                }
                flag = 1;
            }
            else if (nLine[0].equals("DATE") && flag == 1) {
                // closing DATE line: average duplicates, fill the gaps, store the day
                for (int i=0; i<86400; i++) {
                    if (varCount[i] != 0) {
                        varValue[i] /= varCount[i];
                    }
                }
                varValue = fillBlankSpreads(varValue, 86400);
                avgData.add(varValue.clone());
                flag = 0;
            }
            else {
                n = Integer.parseInt(nLine[0])-1;
                varValue[n] += Double.parseDouble(nLine[2]);
                varCount[n] += 1;
            }
        }
        return avgData;
    }

    private double[] fillBlankSpreads(double[] varValue, int numSpread) {
        // Fill each blank (zero) slot with the previous slot's value,
        // so a reading carries forward until the next one
        for (int i=1; i<numSpread; i++) {
            if (varValue[i] == 0) {
                varValue[i] = varValue[i-1];
            }
        }
        return varValue;
    }
}
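One thing I notice in the code above is that avgVar calls avgDuplicateVars and numDays on every call, so the whole file is re-parsed and every day's array is rebuilt for each time t. Below is a sketch of a single-pass alternative that computes the averages for all times at once and keeps only a few arrays in memory. It is meant to follow the same rules as the code above (time t is stored at index t-1, gaps carry the previous value forward, slots before a day's first reading count as 0, and the sum is divided by the number of days), except that a gap is detected by its reading count rather than by a zero value. The names FastAvgVar and averageByTime are made up, and maxTime is assumed to be at least the largest time in the file.

```java
import java.util.Arrays;
import java.util.List;

class FastAvgVar {
    /** Returns an array whose element i is the average variance at time i + 1 over all days. */
    static double[] averageByTime(Iterable<String> lines, int maxTime) {
        double[] total = new double[maxTime];  // per-time sum across all days
        double[] daySum = new double[maxTime]; // per-time sum for the current day
        int[] dayCount = new int[maxTime];     // per-time reading count for the current day
        boolean inDay = false;
        int days = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s+");
            if (parts[0].equals("DATE")) {
                if (!inDay) {
                    // opening DATE line: reset the per-day accumulators
                    Arrays.fill(daySum, 0);
                    Arrays.fill(dayCount, 0);
                    inDay = true;
                } else {
                    // closing DATE line: forward-fill and add this day to the totals
                    double last = 0; // value in effect before the day's first reading
                    for (int i = 0; i < maxTime; i++) {
                        if (dayCount[i] > 0) {
                            last = daySum[i] / dayCount[i]; // average of duplicates
                        }
                        total[i] += last;
                    }
                    days++;
                    inDay = false;
                }
            } else if (inDay) {
                int i = Integer.parseInt(parts[0]) - 1; // throws if the time exceeds maxTime
                daySum[i] += Double.parseDouble(parts[2]);
                dayCount[i]++;
            }
        }
        for (int i = 0; i < maxTime; i++) {
            total[i] = days == 0 ? 0 : total[i] / days;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "DATE 20090105", "1 2.25 1.5", "3 3.6 0.099", "DATE 20090105",
                "DATE 20090106", "1 2.4 1.40", "2 3.0 0.5", "DATE 20090106");
        double[] mean = averageByTime(sample, 10);
        for (int t = 1; t <= 3; t++) {
            System.out.println("Average Var at Time t=" + t + " = " + mean[t - 1]);
        }
    }
}
```

Because the method only needs an Iterable of lines, the lines could also be fed straight from the file as they are read, without first collecting them all in an ArrayList.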