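I have two datasets of historical market data (asset prices), and I want to compute their Pearson correlation, but I don't want the calculation to use the entire dataset. The number of periods used in the calculation should be a variable.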
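This is a minimal sketch of what "Pearson correlation over a variable number of periods" means, written from the textbook definition with NumPy. The helper names `pearson` and `rolling_pearson` are hypothetical and do not come from any library. Each output position correlates only the last `window` values of both series.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two equal-length 1-D sequences (from the definition)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def rolling_pearson(x, y, window):
    """For each position i, correlate x[i-window+1 : i+1] with the same slice of y.

    Positions with fewer than `window` earlier values get NaN, so the
    output has the same length as the inputs.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        start = i - window + 1
        out[i] = pearson(x[start:i + 1], y[start:i + 1])
    return out
```

For example, `rolling_pearson(a1, a2, 3)` on five prices returns NaN for the first two positions and one coefficient for each full three-row window.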
<div class="row ml-0 w-100" id="input-file">
<div class="col-12">
<div class="form-group">
<label for="" class="control-label w-100">Select Files to Upload</label>
<input type="file" name="files" id="files" class="hidden" multiple>
<button class="btn btn-info" id="display-fs">Select Files</button>
<button class="btn btn-primary" id="do-file-work">Process Files</button>
</div>
</div>
</div>
<div class="row ml-0 w-100" id="results"></div>
<script>
// https://github.com/catamphetamine/read-excel-file
// built on node, but implemented in a standalone way
// Passing (rows, errors, fileName) into an immediately-invoked function fails:
// rows and errors are not defined yet when it runs, and .then() silently
// ignores the IIFE's return value (undefined) because it is not a function.
var fileList = [], fileInput = $('input#files')[0], localAssetsList = [];
$('#display-fs').on('click', function () {
if ($('input#files').length) {
$('input#files').click();
}
});
$(fileInput).on('change', function (event) {
fileList = [];
for (var i = 0; i < fileInput.files.length; i++) {
fileList.push(fileInput.files[i]);
}
});
$('#do-file-work').on('click', function(e){
var data = null;
if (fileList.length == 0) return;
for (var i = 0; i < fileList.length; i++) {
var file = fileList[i], extention = file.name.substring(file.name.indexOf('.') + 1, file.name.length), fileName = file.name.substring(0, file.name.indexOf('.'));
if (extention == 'xls' || extention == 'xlsx') {
debugger;
if (fileName.toLowerCase().indexOf('asset') != -1) {
readXlsxFile(file).then((function (fileName, i) {
// Factory: runs immediately and captures this iteration's fileName and i
// (var is function-scoped, so the loop would otherwise overwrite them);
// it returns the callback that .then() later calls with the parsed rows.
return function (rows) {
debugger;
var data1 = rows;
if (data1.length > 0) {
$('<div></div>', {
id: 'assets-' + i,
class: 'col-12 table-responsive',
html: $('<table></table>', {
id: fileName,
class: 'table table-sm table-striped'
})
}).appendTo('#results');
$('<thead></thead>', {
html: $('<tr></tr>', {
id: fileName + '-header',
class: 'text-nowrap'
})
}).appendTo('#' + fileName);
for (var k = 0; k < data1[0].length; k++) {
var element = data1[0][k];
$('<th></th>', {
html: element.toString().toUpperCaseFirstWord()
}).appendTo('#' + fileName + '-header');
}
$('<tbody></tbody>', {
id: fileName + '-body'
}).appendTo('#' + fileName);
for (var x = 1; x < data1.length; x++) {
var ele = data1[x];
if (ele[0] === null){
console.error('Row ' + (x + 1) + ' is missing critical detail in cell 1. Skipping row.');
continue;
} else {
for (var n = 0; n < ele.length; n++) {
var y = ele[n];
if (n == 0) {
$('<tr></tr>', {
id: 'row-' + y,
html: $('<td></td>', {
html: (isNull(y)) ? "": y.toString()
}),
on: {
click: function (e) {
if ($(this).hasClass('selected-row')) {
$(this).removeClass('selected-row');
} else {
$(this).addClass('selected-row');
}
}
}
}).appendTo('#' + fileName + '-body')
} else {
$('<td></td>', {
html: (isNull(y)) ? "": y.toString()
}).appendTo('#row-' + ele[0]);
}
}
}
}
}
};
})(fileName, i));
}
}
}
})
</script>
So I added two columns with the daily variation, which compute the percent change from one row to the next.
Index Time Asset1 Asset2
0 1479686400 738.99 738.0
1 1479772800 749.85 749.7
2 1479859200 742.00 741.9
3 1479945600 737.61 738.4
4 1480032000 740.36 742.1
...
750 1544486400 3435.28 3348.1
751 1544572800 3535.60 3435.2
752 1544659200 3354.30 3260.0
753 1544745600 3281.70 3194.9
754 1544832000 3283.40 3180.1
I used:
df['A1 Var'] = df['Asset1'].pct_change()
df['A2 Var'] = df['Asset2'].pct_change()
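A small check, using only the first five Asset1/Asset2 rows from the table above (the Time column is left out): `pct_change()` computes `current / previous - 1`, so row 0 has no previous value and comes out NaN in both columns. On these rows, A1 Var in rows 1 to 4 matches the values in the question, while A2 Var in row 1 comes out as about 0.015854.

```python
import pandas as pd

# First five rows of the price data shown in the question.
df = pd.DataFrame({
    "Asset1": [738.99, 749.85, 742.00, 737.61, 740.36],
    "Asset2": [738.0, 749.7, 741.9, 738.4, 742.1],
})

df["A1 Var"] = df["Asset1"].pct_change()
df["A2 Var"] = df["Asset2"].pct_change()

# The same thing written out by hand: divide by the previous row, subtract 1.
# Row 0 has no previous row, so shift(1) gives NaN there.
manual = df["Asset2"] / df["Asset2"].shift(1) - 1

print(df)
```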
I already have two questions here:
1- Why is the value of "A1 Var" in the first row (index 0) 0.012773, while "A2 Var" is NaN?
(To me both should be NaN, since there is no previous data to compute a percent change from.) Where does 0.012773 come from?
2- How are these values calculated?
Index Time Asset1 Asset2 A1 Var A2 Var
0 1479686400 738.99 738.0 0.012773 NaN
1 1479772800 749.85 749.7 0.014696 0.117872
2 1479859200 742.00 741.9 -0.010469 0.136832
3 1479945600 737.61 738.4 -0.005916 0.139311
4 1480032000 740.36 742.1 0.003728 -0.013370
...
750 1544486400 3435.28 3348.1 -0.024899 0.009228
751 1544572800 3535.60 3435.2 0.029202 0.056866
752 1544659200 3354.30 3260.0 -0.051278 -0.011901
753 1544745600 3281.70 3194.9 -0.021644 0.007987
754 1544832000 3283.40 3180.1 0.000518 -0.019618
I ask because the numbers from pct_change() don't add up:
- df['A1 Var'], index 0 to 1 (ignoring the 0.012773):
738.99 + 1.4696% = 749.85 (OK)
- Now df['A2 Var'] for the same rows:
738 + 11.7872% (LOL) = 824.99 (expected 749.7)
Even if it is off by a zero: 738 + 1.17872% = 746.7 (expected 749.7)
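The arithmetic above can be reproduced directly. Applying a fractional change means `new = old * (1 + pct)`; the helper name `rebuild` is hypothetical, for illustration only. The reported A1 Var reconstructs 749.85, the reported A2 Var gives 824.99 (or 746.7 when off by a zero), and the change A2 Var would need to reach 749.7 is about 0.015854.

```python
def rebuild(old, pct):
    """Apply a fractional change to a price: old * (1 + pct)."""
    return old * (1 + pct)


# Row 0 -> row 1, using the values quoted in the question.
a1 = rebuild(738.99, 0.014696)           # A1 Var applied to Asset1
a2 = rebuild(738.0, 0.117872)            # A2 Var as reported
a2_shifted = rebuild(738.0, 0.0117872)   # A2 Var "off by a zero"
a2_expected_var = 749.7 / 738.0 - 1      # change needed to go from 738.0 to 749.7

print(round(a1, 2), round(a2, 2), round(a2_shifted, 2), round(a2_expected_var, 6))
```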
If there is a better way to compute the percent change row by row, that would also solve my problem.
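One way to make the row-by-row change explicit is to compute `(p_t - p_{t-1}) / p_{t-1}` with `shift(1)`. This sketch additionally sorts by the Time column first, on the assumption that the rows should be compared in time order; the function name `pct_change_by_time` is hypothetical, not a pandas API.

```python
import pandas as pd


def pct_change_by_time(df, price_col, time_col="Time"):
    """Row-to-row percent change of price_col, computed in time order.

    Equivalent to (p_t - p_{t-1}) / p_{t-1}; the earliest row is NaN
    because it has no previous price.
    """
    ordered = df.sort_values(time_col)
    prev = ordered[price_col].shift(1)
    change = (ordered[price_col] - prev) / prev
    # Put the result back in the caller's original row order.
    return change.reindex(df.index)
```

Usage would be along the lines of `df['A2 Var'] = pct_change_by_time(df, 'Asset2')`; if the rows are already sorted by Time, the result is the same as `df['Asset2'].pct_change()`.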
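And the main question:
I know I can use df['A2 Var']
to get the Pearson correlation. But I want a new column that stores, for each row, the correlation over the past 30 rows. (And 30 should be a variable, by the way.)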
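`Series.corr` returns a single number for the whole column, whereas `Series.rolling(window).corr(other)` returns one Pearson coefficient per row, computed over the last `window` rows. This is a minimal sketch assuming the column names from the question; the helper `add_rolling_corr` and the `WINDOW` constant are hypothetical names chosen for illustration.

```python
import pandas as pd

WINDOW = 30  # number of past rows in each correlation; change as needed


def add_rolling_corr(df, col_a, col_b, window, out_col=None):
    """Add a column holding the Pearson correlation of col_a and col_b
    over the last `window` rows (the current row included).

    With the default min_periods (equal to window), rows whose window has
    fewer than `window` valid pairs get NaN.
    """
    out_col = out_col or f"Corr {window}"
    df[out_col] = df[col_a].rolling(window).corr(df[col_b])
    return df
```

Usage: `add_rolling_corr(df, 'A1 Var', 'A2 Var', WINDOW)`. Because row 0 of a `pct_change()` column is NaN, the first full 30-row window should be the one ending at row 30.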