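I've built a crossfilter with several dimensions and groups to display data visually using dc.js. The data being visualized is bike trip data, and every trip is loaded in. Right now there are over 750,000 records. The JSON file I'm using is 70 MB and will only grow as I receive more data in the coming months.
So my question is: how can I make the data leaner so that it scales well? Right now it takes about 15 seconds to load on my internet connection, and I'm worried it will take too long once I have much more data. I have also tried to display a progress bar/spinner while the data loads, without success.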
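The columns I need are start_date, start_time, usertype, gender, tripduration, meters, age. I have shortened these fields in my JSON to start_date, start_time, u, g, dur, m, age so the file is smaller. On the crossfilter there is a line chart at the top showing the total number of trips per day. Below that are row charts for day of week (calculated from the data) and month (also calculated), and pie charts for usertype, gender, and age. Below those are two bar charts for start_time (rounded down to the hour) and tripduration (rounded up to the minute).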
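The project is on GitHub: https://github.com/shaunjacobsen/divvy_explorer (the dataset is in data2.json). I tried to create a jsfiddle but it isn't working (probably because of the data, even with only 1,000 rows loaded into the HTML inside <pre> tags): http://jsfiddle.net/QLCS2/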
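Ideally it would work so that only the data for the top chart loads first: that would load quickly, since it is just a count of trips per day. The other charts, however, need progressively more data to drill down into finer detail. Any ideas on how to make this work?
Answer 0 (score: 8)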
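I'd recommend shortening all of the field names in your JSON to 1 character (including "start_date" and "start_time"). That should help a little. Also, make sure compression is turned on on your server. That way the data sent to the browser is compressed automatically in transit, which should speed things up considerably if it isn't already turned on.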
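As a rough illustration of the first suggestion, the sketch below renames the question's fields to single characters and compares the serialized size of one record. The 1-character names and the sample trip are made up for illustration; they are not taken from the project.

```javascript
// Hypothetical mapping from the question's field names to 1-character keys.
const SHORT_KEYS = {
  start_date: "d",
  start_time: "t",
  usertype: "u",
  gender: "g",
  tripduration: "r",
  meters: "m",
  age: "a",
};

// Return a copy of one trip record with every known key shortened.
function shortenKeys(record) {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    out[SHORT_KEYS[key] || key] = value;
  }
  return out;
}

// Made-up sample trip, only for illustration.
const trip = {
  start_date: "2014-06-01",
  start_time: "08:15",
  usertype: "Subscriber",
  gender: "F",
  tripduration: 12,
  meters: 2400,
  age: 31,
};

const before = JSON.stringify(trip).length;
const after = JSON.stringify(shortenKeys(trip)).length;
console.log(before + " -> " + after + " characters per record");
```

The saving per record is small, but it is multiplied by every row in the file. Server-side compression usually removes much of the same redundancy, so it is worth measuring both before deciding how far to take the renaming.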
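For better responsiveness, I'd also recommend first setting up your Crossfilter (empty), all of your dimensions and groups, and all of your dc.js charts, then using crossfilter.add() to add data to the Crossfilter in chunks. The easiest way to do this is to divide your data into bite-sized chunks (a few MB each) and load them serially. So if you are using d3.json, start the next file load in the callback of the previous file load. This results in a bunch of nested callbacks, which is a bit nasty, but it should keep the user interface responsive while the data is loading.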
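One way to avoid writing the nested callbacks by hand is a small helper that starts each load from the previous load's callback. This is only a sketch: `loadChunksSerially` and the chunk file names are hypothetical, and the loader is assumed to follow the `callback(error, rows)` convention of `d3.json` in d3 v3/v4 (newer d3 versions return a promise instead).

```javascript
// Load chunk files one after another; each finished load starts the next one.
// loadJson(url, callback) is assumed to call callback(error, rows).
function loadChunksSerially(urls, loadJson, onChunk, onDone) {
  let index = 0;
  function next() {
    if (index >= urls.length) {
      onDone(null); // every chunk has been delivered
      return;
    }
    const url = urls[index++];
    loadJson(url, function (error, rows) {
      if (error) {
        onDone(error); // stop at the first failed chunk
        return;
      }
      onChunk(rows);
      next();
    });
  }
  next();
}

// Browser usage sketch (file names are hypothetical):
// var ndx = crossfilter();   // start empty, then define dimensions, groups, charts
// dc.renderAll();
// loadChunksSerially(
//   ["trips-1.json", "trips-2.json"],
//   function (url, cb) { d3.json(url, cb); },
//   function (rows) { ndx.add(rows); dc.redrawAll(); },
//   function (error) { if (error) console.error(error); }
// );
```

Because the charts are redrawn after every chunk, the page shows partial results early, and the number of chunks already loaded can double as a simple progress indicator.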
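Lastly, with this much data I believe you will run into performance problems in the browser, not just while loading the data. I suspect you are already seeing this, and that at least part of the 15-second pause happens in the browser. You can check by profiling with your browser's developer tools. To address it, profile, identify the performance bottlenecks, and then try to optimize those. Also, be sure to test on slower computers if your audience uses them.
Answer 1 (score: 2)
Consider my class design. It doesn't match yours, but it illustrates my points.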
public class MyDataModel
{
    public List<MyDatum> Data { get; set; }
}
public class MyDatum
{
    public long StartDate { get; set; }
    public long EndDate { get; set; }
    public int Duration { get; set; }
    public string Title { get; set; }
}
Start date and end date are Unix timestamps, and duration is in seconds.
Serializes to:
"{"Data":
[{"StartDate":1441256019,"EndDate":1441257181,
"Duration":451,"Title":"Rad is a cool word."},...]}"
One row of data is 92 characters.
Let's start compressing! Convert dates and times to base 60 strings. Store everything in an array of arrays of strings.
public class MyDataModel
{
    public List<List<string>> Data { get; set; }
}
Serializes to: "{"Data":[["1pCSrd","1pCTD1","7V","Rad is a cool word."],...]}"
One row of data is now 47 characters. moment.js is a good library for working with dates and times. It has built-in functions to unpack the base 60 format.
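For readers who would rather not depend on a library for this step, here is a minimal sketch of base 60 packing in plain JavaScript. The digit alphabet (0-9, then A-Z, then a-x) is an assumption inferred from the sample values in this answer; other tools may order the digits differently, so encoder and decoder must agree on it.

```javascript
// Digit alphabet inferred from the sample values above: 0-9, A-Z, a-x (60 symbols).
const DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx";

// Encode a non-negative integer as a base 60 string.
function toBase60(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative integer");
  }
  if (n === 0) return "0";
  let out = "";
  while (n > 0) {
    out = DIGITS[n % 60] + out;
    n = Math.floor(n / 60);
  }
  return out;
}

// Decode a base 60 string back into an integer.
function fromBase60(text) {
  let n = 0;
  for (const ch of text) {
    const digit = DIGITS.indexOf(ch);
    if (digit < 0) throw new RangeError("invalid base 60 digit: " + ch);
    n = n * 60 + digit;
  }
  return n;
}

console.log(toBase60(1441256019)); // "1pCSrd"
console.log(toBase60(451));        // "7V"
console.log(fromBase60("1pCTD1")); // 1441257181
```

A ten-digit Unix timestamp shrinks to six characters this way, which is where most of the per-row saving in the example comes from.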
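Using an array of arrays makes your code less readable, so add comments to document it.
Load only the most recent 90 days. Zoom to 30 days. When the user drags the brush on the range chart to the left, start fetching more data in chunks of 90 days until the user stops dragging. Add the data to the existing crossfilter using the add method.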
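As you add more and more data, you will notice that your charts get less and less responsive. That is because you are rendering hundreds or even thousands of elements in your SVG. The browser is getting crushed. Use the d3 quantize function to group data points into buckets. Reduce the displayed data to 50 buckets.
Quantizing is worth the effort, and it is the only way to create a scalable chart with a continuously growing dataset.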
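The bucketing idea can be sketched without any library, as below. This is not d3's implementation; d3's quantize scale (`d3.scale.quantize` in v3, `d3.scaleQuantize` in v4 and later) maps a continuous domain onto a fixed set of output values in a similar spirit. The function name and the `[x, y]` point format are assumptions made for the example.

```javascript
// Collapse [x, y] points into exactly bucketCount equal-width buckets along x,
// summing y within each bucket.
function quantizePoints(points, bucketCount) {
  if (points.length === 0) return [];
  let min = Infinity;
  let max = -Infinity;
  for (const [x] of points) {
    if (x < min) min = x;
    if (x > max) max = x;
  }
  // Fall back to a width of 1 when all x values are identical.
  const width = (max - min) / bucketCount || 1;
  const buckets = [];
  for (let i = 0; i < bucketCount; i++) {
    buckets.push({ start: min + i * width, total: 0 });
  }
  for (const [x, y] of points) {
    // Clamp so the largest x lands in the last bucket.
    const i = Math.min(bucketCount - 1, Math.floor((x - min) / width));
    buckets[i].total += y;
  }
  return buckets;
}

// 1,000 made-up daily counts reduced to 50 drawable elements.
const daily = [];
for (let day = 0; day < 1000; day++) daily.push([day, 1]);
const reduced = quantizePoints(daily, 50);
console.log(reduced.length + " buckets instead of " + daily.length + " points");
```

However many rows are loaded, the chart then draws a fixed number of elements, so rendering cost stops growing with the dataset.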
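Your other option is to abandon the range chart and group the data by month, day, and hour. Then add a date range picker. Because the data is grouped by month, day, and hour, you'll find that even if you rode your bike every hour of every day, your result set would never be larger than 8766 rows.
Answer 2 (score: 1)
I've observed similar issues with data (working in an enterprise company), and I found a couple of ideas worth trying.
Timer sample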
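The 8766 figure is 365.25 × 24, the number of hours in an average year. A minimal sketch of the pre-aggregation follows; it assumes each trip carries a `start` field holding a Unix timestamp in seconds (a hypothetical name) and groups in UTC. Keyed on month, day, and hour without the year, the result can have at most 366 × 24 = 8784 rows, which is the same order of magnitude.

```javascript
// Pre-aggregate trips into one row per (month, day, hour) slot.
function aggregateByHour(trips) {
  const counts = new Map();
  for (const trip of trips) {
    // start is assumed to be a Unix timestamp in seconds.
    const d = new Date(trip.start * 1000);
    const key = [d.getUTCMonth() + 1, d.getUTCDate(), d.getUTCHours()].join("-");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([key, count]) => ({ key, count }));
}

// Made-up trips: the first two start in the same hour, the third one hour later.
const sampleTrips = [
  { start: 1441256019 },
  { start: 1441256079 },
  { start: 1441257181 },
];
const hourly = aggregateByHour(sampleTrips);
console.log(hourly.length + " rows for " + sampleTrips.length + " trips");
```

This aggregation could run on the server so that the browser only ever downloads the grouped rows; the trade-off is that per-trip fields such as gender or age are lost unless they are added to the grouping key.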
var datacnt = 0;
// Refresh the on-screen record counter every 300 ms while the data streams in.
var timerId = setInterval(function () {
    d3.select("#count-data-current").text(datacnt);
    // update visualization should go here, something like dc.redrawAll()...
}, 300);

oboe("relative-or-absolute path to your data(ajax)")
    .node('CNT', function (count, path) {
        // The total record count arrives first, so it can be shown right away.
        d3.select("#count-data-all").text("Expecting " + count + " records");
        return oboe.drop;
    })
    .node('data.*', function (record, path) {
        // Called once for every streamed record.
        datacnt++;
        return oboe.drop;
    })
    .node('done', function (item, path) {
        // The trailing "done" flag marks the end of the stream.
        d3.select("#progress-data").text("all data loaded");
        clearInterval(timerId); // the timer was created with setInterval
        d3.select("#count-data-current").text(datacnt);
    });
Data sample
{"CNT":107498,
"keys": ["DATACENTER","FQDN","VALUE","CONSISTENCY_RESULT","FIRST_REC_DATE","LAST_REC_DATE","ACTIVE","OBJECT_ID","OBJECT_TYPE","CONSISTENCY_MESSAGE","ID_PARAMETER"],
"data": [
[22,202,"4.9.416.2",0,1449655898,1453867824,-1,"","",0,45],
[22,570,"4.9.416.2",0,1449655912,1453867884,-1,"","",0,45],
[14,377,"2.102.453.0",-1,1449654863,1468208273,-1,"","",0,45],
[14,406,"2.102.453.0",-1,1449654943,1468208477,-1,"","",0,45],
[22,202,"10.2.293.0",0,1449655898,1453867824,-1,"","",0,8],
[22,381,"10.2.293.0",0,1449655906,1453867875,-1,"","",0,8],
[22,570,"10.2.293.0",0,1449655912,1453867884,-1,"","",0,8],
[22,381,"1.80",0,1449655906,1453867875,-1,"","",0,41],
[22,570,"1.80",0,1449655912,1453867885,-1,"","",0,41],
[22,202,"4",0,1449655898,1453867824,-1,"","",0,60],
[22,381,"4",0,1449655906,1453867875,-1,"","",0,60],
[22,570,"4",0,1449655913,1453867885,-1,"","",0,60],
[22,202,"A20",0,1449655898,1453867824,-1,"","",0,52],
[22,381,"A20",0,1449655906,1453867875,-1,"","",0,52],
[22,570,"A20",0,1449655912,1453867884,-1,"","",0,52],
[22,202,"20140201",2,1449655898,1453867824,-1,"","",0,40],
[22,381,"20140201",2,1449655906,1453867875,-1,"","",0,40],
[22,570,"20140201",2,1449655912,1453867884,-1,"","",0,40],
[22,202,"16",-4,1449655898,1453867824,-1,"","",0,58],
[22,381,"16",-4,1449655906,1453867875,-1,"","",0,58],
[22,570,"16",-4,1449655913,1453867885,-1,"","",0,58],
[22,202,"512",0,1449655898,1453867824,-1,"","",0,57],
[22,381,"512",0,1449655906,1453867875,-1,"","",0,57],
[22,570,"512",0,1449655913,1453867885,-1,"","",0,57],
[22,930,"I32",0,1449656143,1461122271,-1,"","",0,66],
[22,930,"20140803",-4,1449656143,1461122271,-1,"","",0,64],
[14,1359,"10.2.340.19",0,1449655203,1468209257,-1,"","",0,131],
[14,567,"10.2.340.19",0,1449655185,1468209111,-1,"","",0,131],
[22,930,"4.9.416.0",-1,1449656143,1461122271,-1,"","",0,131],
[14,1359,"10.2.293.0",0,1449655203,1468209258,-1,"","",0,13],
[14,567,"10.2.293.0",0,1449655185,1468209112,-1,"","",0,13],
[22,930,"4.9.288.0",-1,1449656143,1461122271,-1,"","",0,13],
[22,930,"4",0,1449656143,1461122271,-1,"","",0,76],
[22,930,"96",0,1449656143,1461122271,-1,"","",0,77],
[22,930,"4",0,1449656143,1461122271,-1,"","",0,74],
[22,930,"VMware ESXi 5.1.0 build-2323236",0,1449656143,1461122271,-1,"","",0,17],
[21,616,"A20",0,1449073850,1449073850,-1,"","",0,135],
[21,616,"4",0,1449073850,1449073850,-1,"","",0,139],
[21,616,"12",0,1449073850,1449073850,-1,"","",0,138],
[21,616,"4",0,1449073850,1449073850,-1,"","",0,140],
[21,616,"2",0,1449073850,1449073850,-1,"","",0,136],
[21,616,"512",0,1449073850,1449073850,-1,"","",0,141],
[21,616,"Microsoft Windows Server 2012 R2 Datacenter",0,1449073850,1449073850,-1,"","",0,109],
[21,616,"4.4.5.100",0,1449073850,1449073850,-1,"","",0,97],
[21,616,"3.2.7895.0",-1,1449073850,1449073850,-1,"","",0,56],
[9,2029,"10.7.220.6",-4,1470362743,1478315637,1,"vmnic0","",1,8],
[9,1918,"10.7.220.6",-4,1470362728,1478315616,1,"vmnic3","",1,8],
[9,1918,"10.7.220.6",-4,1470362727,1478315616,1,"vmnic2","",1,8],
[9,1918,"10.7.220.6",-4,1470362727,1478315615,1,"vmnic1","",1,8],
[9,1918,"10.7.220.6",-4,1470362727,1478315615,1,"vmnic0","",1,8],
[14,205,"934.5.45.0-1vmw",-50,1465996556,1468209226,-1,"","",0,47],
[14,1155,"934.5.45.0-1vmw",-50,1465996090,1468208653,-1,"","",0,14],
[14,963,"934.5.45.0-1vmw",-50,1465995972,1468208526,-1,"","",0,14]],
"done" : true}
Sample of converting the keys-first format into a full array of objects
// Convert keys-first data (a header row followed by value rows) to an array of objects.
// Note: shift() removes the header row from the array that is passed in.
function convertToArrayOfObjects(data) {
    var keys = data.shift(),
        i = 0, k = 0,
        obj = null,
        output = [];
    for (i = 0; i < data.length; i++) {
        obj = {};
        for (k = 0; k < keys.length; k++) {
            obj[keys[k]] = data[i][k];
        }
        output.push(obj);
    }
    return output;
}
The function above works with a slightly modified version of the data, sampled here:
[["ID1","ID2","TEXT1","STATE1","DATE1","DATE2","STATE2","TEXT2","TEXT3","ID3"],
[14,377,"2.102.453.0",-1,1449654863,1468208273,-1,"","",0,45],
[14,406,"2.102.453.0",-1,1449654943,1468208477,-1,"","",0,45],
[22,202,"10.2.293.0",0,1449655898,1453867824,-1,"","",0,8],
[22,381,"10.2.293.0",0,1449655906,1453867875,-1,"","",0,8],
[22,570,"10.2.293.0",0,1449655912,1453867884,-1,"","",0,8],
[22,381,"1.80",0,1449655906,1453867875,-1,"","",0,41],
[22,570,"1.80",0,1449655912,1453867885,-1,"","",0,41],
[22,202,"4",0,1449655898,1453867824,-1,"","",0,60],
[22,381,"4",0,1449655906,1453867875,-1,"","",0,60],
[22,570,"4",0,1449655913,1453867885,-1,"","",0,60],
[22,202,"A20",0,1449655898,1453867824,-1,"","",0,52]]
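To produce this keys-first layout from ordinary records on the sending side, the conversion can be run in the other direction. The sketch below is an assumption-laden illustration: `toKeysFirst` and `fromKeysFirst` are hypothetical names, the sample records are made up, and every record is assumed to have the same keys as the first one.

```javascript
// Pack an array of objects into a header row followed by value rows.
function toKeysFirst(objects) {
  if (objects.length === 0) return [];
  const keys = Object.keys(objects[0]);
  const rows = objects.map((obj) => keys.map((k) => obj[k]));
  return [keys].concat(rows);
}

// Unpack the keys-first layout without modifying the input array.
function fromKeysFirst(table) {
  if (table.length === 0) return [];
  const keys = table[0];
  return table.slice(1).map((row) => {
    const obj = {};
    keys.forEach((k, i) => { obj[k] = row[i]; });
    return obj;
  });
}

// Made-up trip records using the short field names from the question.
const records = [
  { start_date: "2014-06-01", start_time: "08:15", u: "Subscriber", g: "F", dur: 12, m: 2400, age: 31 },
  { start_date: "2014-06-01", start_time: "09:40", u: "Customer", g: "M", dur: 27, m: 5100, age: 45 },
];

const packed = toKeysFirst(records);
console.log(JSON.stringify(records).length + " -> " + JSON.stringify(packed).length + " characters");
```

Each key is sent once instead of once per row, so the saving grows with the number of records; for only a handful of rows the difference is small.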
Also consider using memcached https://memcached.org/ or redis https://redis.io/ to cache the data on the server side. Depending on the size of the data, redis might take you further.