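I have a dataset in which the last field is a string containing a sentence. My goal is to break each sentence into words and create a new dataset in which each word is on its own row, as shown below.
This is the format of the old dataset: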
0: Object { creator: "molly", number: 3, doc: "The cat in the hat ate the rat", … }
1: Object { creator: "may", number: 4, doc: "the crass rat", … }
2: Object { creator: "may", number: 4, doc: "The mouse in the pouch at the cat", … }
3: Object { creator: "may", number: 4, doc: "the fish hog", … }
4: Object { creator: "may", number: 4, doc: "the dog warm", … }
This is the format I want:
0: Object { creator: "molly", number: 3, doc: "The", … }
1: Object { creator: "molly", number: 3, doc: "cat", … }
2: Object { creator: "molly", number: 3, doc: "in", … }
3: Object { creator: "molly", number: 3, doc: "the", … }
4: Object { creator: "molly", number: 3, doc: "hat", … }
5: Object { creator: "molly", number: 3, doc: "ate", … }
6: Object { creator: "molly", number: 3, doc: "the", … }
7: Object { creator: "molly", number: 3, doc: "rat", … }
8: Object { creator: "may", number: 4, doc: "the", … }
9: Object { creator: "may", number: 4, doc: "crass", … }
10: Object { creator: "may", number: 4, doc: "rat", … }
I am using D3. The following code lets me generate a new dataset in which each word is on its own row:
doc.csv:
date,number,creator,,doc
6/16/2000,3,molly,3,The cat in the hat ate the rat
2/25/2002,4,may,2,The mouse in the pouch at the cat
12/5/2004,3,molly,4,the lovely fish
7/6/2006,1,milly,1,the pog dog
9/7/2003,4,may,4,the fish hog
12/10/2001,4,may,3,the crass rat
6/15/2005,2,maggie,3,the ass rat
6/9/2004,1,milly,4,the fish blue
10/5/2005,1,milly,3,the rat true
10/7/2003,4,may,1,the dog warm
1/19/2009,4,may,2,the cat norm
10/30/2007,1,milly,4,the fish wish
8/13/2009,4,may,2,cat bat ticks
9/30/2004,3,molly,1,dog nog mog
1/17/2006,4,may,3,rat tittily too
12/18/2009,3,molly,1,dog coppily poo
11/2/2007,2,maggie,3,rat pitpat poo
4/17/2007,1,milly,4,fish too!
html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Interactive scatterplot</title>
<link rel="stylesheet" type="text/css" href="style.css">
<script type="text/javascript" src="d3.v4.js"></script>
</head>
<body>
<script type="text/javascript" src="split.js"></script>
<textarea id="txtName" name="txt-Name" placeholder="Search for something.." rows="1"></textarea>
</body>
</html>
Code:
var parseDate = d3.timeParse("%m/%d/%Y");
var hoot = function(d) {return d.doc.split(" ").forEach(function (item) {
var data2 = {creator: d.creator, date: parseDate(d.date),item: item}
console.log(data2)
});}
d3.csv("doc.csv")
.row(function(d) {return {creator: d.creator,date: parseDate(d.date),number: Number(d.number),doc: d.doc, split: (hoot(d))};})
.get(function(error, data) {
});
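Happily, when I console.log data2, I get something close to my final goal.
I have two problems:
1) The variable data2 is not available after the function has run. I tried to make it global by putting var data2 = []; at the beginning of the script, but that did not work.
2) The variable data2 does not take the form of a single array. I tried putting square brackets around the variable's value (i.e. var data2 = [{creator: d.creator, date: parseDate(d.date),item: item}]), but that produces many arrays rather than one.
Thanks in advance for your time.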
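Answer 0 (score: 3)
Here data2 is a local variable inside the forEach loop, and it is reassigned on every iteration. So even if you made it global, it would only hold the value from the last iteration. Instead, make data2 an array and push a value into it on each iteration. It might look like this: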
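var parseDate = d3.timeParse("%m/%d/%Y");
var data2 = []; // one flat array shared by every row
// Push one object per word of d.doc into data2.
var hoot = function(d) {
  d.doc.split(" ").forEach(function(item) {
    data2.push({creator: d.creator, date: parseDate(d.date), item: item});
  });
};
d3.csv("doc.csv")
  .row(function(d) {
    hoot(d); // forEach returns undefined, so hoot is called only for its side effect
    return {creator: d.creator, date: parseDate(d.date), number: Number(d.number), doc: d.doc};
  })
  .get(function(error, data) {
    if (error) throw error;
    // The file loads asynchronously, so data2 is only filled once this callback runs.
    console.log(data2);
  });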
Now log data2 to the console once the file has loaded and check it; hopefully you get the result you expect.
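The same idea can be tried without D3 or a CSV file. The following is only a minimal sketch, assuming the rows have already been parsed into plain objects; the names rows, splitRows, words and last are hypothetical and do not come from the question. It contrasts the two behaviours described above: assigning inside the loop keeps only the last word, while pushing into one array declared outside the loop keeps every word.

```javascript
// Sample rows copied from the question's data (date field omitted for brevity).
var rows = [
  {creator: "molly", number: 3, doc: "The cat in the hat ate the rat"},
  {creator: "may", number: 4, doc: "the crass rat"}
];

// Hypothetical helper: returns one object per word, keeping the other fields.
function splitRows(rows) {
  var words = []; // declared once, outside both loops
  rows.forEach(function(d) {
    d.doc.split(" ").forEach(function(item) {
      words.push({creator: d.creator, number: d.number, doc: item});
    });
  });
  return words;
}

// For contrast: assigning instead of pushing keeps only the final word.
var last;
rows.forEach(function(d) {
  d.doc.split(" ").forEach(function(item) {
    last = {creator: d.creator, number: d.number, doc: item};
  });
});

var words = splitRows(rows);
console.log(words.length); // 11 (8 words from the first row, 3 from the second)
console.log(last.doc);     // "rat"
```

Returning the array from a function, rather than relying on a global, is a design choice made for this sketch; inside the D3 callback either approach should work as long as the array is read only after the file has loaded.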