I want to implement a program on a dataset that consists of the following columns:
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|12345 |Pen |Cello |Ball Pen Soft Nib... |
|12346 |Pencil |Nataraja |Pencil HB Extra D... |
|42345 |Ruler |Nataraja |Scale No.1103 15c... |
|12677 |Sharpener |Nataraja |Pencil Shraperner... |
|12987 |Pen |Reynolds |Dot Pen Extra Gr... |
|44326 |Pen |Reynolds |Gel Pen German T... |
|13456 |Pen |Cello |Dot Pen 0.5mm Nib... |
|19876 |Eraser |Cello |Dust free Eraser ... |
|43246 |Ink Pen |Hero |Ink Pen Smooth Ha... |
+-----------+---------------+-------------------+-----------------------+
I would like to group the dataset by Manufacturer_Name, as shown below:
Manufacturer = Cello
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|12345 |Pen |Cello |Ball Pen Soft Nib... |
|13456 |Pen |Cello |Dot Pen 0.5mm Nib... |
|19876 |Eraser |Cello |Dust free Eraser ... |
+-----------+---------------+-------------------+-----------------------+
Manufacturer = Nataraja
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|12346 |Pencil |Nataraja |Pencil HB Extra D... |
|42345 |Ruler |Nataraja |Scale No.1103 15c... |
|12677 |Sharpener |Nataraja |Pencil Shraperner... |
+-----------+---------------+-------------------+-----------------------+
Manufacturer = Reynolds
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|12987 |Pen |Reynolds |Dot Pen Extra Gr... |
|44326 |Pen |Reynolds |Gel Pen German T... |
+-----------+---------------+-------------------+-----------------------+
Manufacturer = Hero
+-----------+---------------+-------------------+-----------------------+
|Item_ID |Product_Name |Manufacturer_Name |Product_Description |
+-----------+---------------+-------------------+-----------------------+
|43246 |Ink Pen |Hero |Ink Pen Smooth Ha... |
+-----------+---------------+-------------------+-----------------------+
I tried the following code, but it does not work well. Please help me improve this program. This is the code I used:
Dataset<Row> countsBy = src.select("Manufacturer_Name").distinct();
List<Row> lsts = countsBy.collectAsList();
for (Row lst : lsts) {
    String man = lst.toString();
    System.out.println("Records of " + man + " only");
    Dataset<Row> mandataset = src.filter("Manufacturer_Name='" + man + "'");
    mandataset.show();
}
Answer 0 (score: 0)
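Maybe you could build a map of datasets, keyed by a string (the Manufacturer_Name). On each iteration you read the row's Manufacturer_Name, check whether it is already in the map (creating the entry if needed), and finally add the row to the right dataset.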
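You would end up with something like this (a rough sketch, assuming src is the Dataset<Row> from the question, that it is small enough to collect on the driver, and that each "dataset" in the map is simply a List<Row>):
Map<String, List<Row>> byManufacturer = new HashMap<>();
for (Row row : src.collectAsList()) {
    String name = row.getAs("Manufacturer_Name");
    // Create the list the first time a manufacturer is seen, then add the row.
    byManufacturer.computeIfAbsent(name, k -> new ArrayList<>()).add(row);
}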
You then need a second loop, but only to print the data.
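The two-loop idea can be tried out without Spark. The following is only a minimal sketch in plain Java: the `Item` class is a hypothetical stand-in for a Spark `Row`, the class and method names are made up for illustration, and the sample values are a few rows taken from the question's table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByManufacturer {

    // Hypothetical stand-in for a Spark Row; only the fields used here.
    public static final class Item {
        public final int itemId;
        public final String productName;
        public final String manufacturerName;

        public Item(int itemId, String productName, String manufacturerName) {
            this.itemId = itemId;
            this.productName = productName;
            this.manufacturerName = manufacturerName;
        }
    }

    // First loop: add every item to the list that belongs to its manufacturer.
    public static Map<String, List<Item>> groupByManufacturer(List<Item> items) {
        // LinkedHashMap keeps the manufacturers in first-seen order.
        Map<String, List<Item>> groups = new LinkedHashMap<>();
        for (Item item : items) {
            // Create the list the first time a manufacturer is seen.
            groups.computeIfAbsent(item.manufacturerName, k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item(12345, "Pen", "Cello"),
                new Item(12346, "Pencil", "Nataraja"),
                new Item(12987, "Pen", "Reynolds"),
                new Item(13456, "Pen", "Cello"),
                new Item(43246, "Ink Pen", "Hero"));

        // Second loop: only prints the groups that were built above.
        for (Map.Entry<String, List<Item>> entry : groupByManufacturer(items).entrySet()) {
            System.out.println("Manufacturer = " + entry.getKey());
            for (Item item : entry.getValue()) {
                System.out.println("  " + item.itemId + " | " + item.productName);
            }
        }
    }
}
```

With a real Dataset<Row>, the same pattern means collecting every row to the driver first, so it is probably only reasonable when the data is small.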
I hope this solves your problem!
Edit: replaced "Dictionary" with "Map" (sorry) and added a link:
How do you create a dictionary in Java?
Edit: changed the code to match the new idea.