Question

I'm using the following code:

df.show();

I want to select distinct rows, then take a sample and limit it to 300 records, but the output shows duplicate rows all over the place. What am I missing?

Thanks!

Answer 1

Assign the result to a new DataFrame:

val myDupeDF=myDF.select(myDF.col("EmpName"))
myDupeDF.show()
val myDistinctDf=myDF.select(myDF.col("EmpName")).distinct
myDistinctDf.show();

Before distinct:
+-------+
|EmpName|
+-------+
|   John|
|   John|
|   John|
+-------+

After distinct:

+-------+
|EmpName|
+-------+
|   John|
+-------+
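To see what `select(...).distinct` computes without a Spark cluster, here is a minimal plain-Java sketch. It is not Spark code: the class name `SelectDistinctSketch` and the three identical rows are hypothetical, chosen to mirror the output above. It projects each row onto one column and keeps one copy of each value. A `LinkedHashSet` is used only to make the order deterministic; Spark itself does not guarantee row order after `distinct`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Plain-Java model of select("EmpName").distinct; not Spark code.
class SelectDistinctSketch {
    // Project each row (a String[] of column values) onto one column index.
    static List<String> selectColumn(List<String[]> rows, int col) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(row[col]);
        }
        return out;
    }

    // Keep one copy of each value; LinkedHashSet keeps first-seen order only
    // to make the sketch deterministic (Spark gives no order guarantee).
    static List<String> distinct(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        // Hypothetical data: three identical rows (EmpId, EmpName, Salary).
        List<String[]> myDF = List.of(
                new String[] {"1", "John", "1000.0"},
                new String[] {"1", "John", "1000.0"},
                new String[] {"1", "John", "1000.0"});
        List<String> names = selectColumn(myDF, 1);
        System.out.println(names);           // [John, John, John]
        System.out.println(distinct(names)); // [John]
    }
}
```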

Update for all columns: selecting all columns still works for me. I'm using Spark 1.5.1.

val myDupeDF=myDF.select(myDF.col("*"))
myDupeDF.show()
val myDistinctDf=myDF.select(myDF.col("*")).distinct
myDistinctDf.show();

Result:

+-----+-------+------+----------+
|EmpId|EmpName|Salary|SalaryDate|
+-----+-------+------+----------+
|    1|   John|1000.0|2016-01-01|
+-----+-------+------+----------+
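The all-column case follows the same rule, except that two rows count as duplicates only when every column matches. A minimal plain-Java sketch of that rule, assuming rows are modeled as `List<String>` (the class name `RowDistinctSketch` and the data are hypothetical, not Spark code):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Plain-Java model of select("*").distinct: a row is a duplicate only if
// every column value matches. Not Spark code; data is hypothetical.
class RowDistinctSketch {
    static List<List<String>> distinctRows(List<List<String>> rows) {
        // List.equals/hashCode compare element by element, i.e. column by column.
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("1", "John", "1000.0", "2016-01-01"),
                List.of("1", "John", "1000.0", "2016-01-01"),
                List.of("1", "John", "1200.0", "2016-02-01"));
        // The first two rows collapse; the third differs in Salary and SalaryDate.
        System.out.println(distinctRows(rows).size()); // 2
    }
}
```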


Answer 2

Try this:

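import static org.apache.spark.sql.functions.lit;

df = df.select(
        df.col("col").as("col1"),
        df.col("col_").as("col2"));
df = df.distinct();
// sample(false, ...) samples WITHOUT replacement; with true, the same row
// can be drawn more than once, which brings duplicates back after distinct()
df = df.sample(false, 0.8).limit(300);
df = df.withColumn("random", lit(0));

df.show();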
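One likely reason duplicates survive the `distinct()` call: the first argument of `sample` is `withReplacement`. With `true`, the same row can be drawn more than once, so the sampled result can contain duplicates even though its input was distinct, and `limit(300)` only truncates, it does not remove them. The plain-Java sketch below models the two modes; it is not Spark's implementation, and the class and method names are hypothetical. Without replacement each row is kept at most once (a Bernoulli trial); with replacement each row is emitted a Poisson-distributed number of times.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

// Plain-Java model of distinct() followed by sample(withReplacement, fraction).
// Not Spark code: it only shows why withReplacement = true can emit the same
// row more than once even when the input rows are already distinct.
class SampleSketch {
    // Knuth's method for drawing a Poisson(lambda)-distributed count.
    static int poisson(double lambda, Random rnd) {
        double limit = Math.exp(-lambda);
        double p = 1.0;
        int k = 0;
        do {
            k++;
            p *= rnd.nextDouble();
        } while (p > limit);
        return k - 1;
    }

    static List<Integer> sample(List<Integer> rows, boolean withReplacement,
                                double fraction, long seed) {
        Random rnd = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (Integer row : rows) {
            // Without replacement: one Bernoulli trial, so 0 or 1 copies.
            // With replacement: a Poisson count, so 0, 1, 2, ... copies.
            int copies = withReplacement
                    ? poisson(fraction, rnd)
                    : (rnd.nextDouble() < fraction ? 1 : 0);
            for (int i = 0; i < copies; i++) {
                out.add(row);
            }
        }
        return out;
    }

    static boolean hasDuplicates(List<Integer> rows) {
        return rows.size() != new HashSet<>(rows).size();
    }

    public static void main(String[] args) {
        List<Integer> distinctRows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            distinctRows.add(i); // 1000 rows, all different
        }
        System.out.println(hasDuplicates(sample(distinctRows, false, 0.8, 42L))); // false
        // With fraction 0.8, about 19% of rows are expected to get 2+ copies.
        System.out.println(hasDuplicates(sample(distinctRows, true, 0.8, 42L)));
    }
}
```

Sampling without replacement keeps the distinct guarantee, because each input row can appear at most once in the output.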

That said, I think you need to specify a column name for the distinct operation:

df = df.select("COLUMNNAME").distinct();
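Note that `select("COLUMNNAME").distinct()` returns only that one column. If the goal is to keep whole rows while de-duplicating on a single column, Spark's DataFrame also offers `dropDuplicates`, which accepts column names (check the docs for your Spark version). The sketch below models that idea in plain Java only; the class and method names are hypothetical, and keeping the first row per key is this sketch's own choice, since Spark does not promise which duplicate survives.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of de-duplicating whole rows by one key column.
// Not Spark code; keeping the first row per key is this sketch's choice.
class DedupeByKeySketch {
    static List<List<String>> dedupeByColumn(List<List<String>> rows, int keyCol) {
        Map<String, List<String>> firstByKey = new LinkedHashMap<>();
        for (List<String> row : rows) {
            // putIfAbsent keeps the first row seen for each key value.
            firstByKey.putIfAbsent(row.get(keyCol), row);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("1", "John", "1000.0"),
                List.of("2", "John", "1200.0"),
                List.of("3", "Mary", "900.0"));
        // Unlike select("EmpName").distinct(), every column of the kept rows survives.
        System.out.println(dedupeByColumn(rows, 1));
        // [[1, John, 1000.0], [3, Mary, 900.0]]
    }
}
```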

Spark DataFrame - .distinct() not working?
