Dropping partitions from Spark

Time: 2018-09-27 07:06:12

Tags: apache-spark hive

I am using Java-Spark (Spark 2.2.0).

I am trying to drop Hive partitions using a comparison operator in the partition spec, roughly like this (the identifiers here are illustrative):

spark.sql("ALTER TABLE my_table DROP IF EXISTS PARTITION (my_date < '2018-09-01')");

and I get the following exception:

  

org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<' expecting {')', ','} (line 1, pos 42)

I know about the open issue ALTER TABLE DROP PARTITION should support comparators, which was supposed to be resolved in my version, but I still hit the exception.

What is the alternative way to drop partitions from Spark? Is there another way to achieve this?

Thanks.

4 Answers:

Answer 0: (score: 2)

It seems there is no way to do this for the moment. As shown in SPARK-14922, the target version for this fix is 3.0.0 and it is still in progress.

In my opinion, then, there are two possible workarounds.

Let's set up the problem using Spark 2.4.3:

// We create the table
spark.sql("CREATE TABLE IF NOT EXISTS potato (size INT) PARTITIONED BY (hour STRING)")

// Enable dynamic partitioning 
spark.conf.set("hive.exec.dynamic.partition.mode","nonstrict")

// Insert some dummy records
(1 to 9).map(i => spark.sql(s"INSERT INTO potato VALUES ($i, '2020-06-07T0$i')"))

// Verify inserts
spark.table("potato").count // 9 records

Now... let's try to drop a single partition from within Spark!

spark.sql("""ALTER TABLE potato DROP IF EXISTS PARTITION (hour='2020-06-07T01')""")
spark.table("potato").count // 8 records

Trying to drop multiple partitions at once doesn't work.

spark.sql("""ALTER TABLE potato DROP IF EXISTS PARTITION (hour="2020-06-07T02", hour="2020-06-07T03")""")

org.apache.spark.sql.catalyst.parser.ParseException:
Found duplicate keys 'hour'.(line 1, pos 34)

== SQL ==
ALTER TABLE potato DROP IF EXISTS PARTITION (hour="2020-06-07T02", hour="2020-06-07T03")
----------------------------------^^^

Dropping a range of partitions with a comparison operator doesn't work either.

spark.sql("""ALTER TABLE potato DROP IF EXISTS PARTITION (hour<="2020-06-07T03")""")

org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<=' expecting {')', ','}(line 1, pos 49)

== SQL ==
ALTER TABLE potato DROP IF EXISTS PARTITION (hour<="2020-06-07T03")
-------------------------------------------------^^^

This is a parser limitation rather than a type problem: before the SPARK-14922 fix, the SQL grammar only accepts '=' inside a partition spec, so comparison operators are rejected regardless of the column type.

The solution I found was:

  1. Get the list of partitions and filter them conditionally.
  2. Either drop the individual partitions one by one, or pass them, as a sequence of Map[String,String] (a TablePartitionSpec), to the catalog's dropPartitions function.

Step 1:

// Get External Catalog
val catalog = spark.sharedState.externalCatalog

// Get the spec from the list of partitions 
val partitions = catalog.listPartitions("default", "potato").map(_.spec)

// Filter according to the condition you attempted.
val filteredPartitions = partitions.flatten.filter(_._2 <= "2020-06-07T03")
                                           .map(t => Map(t._1 -> t._2))
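The string filter above works because ISO-8601-style timestamps sort lexicographically in chronological order. Here is a minimal plain-Python sketch of the same spec-filtering step, with the partition specs hardcoded as a stand-in for the catalog's listPartitions output:

```python
# Hardcoded stand-ins for the specs returned by catalog.listPartitions.
specs = [{"hour": "2020-06-07T0{}".format(i)} for i in range(1, 10)]

# ISO-8601-style timestamps compare correctly as plain strings,
# so selecting a range needs nothing more than <=.
filtered = [s for s in specs if s["hour"] <= "2020-06-07T03"]

print([s["hour"] for s in filtered])
# ['2020-06-07T01', '2020-06-07T02', '2020-06-07T03']
```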

Step 2:

We pass each tuple of parameters to a separate ALTER TABLE DROP PARTITION statement.

filteredPartitions.flatten.foreach(t => 
     spark.sql(s"""ALTER TABLE potato DROP IF EXISTS PARTITION (${t._1}="${t._2}")"""))
spark.table("potato").count // 6 records
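For clarity, the statement-building step can also be sketched in plain Python; the specs below are hardcoded stand-ins, and joining all key/value pairs of each spec additionally covers tables with more than one partition column:

```python
# Hardcoded stand-ins for the filtered partition specs.
specs = [{"hour": "2020-06-07T01"}, {"hour": "2020-06-07T02"}]

# Build one ALTER TABLE ... DROP PARTITION statement per spec.
# Joining every key/value pair handles multi-column partition specs too.
stmts = [
    "ALTER TABLE potato DROP IF EXISTS PARTITION ({})".format(
        ", ".join("{}='{}'".format(k, v) for k, v in s.items())
    )
    for s in specs
]

print(stmts[0])
# ALTER TABLE potato DROP IF EXISTS PARTITION (hour='2020-06-07T01')
```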

Or pass them to the catalog's dropPartitions function.

// If you purge data, it gets deleted immediately and isn't moved to trash.
// This takes precedence over retainData, so even if you retainData but purge,
// your data is gone.
catalog.dropPartitions("default", "potato", filteredPartitions,
                       ignoreIfNotExists=true, purge=true, retainData=false)
spark.table("potato").count // 6 records

I hope this helps. Let me know if you have a better solution for Spark 2.x.

Answer 1: (score: 2)

A solution for the pyspark people

  1. Get all the partitions of the table.

  2. Convert the partitions column into a list of partitions.

  3. Clean the partitions to get only the values.

  4. Filter the list with the required condition.

  5. Run the alter-table statement for everything in the filtered list. Please find below the code in the corresponding pyspark format.

     partitions = spark.sql("SHOW PARTITIONS potato")
     listpartitions = list(partitions.select('partition').toPandas()['partition'])
     cleanpartitions = [i.split('=')[1] for i in listpartitions]
     filtered = [i for i in cleanpartitions if i <= '2020-06-07T03']
     for i in filtered:
         spark.sql("ALTER TABLE potato DROP IF EXISTS PARTITION (hour = '" + i + "')")
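One caveat: the split('=')[1] step assumes a single partition column. With several partition columns, SHOW PARTITIONS returns rows like date=20200607/hour=01, which can be parsed into dicts with a small helper (a pure-Python sketch; the rows are hardcoded and the names are illustrative):

```python
def parse_partition(row):
    """Parse one SHOW PARTITIONS row, e.g. 'date=20200607/hour=01', into a dict."""
    return dict(part.split("=", 1) for part in row.split("/"))

# Hardcoded stand-ins for SHOW PARTITIONS output on a two-column partitioned table.
rows = ["date=20200607/hour=01", "date=20200607/hour=02"]
specs = [parse_partition(r) for r in rows]

print(specs[0])  # {'date': '20200607', 'hour': '01'}
```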
    

Answer 2: (score: 1)

You can do the same thing programmatically from Spark. Again, it is not fixed in Spark 2.0, 2.1, or 2.2; see https://issues.apache.org/jira/browse/SPARK-14922

Steps:

  1. Create a Hive context.
  2. Get the table with the Hive context's getTable method; you need to pass the dbName, the tableName, and a boolean indicating whether to throw if there is an error.
  3. From the table object, hive.getPartitions(table) gets the partitions from the Hive context (you need to decide which partitions you are going to delete).
  4. Remove partitions using dropPartition with the partition values, table name, and db info (hive.dropPartition).

    hiveContext.getPartitions(table)
    hiveContext.dropPartition(dbName, tableName, partition.getValues(), true)

You need to validate the partition name and check whether it needs to be deleted or not (you need to write a custom method).

Or you can get the partition list with a SHOW PARTITIONS query, and from there use DROP PARTITION to remove each one.

This may give you some pointers.

Answer 3: (score: 1)

I think the problem here is that you used the '<' (less-than) operator, so your data would presumably have to be of a numeric or date type, but wrapping the value in quotes means it is treated as a string. I suggest you check the format of your partition values; you may have to convert them into a proper date format.
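The point about types also matters when filtering partition values yourself: string comparison is lexicographic, so it only matches numeric order for fixed-width (e.g. zero-padded) values. A quick plain-Python illustration:

```python
# Lexicographic comparison does not match numeric order in general.
print("9" < "10")               # False, because '9' > '1' character by character

# Fixed-width values such as yyyyMMdd dates compare correctly as strings.
print("20180909" < "20180910")  # True

# Otherwise, convert to a comparable type before comparing.
print(int("9") < int("10"))     # True
```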