How to select an entire row based on a distinct column

Time: 2016-06-25 23:07:15

Tags: sql pyspark apache-spark-sql hiveql

I am doing this in Spark.

cityId  PhysicalAddress      EmailAddress         ..many other columns of other meta info...   
1       b st                 something@email.com   
1       b st                 something@email.com   <- some rows can be entirely duplicates
1       a avenue             random@gmail.com
2       c square             anything@yahoo.com
2       d blvd               d@d.com

This table has no primary key, and I want to grab one random row for each distinct cityId.

E.g., this is a correct answer:

cityId  PhysicalAddress      EmailAddress        ..many other columns 
1       b st                 something@email.com   
2       c square             anything@yahoo.com

E.g., this is also a correct answer:

cityId  PhysicalAddress      EmailAddress       ..many other columns 
1       a avenue             random@gmail.com
2       c square             anything@yahoo.com

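One approach that comes to mind is GROUP BY. However, that requires an aggregate function (e.g. min()) on the other columns, whereas I just want to pull out one whole row per cityId (it does not matter which one).

2 answers:

Answer 0: (score: 0)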

;WITH CTE AS
(
   SELECT *, ROW_NUMBER() OVER(PARTITION BY cityId ORDER BY cityId) AS RN
   FROM [TABLE_NAME]
) SELECT * FROM CTE WHERE RN = 1
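
The ROW_NUMBER() pattern above can be tried out without a SQL Server or Spark installation. The following is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python, purely as a stand-in. It assumes SQLite 3.25 or later (needed for window functions); the table name contacts and the helper one_row_per_city are hypothetical names chosen for illustration. Spark SQL also provides ROW_NUMBER() as a window function, so a query of this shape should carry over, but that is not verified here.

```python
import sqlite3

# Sample rows from the question. The table name "contacts" is an assumption.
ROWS = [
    (1, "b st", "something@email.com"),
    (1, "b st", "something@email.com"),  # exact duplicate row
    (1, "a avenue", "random@gmail.com"),
    (2, "c square", "anything@yahoo.com"),
    (2, "d blvd", "d@d.com"),
]

# Number the rows inside each cityId partition and keep only the first one.
QUERY = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY cityId ORDER BY cityId) AS rn
    FROM contacts
)
SELECT cityId, PhysicalAddress, EmailAddress
FROM cte
WHERE rn = 1
ORDER BY cityId
"""


def one_row_per_city(rows):
    """Return one arbitrary full row per cityId (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE contacts ("
            "cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
        )
        conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in one_row_per_city(ROWS):
        print(row)
```

Because the ORDER BY inside the window ties on every row of a partition, which row receives rn = 1 is up to the engine. That matches the question's requirement that any row per cityId is acceptable, but it also means the result is not guaranteed to be the same on every run or on every engine.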

Answer 1: (score: 0)

I have SQL Server 2008 R2, but tried to find approaches that also work on other DBMSs.

create table contacts(
    cityId int,
    PhysicalAddress varchar(max),
    EmailAddress varchar(max)
);

delete contacts;

insert contacts( cityId, PhysicalAddress, EmailAddress )
/** ..many other columns of other meta info... */
values
      ( 1, 'b st', 'something@email.com' )
    , ( 1, 'b st', 'something@email.com' ) /* some rows can be entirely duplicates */
    , ( 1, 'a avenue', 'random@gmail.com' )
    , ( 2, 'c square', 'anything@yahoo.com' )
    , ( 2, 'd blvd', 'd@d.com' )
    , ( 3, 'e circuit', 'e@e.com' );

-- using row_number()
-- (the preceding statement must end with a semicolon before a CTE)
with c as (
    select *, row_number() over (partition by cityId order by cityId) as seqnum
    from contacts
)
select * from c where seqnum = 1;

-- Add a new identity column
alter table contacts add id int identity(1,1);

select *
from contacts
where id in (select min(id) from contacts group by cityID);

-- Variation: Create a copy into a temp table and add an identity column
-- Note: It may not be possible to modify original table
select * into #contacts from contacts;

alter table #contacts add id int identity(1,1);

select *
from #contacts
where id in (select min(id) from #contacts group by cityID);

I also tried using newid() with a computed column, but my excitement was short-lived: when you join the table to itself or use a subquery on that table, the computed column is recomputed for each SELECT, so that does not work. You cannot make that computed column persisted either; that is not allowed for non-deterministic expressions like newid(), which returns something different each time it is called on a given row.
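
The identity-column variant in this answer (keep the row with the smallest id per cityId) can be sketched the same way. The example below is an illustration under stated assumptions, not the answer's own code: it uses Python's sqlite3 module with an in-memory database, SQLite's INTEGER PRIMARY KEY AUTOINCREMENT stands in for SQL Server's identity(1,1), and the helper name first_row_per_city is hypothetical.

```python
import sqlite3

# Sample rows from the answer's insert statement.
ROWS = [
    (1, "b st", "something@email.com"),
    (1, "b st", "something@email.com"),  # exact duplicate row
    (1, "a avenue", "random@gmail.com"),
    (2, "c square", "anything@yahoo.com"),
    (2, "d blvd", "d@d.com"),
    (3, "e circuit", "e@e.com"),
]


def first_row_per_city(rows):
    """Return the row with the smallest surrogate id for each cityId."""
    conn = sqlite3.connect(":memory:")
    try:
        # AUTOINCREMENT id is an assumption standing in for identity(1,1).
        conn.execute(
            "CREATE TABLE contacts ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "cityId INTEGER, PhysicalAddress TEXT, EmailAddress TEXT)"
        )
        conn.executemany(
            "INSERT INTO contacts (cityId, PhysicalAddress, EmailAddress) "
            "VALUES (?, ?, ?)",
            rows,
        )
        # The subquery picks one id per city; the outer query returns full rows.
        return conn.execute(
            "SELECT cityId, PhysicalAddress, EmailAddress "
            "FROM contacts "
            "WHERE id IN (SELECT MIN(id) FROM contacts GROUP BY cityId) "
            "ORDER BY cityId"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_row_per_city(ROWS):
        print(row)
```

Unlike the row_number() version, this one is deterministic: ids follow insertion order, so the first inserted row of each city is returned. The trade-off is that it needs a surrogate key, which means either altering the table or working on a copy, as the answer notes.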