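How do I select the first and last row of every unique record?
I tried the code below, but I know it is not correct. For one thing, it only looks at a single column and ignores the other columns.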
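temp = None  # last value seen; the original attempt never initialised it
for key, value in df['x'].items():  # Series.iteritems() was removed in pandas 2.0
    # print(key, value)
    if temp != value:
        print(temp)
        temp = value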
Expected output records are highlighted in yellow in the table.
Answer 0 (score: 1)
Update: after understanding the OP's question better, I think I have found the right solution.
Initial table
+-----------+---+---+
| x         | y | z |
+-----------+---+---+
| 111000004 | 1 | 1 |
| 111000014 | 5 | 1 |
| 111000014 | 5 | 2 |
| 111001605 | 2 | 1 |
| 111001605 | 2 | 2 |
| 111003425 | 1 | 1 |
| 111003425 | 1 | 2 |
| 111003425 | 1 | 3 |
| 111003748 | 4 | 1 |
| 111003748 | 4 | 2 |
| 111003748 | 3 | 4 |
| 111003748 | 2 | 3 |
| 111003748 | 1 | 1 |
+-----------+---+---+
The OP mentioned that this is time-series data, so I grouped the data by the time column ('x') and took the first and last row of each group. I concatenated the two results, sorted them by the index ('x'), and dropped duplicates to clean up the output.
import pandas as pd
g = df.groupby(['x'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
d = pd.concat([g.first(), g.last()]).sort_index().reset_index().drop_duplicates()
The final result is in d, as shown below.
+-----------+---+---+
| x         | y | z |
+-----------+---+---+
| 111000004 | 1 | 1 |
| 111000014 | 5 | 1 |
| 111000014 | 5 | 2 |
| 111001605 | 2 | 1 |
| 111001605 | 2 | 2 |
| 111003425 | 1 | 1 |
| 111003425 | 1 | 3 |
| 111003748 | 4 | 1 |
| 111003748 | 1 | 1 |
+-----------+---+---+
To get all the unique rows in a DataFrame, you can do
unique_df = df.drop_duplicates()
Then, to get the first and last rows, you can call head() and tail() on unique_df.
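For anyone who wants to try this end to end, here is a minimal, self-contained sketch of a closely related variant. It is not the code from the answer: it swaps in groupby(...).head(1) and tail(1), which return the original rows with their original index, whereas first() and last() return the first and last non-null value of each column. The helper name first_and_last_per_group is hypothetical, and the DataFrame is rebuilt by hand from the answer's initial table.

```python
import pandas as pd

# Sample data typed in from the answer's initial table.
df = pd.DataFrame({
    "x": [111000004, 111000014, 111000014, 111001605, 111001605,
          111003425, 111003425, 111003425,
          111003748, 111003748, 111003748, 111003748, 111003748],
    "y": [1, 5, 5, 2, 2, 1, 1, 1, 4, 4, 3, 2, 1],
    "z": [1, 1, 2, 1, 2, 1, 2, 3, 1, 2, 4, 3, 1],
})


def first_and_last_per_group(frame, key):
    """Return the first and last row of each group, in original row order.

    Hypothetical helper for illustration; assumes frame has a unique index.
    """
    grouped = frame.groupby(key, sort=False)
    # head(1)/tail(1) keep whole rows and the original index labels.
    picked = pd.concat([grouped.head(1), grouped.tail(1)])
    # A single-row group is picked twice under the same label; keep one copy.
    picked = picked[~picked.index.duplicated()]
    # Sorting by the original index restores the input order.
    return picked.sort_index().reset_index(drop=True)


result = first_and_last_per_group(df, "x")
print(result)
```

On this sample the result has the same nine rows as the final table in the answer. Deduplicating on the index label rather than on row values means two genuinely identical rows at the start and end of a group would both be kept, which may or may not be what you want.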