From a pandas dataframe

Time: 2017-12-04 20:46:25

Tags: python pandas

In the dataframe below, I have a collection of year and month values stored as tuples in a list:

state
alabama           [(2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0), (2018.0, 1.0)]
arkansas          [(2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0)]
colorado          [(2017.0, 9.0), (2017.0, 10.0), (2017.0, 11.0)]

How can I extract the superset list of year and month combinations? In this case, the solution would be:

[(2017.0, 9.0), (2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0), (2018.0, 1.0)]

I could do this with a for loop, but that would be slow. Is there a more pythonic way?

Here is my attempt:

for row in df:
    if all(y in row for x, y in df):
        tmp = row

But I get this error:

ValueError: too many values to unpack (expected 2)

1 Answer:

Answer 0: (score: 1)

Using the sample DF from your previous question:

In [109]: df[['Year','Month']].sort_values(['Year','Month']).drop_duplicates().values.tolist()
Out[109]:
[[2017.0, 10.0],
 [2017.0, 11.0],
 [2017.0, 12.0],
 [2018.0, 1.0],
 [2018.0, 2.0],
 [2018.0, 3.0],
 [2018.0, 4.0],
 [2018.0, 5.0],
 [2018.0, 6.0],
 [2018.0, 7.0]]
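
The snippet above works on a long-format frame with separate Year and Month columns. For the list-of-tuples layout shown in the question, a set union is one possible approach. The following is a minimal sketch, assuming the data is a Series indexed by state whose values are lists of (year, month) tuples; the name `periods` and the construction of the sample data are hypothetical, rebuilt from the rows printed in the question.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: a Series indexed by
# state, where each value is a list of (year, month) tuples.
periods = pd.Series(
    {
        "alabama": [(2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0), (2018.0, 1.0)],
        "arkansas": [(2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0)],
        "colorado": [(2017.0, 9.0), (2017.0, 10.0), (2017.0, 11.0)],
    },
    name="periods",
)
periods.index.name = "state"

# Unpacking the Series passes each row's list to set.union, which merges
# them and drops duplicates. Tuples compare element by element, so sorted()
# orders the result by year first and then by month.
superset = sorted(set().union(*periods))
print(superset)
```

This still visits every tuple once, but the iteration happens inside `set.union` rather than in an explicit Python loop, and no unpacking of rows into `x, y` is needed.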