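I am trying to get a list of tuples from a pandas DataFrame. I am more used to other APIs such as apache-spark, where a DataFrame has a method called collect, but I searched a bit and found this approach. The result is not what I expected, though, and I think that is because the DataFrame holds aggregated data. Is there a simple way to do this?
Let me illustrate my problem: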
print(df)
#date user Cost
#2016-10-01 xxxx 0.598111
# yyyy 0.598150
# zzzz 13.537223
#2016-10-02 xxxx 0.624247
# yyyy 0.624302
# zzzz 14.651441
print(df.values)
#[[ 0.59811124]
# [ 0.59814985]
# [ 13.53722286]
# [ 0.62424731]
# [ 0.62430216]
# [ 14.65144134]]
#I was expecting something like this:
[("2016-10-01", "xxxx", 0.598111),
("2016-10-01", "yyyy", 0.598150),
("2016-10-01", "zzzz", 13.537223),
("2016-10-02", "xxxx", 0.624247),
("2016-10-02", "yyyy", 0.624302),
("2016-10-02", "zzzz", 14.651441)]
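I tried @Dervin's suggestion, but the result is still not what I want: each tuple contains only the Cost value, without the date and user.
collected = [tuple(x) for x in df.values]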
collected
[(0.59811124000000004,), (0.59814985000000032,), (13.53722285999994,),
(0.62424731000000044,), (0.62430216000000027,), (14.651441339999931,),
(0.62414758000000026,), (0.62423407000000042,), (14.655454959999938,)]
Answer 0 (score: 2)
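What you have there is a hierarchical index, which is why date and user do not appear in df.values. First do what is described in this SO question to turn the index levels into regular columns, and then do something like [tuple(x) for x in df1.to_records(index=False)]. For example: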
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
In [12]: df1
Out[12]:
          a         b         c         d
0  0.076626 -0.761338  0.150755 -0.428466
1  0.956445  0.769947 -1.433933  1.034086
2 -0.211886 -1.324807 -0.736709 -0.767971
...
In [13]: [tuple(x) for x in df1.to_records(index=False)]
Out[13]:
[(0.076625682946709128,
-0.76133754774190276,
0.15075466312259322,
-0.42846644471544015),
(0.95644517961731257,
0.76994677126920497,
-1.4339326896803839,
1.0340857719122247),
(-0.21188555188408928,
-1.3248066626301633,
-0.73670886051415208,
-0.76797061516159393),
...
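Applied to the data in the question, the two steps might look like the sketch below. It assumes the linked SO question refers to flattening the index with reset_index(), and that the frame is indexed by (date, user) with a single Cost column. The frame is rebuilt here by hand from the printed output, so the construction code is illustrative rather than the asker's actual code.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame:
# a Cost column under a two-level (date, user) index.
df = pd.DataFrame(
    {
        "date": ["2016-10-01"] * 3 + ["2016-10-02"] * 3,
        "user": ["xxxx", "yyyy", "zzzz"] * 2,
        "Cost": [0.598111, 0.598150, 13.537223, 0.624247, 0.624302, 14.651441],
    }
).set_index(["date", "user"])

# Step 1: move the index levels back into ordinary columns,
# so they are exported together with Cost.
flat = df.reset_index()

# Step 2: convert each row to a tuple, as in the answer above.
collected = [tuple(x) for x in flat.to_records(index=False)]

# Alternative for step 2: itertuples with name=None yields plain tuples directly.
collected_alt = list(flat.itertuples(index=False, name=None))

print(collected)
```

Both variants produce one (date, user, Cost) tuple per row. The to_records route returns NumPy scalar types for the numeric column, while itertuples tends to be the simpler choice when plain Python tuples are all that is needed.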