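I have a pandas Series of dictionaries, and I would like to convert it to a DataFrame with the same index.
The only way I have found is to go through the Series' to_dict method, which is not very efficient because it drops back to pure Python instead of staying in numpy / pandas / cython.
Do you have any suggestions for a better approach?
Thanks a lot.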
>>> import pandas as pd
>>> flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
>>> flagInfoSeries
0      {'a': 1, 'b': 2}
1    {'a': 10, 'b': 20}
dtype: object
>>> pd.DataFrame(flagInfoSeries.to_dict()).T
    a   b
0   1   2
1  10  20
Answer 0 (score: 3)
I think you can use a list comprehension:
import pandas as pd
flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
print(flagInfoSeries)
0      {'a': 1, 'b': 2}
1    {'a': 10, 'b': 20}
dtype: object
print(pd.DataFrame(flagInfoSeries.to_dict()).T)
    a   b
0   1   2
1  10  20
print(pd.DataFrame([x for x in flagInfoSeries]))
    a   b
0   1   2
1  10  20
Timings:
In [203]: %timeit pd.DataFrame(flagInfoSeries.to_dict()).T
The slowest run took 4.46 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 554 µs per loop
In [204]: %timeit pd.DataFrame([x for x in flagInfoSeries])
The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 361 µs per loop
In [209]: %timeit flagInfoSeries.apply(lambda dict: pd.Series(dict))
The slowest run took 4.76 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 751 µs per loop
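The timings above were taken on a two-row Series, so they are probably dominated by fixed constructor overhead. A minimal sketch for repeating the comparison on a larger input outside IPython follows; the Series size, the repeat count, and the helper names via_to_dict and via_list are assumptions made for illustration, not part of the original answer.

```python
import timeit

import pandas as pd

# Hypothetical larger input: 2,000 dicts that all share the keys 'a' and 'b'.
n = 2000
big = pd.Series([{'a': i, 'b': 2 * i} for i in range(n)])


def via_to_dict(s):
    # Builds a dict of dicts keyed by the index, then transposes the result.
    return pd.DataFrame(s.to_dict()).T


def via_list(s):
    # Hands the constructor a plain list of dicts, one dict per row.
    return pd.DataFrame([x for x in s])


repeats = 3
for fn in (via_to_dict, via_list):
    total = timeit.timeit(lambda: fn(big), number=repeats)
    print(fn.__name__, total / repeats, "seconds per call")
```

Absolute numbers will depend on the machine and the pandas version, so only the relative ordering is worth reading off.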
Edit:
If you need to preserve the index, try adding index=flagInfoSeries.index to the DataFrame constructor:
print(pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index))
Timings:
In [257]: %timeit pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index)
1000 loops, best of 3: 350 µs per loop
Example:
import pandas as pd
flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
flagInfoSeries.index = [2, 8]
print(flagInfoSeries)
2      {'a': 1, 'b': 2}
8    {'a': 10, 'b': 20}
dtype: object
print(pd.DataFrame(flagInfoSeries.to_dict()).T)
    a   b
2   1   2
8  10  20
print(pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index))
    a   b
2   1   2
8  10  20
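The index-preserving construction can be wrapped in a small helper. This is only a sketch: the function name dicts_to_frame is hypothetical, and it swaps the list comprehension for Series.tolist(), which should produce the same list of dicts.

```python
import pandas as pd


def dicts_to_frame(s):
    """Expand a Series of dicts into a DataFrame that keeps the Series index."""
    # tolist() returns the stored dicts as a plain Python list, one per row;
    # passing index=s.index keeps the original labels instead of 0..n-1.
    return pd.DataFrame(s.tolist(), index=s.index)


flagInfoSeries = pd.Series([{'a': 1, 'b': 2}, {'a': 10, 'b': 20}], index=[2, 8])
df = dicts_to_frame(flagInfoSeries)
print(df)
```

Each dict becomes one row, and the keys become the column labels.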
Answer 1 (score: 0)
This avoids to_dict, but apply can also be slow:
flagInfoSeries.apply(lambda dict: pd.Series(dict))
Edit: I see jezrael added timing comparisons. Here is mine:
%timeit flagInfoSeries.apply(lambda dict: pd.Series(dict))
1000 loops, best of 3: 935 µs per loop