我在这里有三个列表
[1,2,3,4,5]
[5,4,6,7,2]
[1,2,4,5,6,7,8,9,0]
我想要这种输出:
A B C
1 5 1
2 4 2
3 6 4
4 7 5
5 2 6
7
8
9
0
我尝试了一种语法,但是它给了我这个错误arrays must all be same length
,另一个错误是Length of values does not match length of index
有没有办法获得这种输出?
答案 0 :(得分:5)
这不容易得到支持,但是可以做到。 DataFrame.from_dict
将以“索引”为方向。假设您的列表是A
,B
和C
:
pd.DataFrame([A, B, C]).T
0 1 2
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
另一个选择是使用DataFrame.from_dict
:
pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
A B C
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
使用zip_longest
和DataFrame.from_records
的第三个解决方案:
from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(A, B, C), columns=['A', 'B', 'C'])
# pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])
A B C
0 1.0 5.0 1
1 2.0 4.0 2
2 3.0 6.0 4
3 4.0 7.0 5
4 5.0 2.0 6
5 NaN NaN 7
6 NaN NaN 8
7 NaN NaN 9
8 NaN NaN 0
答案 1 :(得分:4)
另一种方法是对每个列表的Series
进行列表理解,并由此构造一个df:
In[61]:
df = pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
df
Out[61]:
A B C
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
%timeit pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
%timeit pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
from itertools import zip_longest
%timeit pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])
1.23 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
977 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
545 µs ± 8.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
所以最后一种方法是最快的
答案 2 :(得分:0)
一种自定义方式的想法。
定义几种方法来调整输入数据:
def longest(*lists):
return max([ len(x) for x in lists])
def equalize(col, size):
delta = size - len(col)
if delta == 0: return col
return col + [None for _ in range(delta)]
用于构建数据框:
import pandas as pd
size = longest(col1, col2, col3)
df = pd.DataFrame({'a':equalize(col1, size), 'b':equalize(col2, size), 'c':equalize(col3, size)})
返回哪个
a b c
0 1.0 5.0 1
1 2.0 4.0 2
2 3.0 6.0 4
3 4.0 7.0 5
4 5.0 2.0 6
5 NaN NaN 7
6 NaN NaN 8
7 NaN NaN 9
8 NaN NaN 0