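I have a bunch of pandas Series that are generated one at a time, and I want to assign each one as a row in a DataFrame whose columns are the union of all the Series' index values.
For example: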
import numpy as np
import pandas as pd
# the names of all series are known in advance
df = pd.DataFrame(index=['A', 'B'])
# in reality there are many long series, not just two
a = pd.Series({'v':0, 'w':1, 'x':2, 'y':3}, name='A')
b = pd.Series({ 'x':4, 'y':5, 'z':6}, name='B')
# generate and assign each series as one row in the frame
for row in (a, b):
    # create new columns - this is what I want to eliminate
    for column in row.index.difference(df.columns):
        df[column] = np.nan
    df.loc[row.name] = row
print(df)
This produces the desired result:
     v    w    x    y    z
A  0.0  1.0  2.0  3.0  NaN
B  NaN  NaN  4.0  5.0  6.0
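However, without the for column loop it produces an empty DataFrame with no columns.
I would like to eliminate the for column loop. I do not know all the columns in advance. I would also like to assign np.nan to all the new columns in a vectorized way, but that does not work because of an old issue I filed here: https://github.com/pandas-dev/pandas/issues/13658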
Answer 0 (score: 1)
pd.DataFrame.set_value adds columns automatically.
df = pd.DataFrame()
# in reality there are many long series, not just two
a = pd.Series({'v':0, 'w':1, 'x':2, 'y':3}, name='A')
b = pd.Series({ 'x':4, 'y':5, 'z':6}, name='B')
# generate and assign each series as one row in the frame
for row in (a, b):
    for i, v in row.iteritems():
        df.set_value(row.name, i, v)
print(df)
     v    w    x    y    z
A  0.0  1.0  2.0  3.0  NaN
B  NaN  NaN  4.0  5.0  6.0
This is still a loop, but set_value is very fast.
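For readers on newer pandas: set_value was deprecated in pandas 0.21 and removed in 1.0, and Series.iteritems was removed in 2.0. A rough equivalent of the loop above, assuming a recent pandas version, uses the .at accessor, which can enlarge the frame when the row or column label is missing. This is only a sketch of the same idea, not a benchmarked replacement, and the timings below do not apply to it.

```python
import pandas as pd

df = pd.DataFrame()

a = pd.Series({'v': 0, 'w': 1, 'x': 2, 'y': 3}, name='A')
b = pd.Series({'x': 4, 'y': 5, 'z': 6}, name='B')

for row in (a, b):
    # Series.items() replaces the removed Series.iteritems()
    for col, value in row.items():
        # .at sets one scalar; a missing row or column label is added on the fly
        df.at[row.name, col] = value

print(df)
```

Cells that were never assigned are left as NaN, so the columns end up as floats, the same as in the output above.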
Timing tests
Small data
from timeit import timeit
df = pd.DataFrame()
los = [pd.Series(1, [i], name=i) for i in range(10)]
stmt1 = """
for row in los:
    for column in row.index.difference(df.columns):
        df[column] = np.nan
    df.loc[row.name, row.index] = row
"""
stmt2 = """
for row in los:
    for col, value in row.iteritems():
        df.set_value(row.name, col, value)
"""
setup = """
from __main__ import df, los, np
"""
print(timeit(stmt1, setup, number=100))
print(timeit(stmt2, setup, number=100))
0.5426401197910309
0.01039268122985959
Large data
df = pd.DataFrame()
los = [pd.Series(1, [i], name=i) for i in range(1000)]
stmt1 = """
for row in los:
    for column in row.index.difference(df.columns):
        df[column] = np.nan
    df.loc[row.name, row.index] = row
"""
stmt2 = """
for row in los:
    for col, value in row.iteritems():
        df.set_value(row.name, col, value)
"""
setup = """
from __main__ import df, los, np
"""
print(timeit(stmt1, setup, number=100))
print(timeit(stmt2, setup, number=100))
63.69273182330653
1.1242545540444553
Answer 1 (score: 0)
You can pass multiple Series as a list to the DataFrame constructor:
import pandas as pd
a = pd.Series({'v':0, 'w':1, 'x':2, 'y':3}, name='A')
b = pd.Series({ 'x':4, 'y':5, 'z':6}, name='B')
df = pd.DataFrame([a, b])
print(df)
     v    w    x    y    z
A  0.0  1.0  2.0  3.0  NaN
B  NaN  NaN  4.0  5.0  6.0
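Since the question says the Series are generated one at a time, the same constructor call could be applied after collecting them in a list. The sketch below assumes that holding all the Series in memory until the end is acceptable. generate_rows is a hypothetical stand-in for whatever actually produces the Series.

```python
import pandas as pd


def generate_rows():
    # Hypothetical generator: stands in for the real code that
    # produces one named Series at a time.
    yield pd.Series({'v': 0, 'w': 1, 'x': 2, 'y': 3}, name='A')
    yield pd.Series({'x': 4, 'y': 5, 'z': 6}, name='B')


# Collect the Series as they arrive, then build the frame in one call.
rows = []
for row in generate_rows():
    rows.append(row)

# Each Series becomes a row labelled by its name; the columns are the
# union of all Series indexes, with NaN where a Series has no value.
df = pd.DataFrame(rows)
print(df)
```

Building the frame once at the end avoids growing it row by row, at the cost of not having a usable DataFrame until all the Series have been generated.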