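I've searched Google and this site for an answer, but I can't seem to phrase the question well enough to find help with this exact problem.
I want to create a DataFrame with a column named "Department" that holds the values from a list, and for each value in that column I want the same datetime range.
The list is: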
departments = ['Sales', 'Specialist', 'Purchase', 'HR']
And the date range is (df is a different DataFrame I already have, which the date range is taken from):
pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')
So I tried the following, but it gives me an error because of the shapes. I understand that much; I'm just not sure how to fix it.
df2 = pd.DataFrame(department,(pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')), columns=['Department',"InvoiceDate"])
The desired result looks like this:
Department InvoiceDate
0 Sales 2019-03-25
1 Sales 2019-03-26
2 Sales 2019-03-27
...
5 Specialist 2019-03-25
6 Specialist 2019-03-26
7 Specialist 2019-03-27
...
8 Purchase 2019-03-25
9 Purchase 2019-03-26
10 Purchase 2019-03-27
...
11 HR 2019-03-25
12 HR 2019-03-26
13 HR 2019-03-27
Thanks
Edit: the error traceback
>>> df2 = pd.DataFrame(workstream,(pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')), columns=['WorkStream',"InvoiceDate"])
Traceback (most recent call last):
File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1678, in create_block_manager_from_blocks
make_block(values=blocks[0], placement=slice(0, len(axes[0])))
File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 3284, in make_block
return klass(values, ndim=ndim, placement=placement)
File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 2792, in __init__
super().__init__(values, ndim=ndim, placement=placement)
File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 126, in __init__
raise ValueError(
ValueError: Wrong number of items passed 1, placement implies 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python38-32\lib\site-packages\pandas\core\frame.py", line 464, in __init__
mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
File "C:\Python38-32\lib\site-packages\pandas\core\internals\construction.py", line 213, in init_ndarray
return create_block_manager_from_blocks(block_values, [columns, index])
File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1688, in create_block_manager_from_blocks
construction_error(tot_items, blocks[0].shape[1:], axes, e)
File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1718, in construction_error
raise ValueError(
ValueError: Shape of passed values is (8, 1), indices imply (533, 2)
Answer 0 (score: 1)
You can use the following code for this.
Declare the list of departments and get the list of dates in the range (min to max):
departments = ['Sales', 'Specialist', 'Purchase', 'HR']
dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D').tolist()
What you want is a Cartesian product, so use the following function:
def cartesian_product(data):
    # Build every combination of the dict's values; the dict's keys become the column names
    index = pd.MultiIndex.from_product(data.values(), names=data.keys())
    return pd.DataFrame(index=index).reset_index()

cartesian_product({'departments': departments,
                   'date': dates})
You can learn more about pandas and MultiIndex here (link).
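To check the approach above without the original data, here is a minimal, self-contained sketch. The small df with three sample dates is made up for illustration, and the columns are named Department and InvoiceDate only to match the desired output in the question. The dict views are wrapped in list() as a precaution; that is a choice made here, not something the approach requires.

```python
import pandas as pd

# Hypothetical stand-in for the asker's df; only InvoiceDate matters here.
df = pd.DataFrame(
    {"InvoiceDate": pd.to_datetime(["2019-03-25", "2019-03-27", "2019-03-26"])}
)

departments = ["Sales", "Specialist", "Purchase", "HR"]
dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq="1D")


def cartesian_product(data):
    # One row per combination of the dict's values; keys become column names.
    index = pd.MultiIndex.from_product(list(data.values()), names=list(data))
    return pd.DataFrame(index=index).reset_index()


df2 = cartesian_product({"Department": departments, "InvoiceDate": dates})
print(df2)
```

With four departments and three days this should give twelve rows, with each department paired with the full date range in turn.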
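Answer 1 (score: 1)
You are calling pd.DataFrame() incorrectly: its second positional argument is index, so the date range is being used as row labels rather than as a second column. Also, the two arrays you pass as data have different sizes. To fix this, you can do the following: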
departments = ['Sales', 'Specialist', 'Purchase', 'HR']
dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D').tolist()
sizeDates = len(dates)
sizeDep = len(departments)
# Repeat each department once per date: Sales, Sales, ..., Specialist, Specialist, ...
departments = [dep for dep in departments for _ in range(sizeDates)]
# Repeat the whole date range once per department so both lists have the same length
dates = dates * sizeDep
data = {'departments': departments, 'date': dates}
df2 = pd.DataFrame(data)
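The same repeat-and-tile idea can also be written with NumPy, which may read more clearly than list arithmetic. This is only a sketch of an alternative, not the code from the answer: np.repeat and np.tile are swapped in for the list operations, and the small df with sample dates is made up so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's df; only InvoiceDate matters here.
df = pd.DataFrame({"InvoiceDate": pd.to_datetime(["2019-03-25", "2019-03-27"])})

departments = ["Sales", "Specialist", "Purchase", "HR"]
dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq="1D")

df2 = pd.DataFrame(
    {
        # Each department repeated once per date: Sales, Sales, Sales, Specialist, ...
        "Department": np.repeat(departments, len(dates)),
        # The whole date range repeated once per department.
        "InvoiceDate": np.tile(dates.to_numpy(), len(departments)),
    }
)
print(df2)
```

The key point is that one column repeats each element in place while the other repeats the whole sequence; repeating both sequences as a whole would not be guaranteed to pair every department with every date.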