Question

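I searched Google and this site for an answer, but I can't seem to phrase the question well enough to find help with this exact problem.

I want to create a dataframe with a column named "Department" containing the values from a list, and for each value in that column I want the same range of datetimes.

The list is: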

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

And the date range is (df is a different dataframe that I have, containing the original date range):

pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')

So I tried this, but it gives me an error because of the shape. I understand the error; I'm just not sure how to fix it.

df2 = pd.DataFrame(department,(pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')), columns=['Department',"InvoiceDate"])

The desired result looks like this:

          Department    InvoiceDate
    0        Sales      2019-03-25
    1        Sales      2019-03-26
    2        Sales      2019-03-27
    ...
    5     Specialist    2019-03-25
    6     Specialist    2019-03-26
    7     Specialist    2019-03-27
    ...
    8      Purchase     2019-03-25
    9      Purchase     2019-03-26
   10      Purchase     2019-03-27
    ...
   11         HR        2019-03-25
   12         HR        2019-03-26
   13         HR        2019-03-27

Thanks

Edit: the error output

>>> df2 = pd.DataFrame(workstream,(pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')), columns=['WorkStream',"InvoiceDate"])
Traceback (most recent call last):
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1678, in create_block_manager_from_blocks
    make_block(values=blocks[0], placement=slice(0, len(axes[0])))
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 3284, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 2792, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 126, in __init__
    raise ValueError(
ValueError: Wrong number of items passed 1, placement implies 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python38-32\lib\site-packages\pandas\core\frame.py", line 464, in __init__
    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\construction.py", line 213, in init_ndarray
    return create_block_manager_from_blocks(block_values, [columns, index])
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1688, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1718, in construction_error
    raise ValueError(
ValueError: Shape of passed values is (8, 1), indices imply (533, 2)

Answer 1

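You can use the following code for this:

Declare the list of departments, and get the list of dates in the range (from the minimum to the maximum):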

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D').tolist()

What you want is a Cartesian product, so use the following function:

def cartesian_product(data):
    index = pd.MultiIndex.from_product(data.values(), names=data.keys())
    return pd.DataFrame(index=index).reset_index()

# pass the `dates` list built above (the original snippet referenced an undefined name `a`)
cartesian_product({'departments': departments,
                   'date': dates})

At this link you can learn more about pandas and MultiIndex.
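For reference, a minimal self-contained sketch of the same Cartesian-product idea. Since the original `df` is not shown, the date range below is an assumed stand-in (2019-03-25 to 2019-03-27, taken from the desired output), and the column names follow the question rather than the answer above.

```python
import pandas as pd

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

# Assumed sample range; in the real case use df.InvoiceDate.min() / .max().
dates = pd.date_range(start='2019-03-25', end='2019-03-27', freq='D')

# from_product builds every (department, date) pair;
# the first level (departments) varies slowest, so all dates for 'Sales' come first.
index = pd.MultiIndex.from_product(
    [departments, dates], names=['Department', 'InvoiceDate']
)

# Turn the two index levels into ordinary columns with a fresh 0..n-1 index.
df2 = index.to_frame(index=False)
print(df2)
```

The result has `len(departments) * len(dates)` rows, one per department/date pair.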

Answer 2

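You are calling pd.DataFrame() incorrectly: its second positional argument is the index, not a second column. Also, the two arrays supplied as data have different sizes. To fix this, you can do the following: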

 departments = ['Sales', 'Specialist', 'Purchase', 'HR']
 dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D').tolist()
 sizeDates = len(dates)
 sizeDep = len(departments)
 # repeat each department once per date, so all dates for 'Sales' come first
 departments = [dep for dep in departments for _ in range(sizeDates)]
 # repeat the whole date list once per department
 dates = dates * sizeDep
 data = {'departments': departments, 'date': dates}

 df2 = pd.DataFrame(data)
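The point about equal-length columns can also be sketched with NumPy's `repeat` and `tile`. This is only an illustration under assumptions: the date range is a made-up sample because `df` is not shown, and the column names are taken from the question.

```python
import numpy as np
import pandas as pd

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

# Assumed sample range; in the real case use df.InvoiceDate.min() / .max().
dates = pd.date_range(start='2019-03-25', end='2019-03-27', freq='D')

df2 = pd.DataFrame({
    # each department repeated once per date: Sales, Sales, Sales, Specialist, ...
    'Department': np.repeat(departments, len(dates)),
    # the whole date range repeated once per department
    'InvoiceDate': np.tile(dates.to_numpy(), len(departments)),
})
print(df2)
```

Both columns end up with `len(departments) * len(dates)` entries, so the constructor no longer complains about mismatched shapes.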

Pandas: create a dataframe from a list, with a repeated date range for each element
