Pandas: create a DataFrame from a list, repeating a date range for each element

Time: 2020-09-11 10:35:39

Tags: python pandas dataframe datetime

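I searched Google and this site for an answer, but I can't seem to word the question well enough to find help with this exact problem.

I want to create a DataFrame with a column named "Department" that contains the values from a list, and for each value in that column I want the same datetime range.

The list is: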

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

And the date range is (df is a different DataFrame I have that holds the original date range):

pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')

So I tried this, but it gives me an error because of the shape. I understand the error; I'm just not sure how to fix it.

df2 = pd.DataFrame(department,(pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')), columns=['Department',"InvoiceDate"])

The desired result looks like this:

          Department    InvoiceDate
    0        Sales      2019-03-25
    1        Sales      2019-03-26
    2        Sales      2019-03-27
    ...
    5     Specialist    2019-03-25
    6     Specialist    2019-03-26
    7     Specialist    2019-03-27
    ...
    8      Purchase     2019-03-25
    9      Purchase     2019-03-26
   10      Purchase     2019-03-27
    ...
   11         HR        2019-03-25
   12         HR        2019-03-26
   13         HR        2019-03-27

Thanks

Edit: the error output

>>> df2 = pd.DataFrame(workstream,(pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D')), columns=['WorkStream',"InvoiceDate"])
Traceback (most recent call last):
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1678, in create_block_manager_from_blocks
    make_block(values=blocks[0], placement=slice(0, len(axes[0])))
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 3284, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 2792, in __init__
    super().__init__(values, ndim=ndim, placement=placement)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\blocks.py", line 126, in __init__
    raise ValueError(
ValueError: Wrong number of items passed 1, placement implies 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python38-32\lib\site-packages\pandas\core\frame.py", line 464, in __init__
    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\construction.py", line 213, in init_ndarray
    return create_block_manager_from_blocks(block_values, [columns, index])
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1688, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Python38-32\lib\site-packages\pandas\core\internals\managers.py", line 1718, in construction_error
    raise ValueError(
ValueError: Shape of passed values is (8, 1), indices imply (533, 2)

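2 answers:

Answer 0: (score: 1)

You can use the following code for this:

Declare the list of departments, and get the list of dates in the range (from the minimum to the maximum date)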

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D').tolist()

What you want is a Cartesian product; use the following function

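def cartesian_product(data):
    # Build every combination of the supplied value lists as a MultiIndex,
    # then turn the index levels into ordinary columns.
    index = pd.MultiIndex.from_product(list(data.values()), names=list(data.keys()))
    return pd.DataFrame(index=index).reset_index()

# dates is the list of dates built above
cartesian_product({'departments': departments,
                   'date': dates})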

You can learn more about pandas and MultiIndex at this link.
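For reference, the same Cartesian product idea can be sketched as a self-contained script. The three-day date range below is an assumption standing in for df.InvoiceDate.min() and df.InvoiceDate.max(), which are not shown in the question, and the column names are taken from the desired output.

```python
import pandas as pd

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

# Assumed stand-in for the range derived from df.InvoiceDate in the question.
dates = pd.date_range(start='2019-03-25', end='2019-03-27', freq='D')

# from_product builds every (department, date) pair; the first level varies
# slowest, so all dates for 'Sales' come first, then 'Specialist', and so on.
index = pd.MultiIndex.from_product(
    [departments, dates], names=['Department', 'InvoiceDate']
)

# to_frame(index=False) turns the index levels into ordinary columns
# and gives the result a default integer index.
df2 = index.to_frame(index=False)
print(df2)
```

The result should have len(departments) * len(dates) rows, one per department and date pair.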

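Answer 1: (score: 1)

You are calling pd.DataFrame() the wrong way: its second positional argument is the index, not a second column. In addition, the two arrays supplied as data have different sizes. To fix this, you can do the following: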

 departments = ['Sales', 'Specialist', 'Purchase', 'HR']
 dates = pd.date_range(start=df.InvoiceDate.min(), end=df.InvoiceDate.max(), freq='1D').tolist()

 # Repeat each department once per date: Sales, Sales, ..., Specialist, ...
 dep_col = [dep for dep in departments for _ in dates]
 # Repeat the whole date list once per department.
 date_col = dates * len(departments)
 data = {'departments': dep_col, 'date': date_col}

 df2 = pd.DataFrame(data)
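
As an alternative sketch of the same idea, the pairs can be generated with itertools.product from the standard library, which avoids building the two repeated lists by hand. The date range here is an assumed stand-in for the one derived from df.InvoiceDate, and the column names follow the desired output in the question.

```python
from itertools import product

import pandas as pd

departments = ['Sales', 'Specialist', 'Purchase', 'HR']

# Assumed stand-in for the range derived from df.InvoiceDate in the question.
dates = pd.date_range(start='2019-03-25', end='2019-03-27', freq='D')

# product yields (department, date) tuples with the last iterable varying
# fastest, so each department is paired with every date in turn.
rows = list(product(departments, dates))

# Each tuple becomes one row; columns are named explicitly.
df2 = pd.DataFrame(rows, columns=['Department', 'InvoiceDate'])
print(df2)
```

Both columns come out the same length by construction, so the shape mismatch from the original call cannot occur.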