Question

What is the best way to create a DataFrame for the following situation?

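I have a DataFrame with a single column that contains several families, each followed by some item descriptions. Some families have 3 items, others have 7. The only hint that identifies a family row is the "[online]" string.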

0 Family Item1[online]
1 Description of the Item1 (SKU)
2 Description of the Item1 (SKU)
3 Description of the Item1 (SKU)
4 Family Item2[online]
5 Description of the Item2 (SKU)
6 Description of the Item2 (SKU)
7 Description of the Item2 (SKU)
................................
n-3 Family Itemk[online]
n-2 Description of the Itemk (SKU)
n-1 Description of the Itemk (SKU)
n Description of the Itemk (SKU)

I would like to get a DataFrame with 2 columns:

Column1 Column2
0  Family Item1  Description Item1
1  Family Item1  Description Item1
2  Family Item1  Description Item1
3  Family Item2  Description Item2
..................................
n Family Itemk Description Itemk

So I have the [online] hint to identify the family rows, and each family has a different number of items.

What is the Pythonic way to solve this?

Answer 1

Given that your initial DataFrame looks like this:

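import numpy as np
import pandas as pd

df = pd.DataFrame(data=['Family Item1[online]',
                        'Description of the Item1 (SKU)',
                        'Description of the Item1 (SKU)',
                        'Description of the Item1 (SKU)',
                        'Family Item2[online]',
                        'Description of the Item2 (SKU)',
                        'Description of the Item2 (SKU)',
                        'Description of the Item2 (SKU)',], index=np.arange(0, 8))

# Map each family row to the list of descriptions that follow it.
dict_i = {}
key = None

# Assumes the first row is a family row; otherwise key is still None
# when the first description is appended and a KeyError is raised.
for item in df[0].values:
    if '[online]' in item:
        # A new family starts here.
        key = item
        dict_i[key] = []
    else:
        # A description belongs to the most recent family.
        dict_i[key].append(item)

# One column per family; this requires all lists to have the same length.
pd.DataFrame(dict_i)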

This gives:

             Family Item1[online]            Family Item2[online]
0  Description of the Item1 (SKU)  Description of the Item2 (SKU)
1  Description of the Item1 (SKU)  Description of the Item2 (SKU)
2  Description of the Item1 (SKU)  Description of the Item2 (SKU)

If the series have different lengths:

series_list = []
for k, v in dict_i.items():
    s = pd.Series(data=v,name=k)
    series_list.append(s)

pd.concat(series_list,axis=1)

This results in a DataFrame with missing values (NaN) where the lengths do not match:

             Family Item1[online]            Family Item2[online]
0  Description of the Item1 (SKU)  Description of the Item2 (SKU)
1  Description of the Item1 (SKU)  Description of the Item2 (SKU)
2  Description of the Item1 (SKU)  Description of the Item2 (SKU)
3  Description of the Item1 (SKU)                             NaN
4  Description of the Item1 (SKU)                             NaN
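
The output above has one column per family, whereas the question asks for two columns (family, description). A minimal sketch of one way to get that long format is to mark the family rows, forward-fill the family name onto the description rows, and then drop the family rows. The column names Column1 and Column2 are taken from the question; stripping the "[online]" marker from the family name is an assumption based on the desired output shown there.

```python
import pandas as pd

# Small sample in the same shape as the question, with uneven family sizes.
df = pd.DataFrame({0: [
    'Family Item1[online]',
    'Description of the Item1 (SKU)',
    'Description of the Item1 (SKU)',
    'Family Item2[online]',
    'Description of the Item2 (SKU)',
]})

# True for rows that name a family; regex=False treats '[online]' literally.
is_family = df[0].str.contains('[online]', regex=False)

# Keep only the family names, then carry each one down to its descriptions.
family = (
    df[0]
    .where(is_family)
    .ffill()
    .str.replace('[online]', '', regex=False)  # assumption: drop the marker
)

# Keep only the description rows and pair them with their family.
result = pd.DataFrame({
    'Column1': family[~is_family],
    'Column2': df.loc[~is_family, 0],
}).reset_index(drop=True)

print(result)
```

Because nothing is pivoted into columns, this works the same way whether the families have 3 items or 7, and no NaN padding is needed.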
