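What is the best way to create a DataFrame under the following condition?
I have a DataFrame with a single column that contains several families, each followed by a few item descriptions. Some families have 3 items and others have 7; the only hint that identifies a family row is the "[online]" string.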
0 Family Item1[online]
1 Description of the Item1 (SKU)
2 Description of the Item1 (SKU)
3 Description of the Item1 (SKU)
4 Family Item2[online]
5 Description of the Item2 (SKU)
6 Description of the Item2 (SKU)
7 Description of the Item2 (SKU)
................................
n-3 Family Itemk[online]
n-2 Description of the Itemk (SKU)
n-1 Description of the Itemk (SKU)
n Description of the Itemk (SKU)
I would like to obtain a DataFrame with 2 columns:
Column1 Column2
0 Family Item1 Description Item1
1 Family Item1 Description Item1
2 Family Item1 Description Item1
3 Family Item2 Description Item2
..................................
n Family Itemk Description Itemk
So I have the [online] marker to identify the family rows, and each family has a different number of items.
What is the Pythonic way to solve this?
Answer 0 (score: 0)
Given that your initial DataFrame looks like this:
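import numpy as np
import pandas as pd

df = pd.DataFrame(data=['Family Item1[online]',
                        'Description of the Item1 (SKU)',
                        'Description of the Item1 (SKU)',
                        'Description of the Item1 (SKU)',
                        'Family Item2[online]',
                        'Description of the Item2 (SKU)',
                        'Description of the Item2 (SKU)',
                        'Description of the Item2 (SKU)'],
                  index=np.arange(0, 8))

# Group the descriptions under the most recent family row.
dict_i = {}
key = None
for item in df[0].values:
    if '[online]' in item:
        # A row containing '[online]' starts a new family.
        key = item
        dict_i[key] = []
    else:
        # Any other row is a description of the current family.
        dict_i[key].append(item)

# One column per family; this requires all families to have the same length.
pd.DataFrame(dict_i)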
This gives:
Family Item1[online] Family Item2[online]
0 Description of the Item1 (SKU) Description of the Item2 (SKU)
1 Description of the Item1 (SKU) Description of the Item2 (SKU)
2 Description of the Item1 (SKU) Description of the Item2 (SKU)
If the families have different lengths, build one Series per family and concatenate them:
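series_list = []
for k, v in dict_i.items():
    # One Series per family, named after the family row.
    s = pd.Series(data=v, name=k)
    series_list.append(s)

# Concatenating along columns aligns on the index and pads shorter families with NaN.
pd.concat(series_list, axis=1)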
This results in a DataFrame with missing values (NaN) where the lengths do not match:
Family Item1[online] Family Item2[online]
0 Description of the Item1 (SKU) Description of the Item2 (SKU)
1 Description of the Item1 (SKU) Description of the Item2 (SKU)
2 Description of the Item1 (SKU) Description of the Item2 (SKU)
3 Description of the Item1 (SKU) NaN
4 Description of the Item1 (SKU) NaN
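Note that the output above has one column per family, whereas the question asks for a two-column layout with one row per description. A minimal sketch of one possible way to get that layout is shown below. It assumes the single column is named "raw" (a hypothetical name, not from the original post) and that the first row is always a family row. It marks family rows with `str.contains`, forward-fills the family name onto the description rows, and then drops the family rows.

```python
import pandas as pd

# Hypothetical sample data mirroring the question; the column name "raw" is an assumption.
df = pd.DataFrame({"raw": [
    "Family Item1[online]",
    "Description of the Item1 (SKU)",
    "Description of the Item1 (SKU)",
    "Description of the Item1 (SKU)",
    "Family Item2[online]",
    "Description of the Item2 (SKU)",
    "Description of the Item2 (SKU)",
]})

# True for rows that name a family (literal match, not a regex).
is_family = df["raw"].str.contains("[online]", regex=False)

# Keep the text only on family rows, carry it forward onto the
# description rows, then strip the "[online]" marker.
family = (
    df["raw"]
    .where(is_family)
    .ffill()
    .str.replace("[online]", "", regex=False)
)

# Keep only the description rows, paired with their family name.
result = pd.DataFrame({
    "Column1": family[~is_family],
    "Column2": df.loc[~is_family, "raw"],
}).reset_index(drop=True)

print(result)
```

Because the family name is carried forward row by row, this should work regardless of how many descriptions each family has.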