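Suppose I have a DataFrame whose columns hold a string, a list and an integer, and I want to combine them into a new DataFrame in which the string and the integer are paired with each entry of the list. How can I do that?
Given this example: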
import pandas as pd

data = {'fruits': ['banana', 'apple', 'pear'],
        'source': (['brazil', 'algeria', 'nigera'], ['brazil', 'morocco', 'iran', 'france'], ['china', 'india', 'mexico']),
        'prices': [2, 3, 7]}
df = pd.DataFrame(data, columns=['fruits', 'source', 'prices'])
I would like to get a DataFrame with 10 rows and 3 columns; column by column, it would contain:
['banana', 'banana', 'banana', 'apple', 'apple', 'apple', 'apple', 'pear', 'pear', 'pear'],
['brazil', 'algeria', 'nigera', 'brazil', 'morocco', 'iran', 'france', 'china', 'india', 'mexico'],
['2', '2', '2', '3', '3', '3', '3', '7', '7', '7'],
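I imagine it shouldn't be too complicated, but I can't find a concise solution.
Answer 0 (score: 7)
Use an explode() helper function (a user-defined function that takes a lst_cols argument; its definition is not included here):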
In [30]: explode(df, lst_cols='source')
Out[30]:
fruits source prices
0 banana brazil 2
1 banana algeria 2
2 banana nigera 2
3 apple brazil 3
4 apple morocco 3
5 apple iran 3
6 apple france 3
7 pear china 7
8 pear india 7
9 pear mexico 7
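Since pandas 0.25 there is also a built-in DataFrame.explode method that produces the same shape without a helper. A minimal sketch, assuming pandas 1.1 or later (needed for the ignore_index parameter) and storing source as a list of lists rather than a tuple:

```python
import pandas as pd

data = {'fruits': ['banana', 'apple', 'pear'],
        'source': [['brazil', 'algeria', 'nigera'],
                   ['brazil', 'morocco', 'iran', 'france'],
                   ['china', 'india', 'mexico']],
        'prices': [2, 3, 7]}
df = pd.DataFrame(data, columns=['fruits', 'source', 'prices'])

# One output row per element of each 'source' list; 'fruits' and 'prices'
# are repeated alongside. ignore_index=True renumbers the rows 0..n-1
# instead of repeating the original row labels.
result = df.explode('source', ignore_index=True)
print(result)
```

Without ignore_index=True the original labels are repeated (0, 0, 0, 1, ...), like the index in Option 2 further down.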
Answer 1 (score: 5)
Use stack and apply(pd.Series):
df.set_index(['fruits','prices']).source.apply(pd.Series).\
stack().reset_index(level=['fruits','prices']).\
rename(columns={0:'source'})
Out[64]:
fruits prices source
0 banana 2 brazil
1 banana 2 algeria
2 banana 2 nigera
0 apple 3 brazil
1 apple 3 morocco
2 apple 3 iran
3 apple 3 france
0 pear 7 china
1 pear 7 india
2 pear 7 mexico
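A self-contained, commented version of the stack approach above. One assumption about newer pandas: the reworked stack implementation (opt-in via future_stack=True in pandas 2.1, and reportedly the default in 3.0) no longer drops missing values, so this sketch removes the NaN padding explicitly with dropna(), which is harmless on older versions.

```python
import pandas as pd

data = {'fruits': ['banana', 'apple', 'pear'],
        'source': [['brazil', 'algeria', 'nigera'],
                   ['brazil', 'morocco', 'iran', 'france'],
                   ['china', 'india', 'mexico']],
        'prices': [2, 3, 7]}
df = pd.DataFrame(data, columns=['fruits', 'source', 'prices'])

result = (df.set_index(['fruits', 'prices'])['source']
            .apply(pd.Series)     # one column per list position, NaN-padded
            .stack()              # long form indexed by (fruits, prices, position)
            .dropna()             # drop padding explicitly (assumption: newer stack keeps it)
            .reset_index(level=['fruits', 'prices'])  # move fruits/prices back to columns
            .rename(columns={0: 'source'})            # the stacked values land in column 0
            .reset_index(drop=True))                  # renumber rows 0..n-1
print(result)
```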
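Option 2: re-create your df by repeating each row once per list element, then filling in the flattened lists:
import numpy as np

df1=df[['fruits','prices']].reindex(df.index.repeat(df.source.apply(len)))
df1['source']=np.concatenate(df.source.values)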
df1
Out[69]:
fruits prices source
0 banana 2 brazil
0 banana 2 algeria
0 banana 2 nigera
1 apple 3 brazil
1 apple 3 morocco
1 apple 3 iran
1 apple 3 france
2 pear 7 china
2 pear 7 india
2 pear 7 mexico
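The same repeat-and-concatenate idea can be wrapped into a reusable function. The name explode_repeat is hypothetical (it is not part of pandas or of the answer above); this is a sketch that assumes the index of df is unique and that every cell of the chosen column is a list-like.

```python
import numpy as np
import pandas as pd


def explode_repeat(df, col):
    """Hypothetical helper: one output row per element of the list column `col`."""
    lengths = df[col].apply(len).to_numpy()   # number of elements in each row's list
    # Repeat each row label once per list element and select those rows
    # (all columns except `col`), then renumber the result 0..n-1.
    out = df.drop(columns=col).loc[df.index.repeat(lengths)].reset_index(drop=True)
    # Flatten all lists in row order; the total length equals lengths.sum().
    return out.assign(**{col: np.concatenate(df[col].to_numpy())})


data = {'fruits': ['banana', 'apple', 'pear'],
        'source': [['brazil', 'algeria', 'nigera'],
                   ['brazil', 'morocco', 'iran', 'france'],
                   ['china', 'india', 'mexico']],
        'prices': [2, 3, 7]}
df = pd.DataFrame(data, columns=['fruits', 'source', 'prices'])

result = explode_repeat(df, 'source')
print(result)
```

Unlike DataFrame.explode, a row whose list is empty is dropped here (it is repeated zero times) rather than kept with a NaN.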
Answer 2 (score: 5)
My take on this uses concat + melt:
c = ['fruits', 'prices']
df = (pd.concat([pd.DataFrame(df.source.tolist()), df[c]], axis=1)
        .melt(c, value_name='source')
        .drop(columns='variable')
        .dropna())
df
fruits prices source
0 banana 2 brazil
1 apple 3 brazil
2 pear 7 china
3 banana 2 algeria
4 apple 3 morocco
5 pear 7 india
6 banana 2 nigera
7 apple 3 iran
8 pear 7 mexico
10 apple 3 france
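melt emits the rows one list position at a time, so the fruits come out interleaved rather than grouped as in the question. One way to restore the grouped order, sketched here assuming pandas 1.1 or later (for melt's ignore_index parameter): keep the original row labels through melt, then stable-sort on them so each fruit's entries stay in list order.

```python
import pandas as pd

data = {'fruits': ['banana', 'apple', 'pear'],
        'source': [['brazil', 'algeria', 'nigera'],
                   ['brazil', 'morocco', 'iran', 'france'],
                   ['china', 'india', 'mexico']],
        'prices': [2, 3, 7]}
df = pd.DataFrame(data, columns=['fruits', 'source', 'prices'])

c = ['fruits', 'prices']
# Columns 0..3 hold list positions; shorter lists are padded with None.
wide = pd.concat([pd.DataFrame(df['source'].tolist(), index=df.index), df[c]], axis=1)
result = (wide.melt(id_vars=c, value_name='source', ignore_index=False)  # keep row labels
              .drop(columns='variable')
              .dropna(subset=['source'])         # remove the padding
              .sort_index(kind='mergesort')      # stable: preserves list order per row
              .reset_index(drop=True))
print(result)
```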