I have a DataFrame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'number': [['233182801104', '862824274124', '278711320172'],
               ['072287346459', '278058853506'],
               ['233182801104', '862824274124'],
               None,
               ['123412341234']],
    'country': [None, 'France', 'USA', None, 'Germany'],
    'c': np.random.randn(5),
    'd': np.random.randn(5),
})
It looks like this:
                                       number  country         c         d
0  [233182801104, 862824274124, 278711320172]     None  0.177375 -0.226086
1                [072287346459, 278058853506]   France -0.134511  0.551962
2                [233182801104, 862824274124]      USA  0.490095  0.770992
3                                        None     None -0.714745  0.807898
4                              [123412341234]  Germany  1.047809  0.523591
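I want all unique combinations of the elements in the lists of the number column and the country column. A further complication is that the lists vary in length, there can be a lot of them, and country may contain None. The expected output is: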
code          country_final
233182801104  USA
862824274124  USA
278711320172  None
072287346459  France
278058853506  France
123412341234  Germany
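As a first step, I want to split the lists into separate columns:
df['number'].apply(pd.Series)
After that, I am not sure whether I need groupby or some kind of pivot table.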
Answer 0: (score: 0)
Try this:
data = []
for i in df.itertuples():
    # i[0] is the index, i[1] is 'number', i[2] is 'country'
    for j in i[1] or []:  # skip rows where 'number' is None
        data.append((j, i[2]))
df2 = pd.DataFrame(data, columns=['code', 'country_final'])
Or you can condense it to:
df2 = pd.DataFrame([(j, i[2]) for i in df.itertuples() for j in (i[1] or [])], columns=['code', 'country_final'])
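Flattening the lists this way yields one row per list element, so a code that appears in several lists (such as 233182801104) shows up more than once, sometimes with a country and sometimes without. The expected output in the question keeps one row per code and prefers a non-null country. One possible way to get there is sketched below; the names pairs, has_country, ordered and result are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'number': [['233182801104', '862824274124', '278711320172'],
               ['072287346459', '278058853506'],
               ['233182801104', '862824274124'],
               None,
               ['123412341234']],
    'country': [None, 'France', 'USA', None, 'Germany'],
})

# One (code, country) pair per list element; rows whose list is None are skipped.
pairs = [(code, row.country)
         for row in df.itertuples(index=False)
         for code in (row.number or [])]
df2 = pd.DataFrame(pairs, columns=['code', 'country_final'])

# Put the rows that have a country ahead of the rows that do not,
# then keep the first occurrence of each code.
has_country = df2['country_final'].notna()
ordered = pd.concat([df2[has_country], df2[~has_country]])
result = ordered.drop_duplicates('code').reset_index(drop=True)
```

A code that never appears with a country (278711320172 here) is still kept, with a missing value in country_final.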
Answer 1: (score: 0)
I would use unnesting (a helper function defined at the end of this answer) together with groupby + first:
s=unnesting(df.dropna(subset=['number']),['number'])
s=s.mask(s.isnull()).groupby('number').country.first().sort_values().reset_index()
s
         number  country
0  072287346459   France
1  278058853506   France
2  123412341234  Germany
3  233182801104      USA
4  862824274124      USA
5  278711320172      NaN
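def unnesting(df, explode):
    # Repeat each index label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every listed column into one long column
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the remaining columns by index
    return df1.join(df.drop(columns=explode), how='left')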
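On pandas 0.25 or later, the built-in DataFrame.explode can stand in for the unnesting helper. The following is a minimal sketch of the same groupby + first idea under that assumption; it is an alternative to the helper, not the code the answer was written with, and the name exploded is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'number': [['233182801104', '862824274124', '278711320172'],
               ['072287346459', '278058853506'],
               ['233182801104', '862824274124'],
               None,
               ['123412341234']],
    'country': [None, 'France', 'USA', None, 'Germany'],
})

# Drop rows with no list, then give every list element its own row.
exploded = df.dropna(subset=['number']).explode('number')

# Turn None into NaN so that first() skips it, mirroring the mask step above.
exploded = exploded.assign(
    country=exploded['country'].mask(exploded['country'].isna()))

# First non-null country per number; numbers with no country stay missing.
s = exploded.groupby('number')['country'].first().sort_values().reset_index()
```

Because first() returns the first non-null value in each group, 233182801104 resolves to USA even though its first occurrence has no country.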