我一直在尝试从具有以下结构的字典创建一个DataFrame。
imgAlbmShrDict = {('10', 'photo_album_57'): 20.0,
('10', 'photo_album_8'): 20.0,
('1061', 'photo_album_29'): 100.0,
('1061', 'photo_album_90'): 90.0,
('1102', 'photo_album_29'): 80.0,
('1102', 'photo_album_90'): 60.0,
('1300', 'photo_album_15'): 100.0,
('1300', 'photo_album_89'): 60.0,
('1301', 'photo_album_15'): 88.88888888888889,
('1301', 'photo_album_89'): 60.0
}
pd.DataFrame(imgAlbmShrDict,index=['Proportion']).transpose()
Proportion
10 photo_album_57 20.000000
photo_album_8 20.000000
1061 photo_album_29 100.000000
photo_album_90 90.000000
1102 photo_album_29 80.000000
photo_album_90 60.000000
1300 photo_album_15 100.000000
photo_album_89 60.000000
1301 photo_album_15 88.888889
photo_album_89 60.000000
输出正是我需要的,但我无法仅从数据框中提取前两列。 第一列实际上是图像ID,第二列是出现图像的相册。
我需要帮助来访问列和添加列的方法,同时保留结构。
必需的输出:
Proportion URL
10 photo_album_57 20.000000 www.something.com/10.jpeg
photo_album_8 20.000000
1061 photo_album_29 100.000000 www.something.com/1061.jpeg
photo_album_90 90.000000
1102 photo_album_29 80.000000 www.something.com/1102.jpeg
photo_album_90 60.000000
1300 photo_album_15 100.000000 www.something.com/1300.jpeg
photo_album_89 60.000000
1301 photo_album_15 88.888889 www.something.com/1301.jpeg
photo_album_89 60.000000
答案 0 :(得分:2)
您可以使用get_level_values
,因为前两列是Multiindex
:
print (df.index.get_level_values(0))
Index(['10', '10', '1061', '1061', '1102', '1102', '1300', '1300', '1301',
'1301'],
dtype='object')
df['URL'] = 'www.something.com/' + df.index.get_level_values(0) + '.jpg'
print (df)
Proportion URL
10 photo_album_57 20.000000 www.something.com/10.jpg
photo_album_8 20.000000 www.something.com/10.jpg
1061 photo_album_29 100.000000 www.something.com/1061.jpg
photo_album_90 90.000000 www.something.com/1061.jpg
1102 photo_album_29 80.000000 www.something.com/1102.jpg
photo_album_90 60.000000 www.something.com/1102.jpg
1300 photo_album_15 100.000000 www.something.com/1300.jpg
photo_album_89 60.000000 www.something.com/1300.jpg
1301 photo_album_15 88.888889 www.something.com/1301.jpg
photo_album_89 60.000000 www.something.com/1301.jpg
也许需要drop_duplicates
:
df = df.drop_duplicates(subset='URL')
print (df)
Proportion URL
10 photo_album_57 20.000000 www.something.com/10.jpg
1061 photo_album_29 100.000000 www.something.com/1061.jpg
1102 photo_album_29 80.000000 www.something.com/1102.jpg
1300 photo_album_15 100.000000 www.something.com/1300.jpg
1301 photo_album_15 88.888889 www.something.com/1301.jpg
使用reset_index
和设置列名称的另一个解决方案:
df.reset_index(inplace=True)
df.columns = ['ID','Album','Proportion']
df['URL'] = 'www.something.com/' + df['ID'] + '.jpg'
print (df)
ID Album Proportion URL
0 10 photo_album_57 20.000000 www.something.com/10.jpg
1 10 photo_album_8 20.000000 www.something.com/10.jpg
2 1061 photo_album_29 100.000000 www.something.com/1061.jpg
3 1061 photo_album_90 90.000000 www.something.com/1061.jpg
4 1102 photo_album_29 80.000000 www.something.com/1102.jpg
5 1102 photo_album_90 60.000000 www.something.com/1102.jpg
6 1300 photo_album_15 100.000000 www.something.com/1300.jpg
7 1300 photo_album_89 60.000000 www.something.com/1300.jpg
8 1301 photo_album_15 88.888889 www.something.com/1301.jpg
9 1301 photo_album_89 60.000000 www.something.com/1301.jpg
EDIT1:
感谢stephen
寻求解决方案。
我尝试boolean indexing
使用Index.duplicated
:
mask = ~df.index.get_level_values(0).duplicated()
print (mask)
[ True False True False True False True False True False]
subindex = df.index[mask]
df.loc[subindex, 'URL'] = 'www.something.com/' + subindex.get_level_values(0) + '.jpg'
df.URL.fillna('', inplace=True)
print (df)
Proportion URL
10 photo_album_57 20.000000 www.something.com/10.jpg
photo_album_8 20.000000
1061 photo_album_29 100.000000 www.something.com/1061.jpg
photo_album_90 90.000000
1102 photo_album_29 80.000000 www.something.com/1102.jpg
photo_album_90 60.000000
1300 photo_album_15 100.000000 www.something.com/1300.jpg
photo_album_89 60.000000
1301 photo_album_15 88.888889 www.something.com/1301.jpg
photo_album_89 60.000000