I have imported a CSV as a multi-indexed DataFrame. Here is a mock-up of the data:
import pandas as pd

df = pd.read_csv("coursedata2.csv", index_col=[0, 2])
print(df)
                                      COURSE
ID    Course List
12345 Interior Environments           DESN10000
      Rendering & Present Skills      DESN20065
      Lighting                        DESN20025
22345 Drawing Techniques              DESN10016
      Colour Theory                   DESN14049
      Finishes & Sustainable Issues   DESN12758
      Lighting                        DESN20025
32345 Window Treatments&Soft Furnish  DESN27370
42345 Introduction to CADD            INFO16859
      Principles of Drafting          DESN10065
      Drawing Techniques              DESN10016
      The Fundamentals of Design      DESN15436
      Colour Theory                   DESN14049
      Interior Environments           DESN10000
      Drafting                        DESN10123
      Textiles and Applications       DESN10199
      Finishes & Sustainable Issues   DESN12758
[17 rows x 1 columns]
I can easily slice it by label using .xs, for example:
selected = df.xs(12345, level='ID')
print(selected)
                            COURSE
Course List
Interior Environments       DESN10000
Rendering & Present Skills  DESN20065
Lighting                    DESN20025
[3 rows x 1 columns]
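But what I want to do is step through the DataFrame and perform an operation on each block of courses, one ID at a time. The ID values in the real data are fairly random integers, sorted in ascending order.
df.index shows: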
df.index
MultiIndex(levels=[[12345, 22345, 32345, 42345], [u'Colour Theory', u'Colour Theory ', u'Drafting', u'Drawing Techniques', u'Finishes & Sustainable Issues', u'Interior Environments', u'Introduction to CADD', u'Lighting', u'Principles of Drafting', u'Rendering & Present Skills', u'Textiles and Applications', u'The Fundamentals of Design', u'Window Treatments&Soft Furnish']],
labels=[[0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3], [5, 9, 7, 3, 1, 4, 7, 12, 6, 8, 3, 11, 0, 5, 2, 10, 4]],
names=[u'ID', u'Course List'])
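It seems to me that I should be able to use the first index's labels to step through the DataFrame, i.e. get all the courses for label 0, then 1, then 2, then 3, and so on. But it looks like .xs will not slice by label.
Am I missing something?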
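Answer 0 (score: 0)
There may be more efficient ways to do this, depending on what you are trying to do with the data. However, two approaches come to mind immediately: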
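# some_func is a placeholder for whatever operation you need.
# Pass each ID's block of courses to some_func as a whole DataFrame.
for id_label in df.index.levels[0]:
    some_func(df.xs(id_label, level='ID'))
and
# Apply some_func to each row within each ID's block.
for id_label in df.index.levels[0]:
    df.xs(id_label, level='ID').apply(some_func, axis=1)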
Which one to use depends on whether you want to operate on the whole group or on each row within it.
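As one possible alternative to looping over df.index.levels[0], grouping on the ID level also hands you each block in turn. The sketch below is only an illustration: the small DataFrame stands in for coursedata2.csv, and some_func is a hypothetical placeholder that just counts the rows in a block.

```python
import pandas as pd

# Hypothetical stand-in for coursedata2.csv: same two-level index
# (ID, Course List) and a single COURSE column, with fewer rows.
rows = [
    (12345, "Interior Environments", "DESN10000"),
    (12345, "Rendering & Present Skills", "DESN20065"),
    (12345, "Lighting", "DESN20025"),
    (22345, "Drawing Techniques", "DESN10016"),
    (22345, "Colour Theory", "DESN14049"),
    (32345, "Window Treatments&Soft Furnish", "DESN27370"),
]
df = pd.DataFrame(rows, columns=["ID", "Course List", "COURSE"])
df = df.set_index(["ID", "Course List"])


def some_func(block):
    """Placeholder per-block operation: count the courses in the block."""
    return len(block)


# groupby(level="ID") yields (ID value, sub-frame) pairs, sorted by ID.
results = {}
for id_label, block in df.groupby(level="ID"):
    results[int(id_label)] = some_func(block)

print(results)  # {12345: 3, 22345: 2, 32345: 1}
```

Unlike df.xs(id_label, level='ID'), each block produced by groupby keeps both index levels, so drop the ID level yourself if some_func expects the .xs shape.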