The following is a list of tasks imported from the Kaggle homepage into a pandas DataFrame.
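The sample DataFrame produces the first row correctly, but I need to loop through the rest of the data. How can I repeat the transpose every 5 rows?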
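Answer 0 (score: 2)
The simplest approach is to use a MultiIndex, but unfortunately not every record in this data is exactly 5 rows long, so the columns end up misaligned: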
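# outer level = record number, inner level = position within the record
df.index = [df.index // 5, df.index % 5]
# move the inner level into the columns: one row per record
sample = df.unstack()
sample.columns = ['task_name', 'task_description', 'task_date', 'task_prize', 'task_teams']
print(sample.head(10))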
task_description \
0 Can you detect fraudulent click traffic for mo...
1 Can you segment each objects within image fram...
2 Image classification of fashion products.
3 Image Classification of Furniture & Home Goods.
4 Given an image, can you find all of the same l...
5 Google Landmark Recognition Challenge
6 289 teams
7 Knowledge
8 image data, object detection
9 Getting Started2 years to go
task_date \
0 Featured13 days to go
1 Research2 months to go
2 Researcha month to go
3 Researcha month to go
4 Researcha month to go
5 Label famous (and not-so-famous) landmarks in ...
6 ImageNet Object Detection Challenge
7 0 teams
8 Knowledge
9 tutorial, tabular data, binary classification
task_prize \
0 $25,000
1 $2,500
2 $2,500
3 $2,500
4 image data
5 Researcha month to go
6 Identify and label everyday objects in images
7 ImageNet Object Localization Challenge
8 7 teams
9 Knowledge
task_teams
0 3,382 teams
1 32 teams
2 67 teams
3 238 teams
4 $2,500
5 image data
6 Research12 years to go
7 Identify the objects in images
8 Titanic: Machine Learning from Disaster
9 11,169 teams
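To see why the // 5 and % 5 trick only works when every record has exactly 5 lines, here is a minimal, self-contained sketch on made-up values (the task names and the reshape_by_five helper are hypothetical). It builds the MultiIndex explicitly with pd.MultiIndex.from_arrays instead of assigning a list to df.index, which should be equivalent.

```python
import numpy as np
import pandas as pd

COLS = ["task_name", "task_description", "task_date", "task_prize", "task_teams"]


def reshape_by_five(values):
    """Fold a flat list of scraped values into 5-column records (hypothetical helper)."""
    s = pd.Series(values)
    pos = np.arange(len(s))
    # Outer level = record number, inner level = position inside the record,
    # i.e. the same thing df.index // 5 and df.index % 5 compute above.
    s.index = pd.MultiIndex.from_arrays([pos // 5, pos % 5])
    out = s.unstack()  # inner level 0..4 becomes the columns; gaps become NaN
    out.columns = COLS
    return out


# Two hypothetical tasks with exactly 5 lines each: the records line up.
uniform = reshape_by_five([
    "Task A", "Desc A", "Featured", "$100", "10 teams",
    "Task B", "Desc B", "Research", "$200", "20 teams",
])

# Same tasks, but Task A carries an extra tag line ("image data"):
# every later value lands one column to the right of where it belongs.
shifted = reshape_by_five([
    "Task A", "Desc A", "Featured", "image data", "$100", "10 teams",
    "Task B", "Desc B", "Research", "$200", "20 teams",
])

print(uniform)
print(shifted)
```

This is the same shift visible in the output above, where for example "289 teams" ends up under task_description.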
Answer 1 (score: 1)
To clean it up and load it into a DataFrame, you can do the following:
import requests as r
import pandas as pd

raw = r.get('https://s3.amazonaws.com/todel162/kaggle_unicode1.txt')

# the raw data has some non-ASCII characters which you can likely ignore,
# and I drop the last line if it is blank, as that breaks the parsing.
data = raw.text.encode('ascii', errors='ignore').decode()
lines = [d.strip() for d in data.split('\n')]
if lines[-1] == '':
    lines = lines[:-1]

# then split out the sections of data (sections are separated by blank lines);
# this one line replaces the commented-out for-loop below more elegantly
blurbs = [l.split('**') for l in '**'.join(lines).split('****')]
# blurbs = []
# blurb = []
# for line in lines:
#     if line == '':
#         blurbs.append(blurb)
#         blurb = []
#     else:
#         blurb.append(line)
# blurbs.append(blurb)  # don't forget the last section

# it seems each section has either 5 or 6 elements, so write a function that
# returns a uniformly formatted record, and use pandas.DataFrame.from_records
# to load the records into a DataFrame
def get_record(blurb):
    if len(blurb) == 6:
        return blurb
    # 5-element section: insert an empty 'other' field after task_date
    return blurb[:3] + [''] + blurb[3:]

cols = ['task_name', 'task_description', 'task_date', 'other', 'task_prize', 'task_teams']
df = pd.DataFrame.from_records([get_record(b) for b in blurbs], columns=cols)
df.head()
This outputs the following:
Out[8]:
task_name \
0 TalkingData AdTracking Fraud Detection Challenge
1 CVPR 2018 WAD Video Segmentation Challenge
2 iMaterialist Challenge (Fashion) at FGVC5
3 iMaterialist Challenge (Furniture) at FGVC5
4 Google Landmark Retrieval Challenge
task_description task_date \
0 Can you detect fraudulent click traffic for mo... Featured13 days to go
1 Can you segment each objects within image fram... Research2 months to go
2 Image classification of fashion products. Researcha month to go
3 Image Classification of Furniture & Home Goods. Researcha month to go
4 Given an image, can you find all of the same l... Researcha month to go
other task_prize task_teams
0 $25,000 3,382 teams
1 $2,500 32 teams
2 $2,500 67 teams
3 $2,500 238 teams
4 image data $2,500 129 teams
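As you can see, the data is now parsed into the correct columns. From there, you can convert the types, drop the 'other' column, and so on, and then analyze the dataset.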
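A minimal sketch of that cleanup step, using a few rows copied from the outputs above (descriptions are left out, and the Titanic row is reassembled from the shifted values in Answer 0's output). It assumes the only category prefixes glued onto task_date are the ones that appear in this thread (Featured, Research, Getting Started); other Kaggle categories would have to be added to the regex. The new column names task_category and time_left are hypothetical.

```python
import pandas as pd

# A few rows copied from the outputs above; descriptions are omitted, and the
# Titanic row is reassembled from the shifted values in Answer 0's output.
df = pd.DataFrame({
    "task_name": [
        "TalkingData AdTracking Fraud Detection Challenge",
        "CVPR 2018 WAD Video Segmentation Challenge",
        "Google Landmark Retrieval Challenge",
        "Titanic: Machine Learning from Disaster",
    ],
    "task_date": [
        "Featured13 days to go",
        "Research2 months to go",
        "Researcha month to go",
        "Getting Started2 years to go",
    ],
    "other": ["", "", "image data", "tutorial, tabular data, binary classification"],
    "task_prize": ["$25,000", "$2,500", "$2,500", "Knowledge"],
    "task_teams": ["3,382 teams", "32 teams", "129 teams", "11,169 teams"],
})

# The 'other' column only exists to pad the 5-line records, so drop it.
df = df.drop(columns="other")

# task_date glues the category to the time left ("Featured13 days to go").
# Assumption: only the categories seen in this thread occur.
parts = df["task_date"].str.extract(r"^(Featured|Research|Getting Started)(.*)$")
df["task_category"] = parts[0]
df["time_left"] = parts[1]
df = df.drop(columns="task_date")

# Keep only the digits ("$25,000" -> 25000, "3,382 teams" -> 3382); values
# with no digits at all, such as "Knowledge", become NaN.
for col in ["task_prize", "task_teams"]:
    digits = df[col].str.replace(r"\D", "", regex=True)
    df[col] = pd.to_numeric(digits, errors="coerce")

print(df)
```

If a task_date value does not start with one of the listed categories, str.extract returns NaN for both parts, which makes unexpected categories easy to spot.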