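While working through a predictive modeling exercise, I could not understand why a flag is used. I searched Google but could not find a good explanation.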
import pandas as pd

train = pd.read_csv('C:/Users/Analytics Vidhya/Desktop/challenge/Train.csv')
test = pd.read_csv('C:/Users/Analytics Vidhya/Desktop/challenge/Test.csv')
train['Type'] = 'Train'  # Flag rows that come from the training set
test['Type'] = 'Test'    # Flag rows that come from the test set
fullData = pd.concat([train, test], axis=0)  # Stack train and test into one DataFrame
Could you explain what a flag means in Python pandas, and why it matters? Thanks.
Answer 0 (score: 2)
I think it will be easier and quicker to show this with an example:
In [102]: train = pd.DataFrame(np.random.randint(0, 5, (5, 3)), columns=list('abc'))
In [103]: test = pd.DataFrame(np.random.randint(0, 5, (3, 3)), columns=list('abc'))
In [104]: train
Out[104]:
   a  b  c
0  3  4  0
1  0  0  1
2  2  4  1
3  4  2  0
4  2  4  0
In [105]: test
Out[105]:
   a  b  c
0  1  0  3
1  3  3  0
2  4  4  3
Let's add a Type column to each DataFrame:
In [106]: train['Type'] = 'Train'
In [107]: test['Type'] = 'Test'
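Now let's concatenate the two DataFrames vertically. The Type column is the "flag": it records which DataFrame each row came from, so the rows can still be told apart after they are combined: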
In [108]: fullData = pd.concat([train,test], axis=0)
In [109]: fullData
Out[109]:
   a  b  c   Type
0  3  4  0  Train
1  0  0  1  Train
2  2  4  1  Train
3  4  2  0  Train
4  2  4  0  Train
0  1  0  3   Test
1  3  3  0   Test
2  4  4  3   Test
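A common reason for keeping such a flag is to split the combined DataFrame back into its two parts after applying the same preprocessing to both. The following is only a minimal sketch of that idea, not necessarily what the original exercise does: it reuses the values from the example above, and the `a_plus_b` column and the `train_out` / `test_out` names are made up for illustration.

```python
import pandas as pd

# Same values as the train/test DataFrames shown above.
train = pd.DataFrame({"a": [3, 0, 2, 4, 2], "b": [4, 0, 4, 2, 4], "c": [0, 1, 1, 0, 0]})
test = pd.DataFrame({"a": [1, 3, 4], "b": [0, 3, 4], "c": [3, 0, 3]})

# Flag each row with its origin before stacking.
train["Type"] = "Train"
test["Type"] = "Test"

# ignore_index=True gives a fresh 0..7 index instead of repeating 0, 1, 2.
fullData = pd.concat([train, test], axis=0, ignore_index=True)

# Hypothetical shared preprocessing step, applied once to all rows.
fullData["a_plus_b"] = fullData["a"] + fullData["b"]

# Use the flag to recover the two sets, then drop it.
train_out = fullData[fullData["Type"] == "Train"].drop(columns="Type")
test_out = (
    fullData[fullData["Type"] == "Test"]
    .drop(columns="Type")
    .reset_index(drop=True)
)

print(train_out.shape, test_out.shape)
```

Without the Type column there would be no reliable way to tell which rows belonged to the test set once the two DataFrames were stacked, especially since the original row labels overlap.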