如何从Python元组中提取数据?

时间:2016-05-30 22:24:35

标签: python csv numpy scikit-learn

我是Python的新手。对于这些数据,我正在使用Jupytier iPython。我试图从csv文件中提取数值数据,然后运行Sklearn。我有:

使用Pandas打开并阅读CSV文件

将数据设置为我设置为organza my data

的字典

(设置Pandas DataFrame以查看我的代码和值)

在np数组中设置我的数据,使其可供机器学习

将我的数据重新整形为2D数组,以便进一步获取机器学习

我尝试将数据设置为目标属性,但我意识到我的带有pandas的Lat_reader是一个元组。在我可以执行任何类型的ML之前,我必须从我的元组中提取数据。基本上,我的Lat-reader现在是一个元组,而Sklearns无法使用元组。我想要使​​用元组内部的数据,但我想从元组中更改它。

这是我的代码:

import numpy as np
import pandas as pd


codes = {'Generation':{1:'First', 2: 'Second'},
     'Location':{1:'New York', 2:'Pennsylvania', 3: 'New Jersey'},
     'Age at Last Birthday':{},
     'Gender':{1:'Male',  2:'Female'},
     'Country':{1: 'Arg', 2:'Bol', 3:'Bra', 4:'Col', 5:'DR', 6:'Edu', 7:'El      Sal', 8:'Gua', 9:'Hon', 10:'Mex',
                11:'Nic', 12:'Pan', 13:'Peru', 14:'PR', 15:'Ven'},
     'African Roots':{0:'No', 1:'Yes'},
     'European Roots':{0:'No', 1:'Yes'},
     'Indian Roots':{0:'No', 1:'Yes'},
     'Other Roots':{0:'No', 1:'Yes'},
     'Skin Color':{1:'Light',2:'Medium Light',3:'Medium',4:'Mediium Dark',5:'Dark'},
     'Legal Status':{1:'Documents',2:'No Documents',3:'Questionable Documents',9:'Missing'},
     'Reason for Migration':{1:'supply-side economics',2:'demand-side economics',3:'network links',
                             4:'violence at origin', 5:'family reasons',6:'other'},
     'Return Plans':{1:'Yes', 2:'No', 3:'Dont Know', 9:'Not Asked'},
     'Occupation':{1:'Unpaid',2:'Student',3:'Agrigulture', 4:'Unskilled Operative',
                   5:'Skilled Operative', 6:'Transport Worker',7:'Unsilled Services',
                   8:'Skilled Services',9:'Small Business',10:'Professional',11:'Retired',99:'Unknown'},
     'Wage in US Dollars':{88:'Not Appliable',99:'Unknown'},
     'Hours Worked':{ 88:'Not Appliable', 99:'Unknown'},
     'Identity':{1:'Latino', 2:'American', 3:'Both',99:'Unknown'},
     'Latino Identity Among Immigrants':{1:'Yes', 2:'No', 3:'Yes-No', 4: 'Dont Know', 9:'Missing'},
     'Reasons for Latino Identity':{1:'Yes',2:'No', 9:'Unknown'},
     'With Whom Gets Together':{1:'Yes', 2:'No', 9:'Unknown'},
     'USYrs':{88:'Not Appliable',99:'Mising'},
     'In Contact With Home':{1:'Yes',2:'No',9:'Unknown'},
     'R Send Money Home':{1:'Yes',2:'No',3:'Send Other', 9:'Unknown'},
     'Parent Send Money':{1:'Yes',2:'No',3:'Not Appliable',4:'Unknown'},
     'Quantity Sent by Respondent or Parent':{1:'Half of Paycheck',2:'20% of paycheck',3:'Varies month to month'},
     'How Money Sent':{1:'Moneygram',2:'Paisano',3:'Friend',4:'Self',5:'Bank', 6:'Moneygram and Paisano',
                      7:'Moneygram and Friend'},
     'Frequency Money Sent':{1:'Once a month', 2:'Twice a year',3:'Once a year',4:'Once in a while',5:'Holidays'},
     'How Money Used':{0:'No Use',1:'Buy House',2:'Family Expenses',3:'Health',4:'Education',5:'Savings',6:'Pay a Debt'},
     'Bank in US':{1:'Yes',2:'No',3:'Unknown'},
     'Bank Overseas':{1:'Yes',2:'No',3:'Unknown'},
     'Type of Communication':{1:'Land Phone',2:'Cell Phone',3:'Calling Card',4:'Email',5:'Regular Mail',
                              6:'No Communication', 9:'Unknown'},
     'Presents Sent':{1:'Yes',2:'No', 3:'Unknown'},
     'Education':{},
     'EngAbli':{0:'None',1:'Some English',2:'Good English',9:'Missing'},
     'EconOpp':{1: 'More in the US',2:'More at Origin',3:'Same at Both', 9:'Missing'},
     'OthOpps':{0:'Just Earnings',1:'Personal',2:'Work',3:'Study'},
     'Inequality':{1:'More at Origin',2:'More in the US',3:'Same in Both', 9:'Missing'},
     'Discrim':{1:'Yes',0:'No',9:'Missing'},
     'Context':{1:'Work/School',2:'On The Street',3:'Language',4:'Race/Ethnicity',5:'Medical',6:'Violence',
               7:'Poverty',8:'Other',9:'Missing'}}

pd.DataFrame(codes.items(), columns =['Codes', 'Values'])



Lat_pro =  open('Identity.Codes.Datafile.csv')


Lat_reader = (pd.read_csv(Lat_pro), ',' )

np.array(Lat_reader)


newLat_reader = np.reshape(A, (202,73))


print newLat_reader

以下是输出示例:

                                        Unnamed: 0 Unnamed: 1  \
0                                        Subject Code        Gen   
1                                               F-001          1   
2                                               F-002          1   
3                                               F-003          1   
4                                               F-007          1   
5                                               F-008          1   
6                                               F-010          1   
7                                               F-013          1   
8                                               F-014          1   
9                                               F-015          1   
10                                              F-016          1   
11                                              F-017          1   
12                                              F-018          1   
13                                              F-019          1   
14                                              F-020          1   
15                                              F-021          1   
16                                              F-022          1   
17                                              F-024          1   
18                                              F-025          1   
19                                              F-026          1   
20                                              F-027          1   
21                                              F-028          1   
22                                              F-032          1   
23                                              F-033          1   
24                                              F-034          1   
25                                              F-035          1   
26                                              F-036          1   
27                                              F-037          1   
28                                              F-038          1   
29                                              F-039          1   
..                                                ...        ...   
172                                      Legal Status        NaN   
173                              Reason for Migration        NaN   
174                                      Return Plans        NaN   
175                                        Occupation        NaN   
176                                               NaN        NaN   
177                                              Wage        NaN   
178                                      Hours Worked        NaN   
179                                          Identity        NaN   
180                  Latino Identity Among Immigrants        NaN   
181                       Reasons for Latino Identity        NaN   
182                           With Whom Gets Together        NaN   
183                                             USYrs        NaN   
184                    In Contact with Home Community        NaN   
185                                R Sends Money Home        NaN   
186  Parent Sends Money Home (Second Generation Only)        NaN   
187             Quantity Sent by Respondent or Parent        NaN   
188                                    How Money Sent        NaN   
189                              Frequency Money Sent        NaN   
190                                    How Money Used        NaN   
191                                        Bank in US        NaN   
192                                     Bank Overseas        NaN   
193                             Type of Communication        NaN   
194                                    Presents Sent         NaN   
195                                         Education        NaN   
196                                           EngAbil        NaN   
197                                          EconOpps        NaN   
198                                           OthOpps        NaN   
199                                        Inequality        NaN   
200                                           Discrim        NaN   
201                                           Context        NaN   

                                        Unnamed: 2 Unnamed: 3 Unnamed: 4  \
0                                                Place        Age       Male   
1                                                    1         28          1   
2                                                    2         35          1   
3                                                    1         30          0   
4                                                    3         19          1   
5                                                    3         20          1   
6                                                    2         21          0   
7                                                    3         29          1   
8                                                    1         25          1   
9                                                    3         23          1   
10                                                   3         30          0   
11                                                   3         21          0   
12                                                   3         23          1   
13                                                   3         34          1   
14                                                   3         33          1   
15                                                   3         33          0   
16                                                   3         33          1   
17                                                   3         26          1   
18                                                   2         31          1   
19                                                   3         31          0   
20                                                   3         20          1   
21                                                   1         20          0   
22                                                   3         22          1   
23                                                   1         20          1   
24                                                   3         30          0   
25                                                   3         22          1   
26                                                   3         26          0   
27                                                   3         25          1   
28                                                   1         19          0   
29                                                   3         21          1   
..                                                 ...        ...        ...   
172  1=Documents   2=No Documents   3=Questionable ...        NaN        NaN   
173  1=supply-side economics   2=demand-side econom...        NaN        NaN   
174  1=Yes   2=No   3=Don't Know   4=No Answer   9=...        NaN        NaN   
175  1=Unpaid  2=Student  3=Agrigulture  4=Unskille...        NaN        NaN   
176  7=Unsilled Services  8=Skilled Services  9=Sma...        NaN        NaN   
177  Wage in U.S. Dollars;  88=Not applicable;  99=...        NaN        NaN   
178       Hours Worked; 88=Not Applicable;  99=Unknown        NaN        NaN   
179           1=Latino  2=American  3=Both   9=Unknown        NaN        NaN   
180     1=Yes  2=No  3=Yes-No  4=Don't Know  9=Missing        NaN        NaN   
181                           1=Yes   0=No   9=Unknown        NaN        NaN   
182                           1=Yes   0=No   9=Unknown        NaN        NaN   
183  Number of Years in US; 88=Not Applicable; 99 M...        NaN        NaN   
184                           1=Yes   0=No  9=Unknown         NaN        NaN   
185             1=Yes   2=No   3=Send Other  9=Unknown        NaN        NaN   
186         1=Yes   2=No   8=Not Applicable  9=Unknown        NaN        NaN   
187  1=Half of Paycheck  2=20% of Paycheck  3=Varie...        NaN        NaN   
188  1=Moneygram   2=Paisano   3=Friend   4=Self   ...        NaN        NaN   
189  1=Once a Month   2=Twice a Year   3=Once a Yea...        NaN        NaN   
190  0=No Use  1=Buy House   2=Family Expenses   3=...        NaN        NaN   
191                        1=Yes   2=No   9=Unknown           NaN        NaN   
192                           1=Yes   2=No   9=Unknown        NaN        NaN   
193  1=Land Phone   2=Cell Phone   3=Calling Card  ...        NaN        NaN   
194                           1=Yes   2=No   9=Unknown        NaN        NaN   
195                                           In Years        NaN        NaN   
196  0=None  1=Some English   2=Good English   9=Mi...        NaN        NaN   
197  1=More in US   2=More at Origin  3=Same at Bot...        NaN        NaN   
198  0=Just Earnings  1=Personal  2=Work   3=Study ...        NaN        NaN   
199  1=More at Origin   2=More in US  3=Same in Bot...        NaN        NaN   
200                            1=Yes   0=No  9=Missing        NaN        NaN   
201  1=Work/School   2=On Street  3=Language  4=Rac...        NaN        NaN   

1 个答案:

答案 0 :(得分:1)

如果你的意思是基本的python语法,你应该先去那里。

https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences

以下是一些例子:

data = ('first', 'second', 'third', 'fourth')    
print "count is ", len(data)
print "accessing by array notation: ", data[0]
print "for loop:"
for x in data:
    print x

要将元组转换为列表(因为你想要一个元组以外的东西,我想也许一个列表可以工作)

print [x for x in data]