Problem accessing columns in Python 3 pandas

Time: 2017-12-13 02:46:32

Tags: python pandas csv dataframe

I have a CSV file:

 Firstname   Lastname     City     Province
'Guy',       'Ouell',   'Brossard','QC'
'Michelle',  'Balonne','Stittsville','ON'
'Ben',       'Sluzing','Toronto','ON'
'Theodora', 'Panapoulos','Saint-Constant','QC'
'Kathleen', 'Mercier','St Johns','NL'
...

I open it and check that everything looks fine:

 import pandas as pd

 df = pd.read_csv('a.csv')
 df.head(n=5)

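When I try to use the columns, I run into two different problems:

Problem 1: I can only access the first column. When I try to use any other column, I get an error: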

for mis_column, mis_row in missing_df.iterrows():
    print(mis_row['Firstname'])

This prints all the first names, but when I try to get all the cities, for example, I see:

TypeError                                 Traceback (most recent call last)
E:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   2482             try:
-> 2483                 return libts.get_value_box(s, key)
   2484             except IndexError:

pandas/_libs/tslib.pyx in pandas._libs.tslib.get_value_box (pandas\_libs\tslib.c:18843)()

pandas/_libs/tslib.pyx in pandas._libs.tslib.get_value_box (pandas\_libs\tslib.c:18477)()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-36-55ba81245685> in <module>()
       1 
       2 for mis_column, mis_row in missing_df.iterrows():
 ----> 3     print(mis_row['City'])
       4 
       5 

  E:\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
      599         key = com._apply_if_callable(key, self)
      600         try:
  --> 601             result = self.index.get_value(self, key)
      602 
      603             if not is_scalar(result):

  E:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
     2489                     raise InvalidIndexError(key)
     2490                 else:
  -> 2491                     raise e1
     2492             except Exception:  # pragma: no cover
     2493                 raise e1

  E:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
     2475         try:
     2476             return self._engine.get_value(s, k,
  -> 2477        tz=getattr(series.dtype, 'tz', None))
     2478         except KeyError as e1:
     2479             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

  pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

  pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

  pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

  pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

  pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

  KeyError: 'City'

Problem 2:

 for mis_column, mis_row in df.iterrows():
     if mis_row['Firstname'] == 'Guy':
         print('A')

This does not print A.

Thanks in advance.

1 Answer:

Answer 0 (score: 0)

Separate the fields in the CSV header with commas, like this:

 Firstname,   Lastname,     City,     Province
'Guy',       'Ouell',   'Brossard','QC'
'Michelle',  'Balonne','Stittsville','ON'
'Ben',       'Sluzing','Toronto','ON'
'Theodora', 'Panapoulos','Saint-Constant','QC'
'Kathleen', 'Mercier','St John's','NL'

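Because the values in your CSV are padded with spaces after the commas, you can have pandas skip that whitespace when reading the file into a data frame: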

df = pd.read_csv('<your_input>.csv', skipinitialspace=True)
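
To illustrate what this option changes, here is a minimal sketch. It assumes a header that is already comma-separated, and it uses `io.StringIO` with a shortened copy of the sample rows as a stand-in for the real file. Without `skipinitialspace=True`, the padding spaces become part of the column names, so a lookup such as `['City']` fails while `['Firstname']` still works, which would be consistent with the `KeyError` in the question.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file (illustrative data copied from the question).
sample = """Firstname,   Lastname,     City,     Province
'Guy',       'Ouell',   'Brossard','QC'
'Ben',       'Sluzing','Toronto','ON'
"""

# Default behaviour: spaces after each comma are kept in the column names.
plain = pd.read_csv(io.StringIO(sample))
print(plain.columns.tolist())
print('City' in plain.columns)      # False: the name still carries leading spaces

# skipinitialspace=True drops the spaces that follow each delimiter.
skipped = pd.read_csv(io.StringIO(sample), skipinitialspace=True)
print(skipped.columns.tolist())     # ['Firstname', 'Lastname', 'City', 'Province']
```

Printing `df.columns.tolist()` on your own data frame is a quick way to check whether the names contain stray whitespace.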

If you also want to strip the single quotes, set quotechar:

df = pd.read_csv('<your_input>.csv', skipinitialspace=True, quotechar="'")

>>> df
  Firstname    Lastname            City Province
0       Guy       Ouell        Brossard       QC
1  Michelle     Balonne     Stittsville       ON
2       Ben     Sluzing         Toronto       ON
3  Theodora  Panapoulos  Saint-Constant       QC
4  Kathleen     Mercier       St Johns'       NL
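
The quote character is also a likely reason the comparison in Problem 2 never matches. The sketch below is a minimal illustration under the same assumptions as before (comma-separated header, `io.StringIO` standing in for the file): with the default `quotechar` the single quotes stay inside the values, so the cell holds `'Guy'` including the quotes rather than `Guy`.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file (illustrative data copied from the question).
sample = """Firstname,   Lastname,     City,     Province
'Guy',       'Ouell',   'Brossard','QC'
'Ben',       'Sluzing','Toronto','ON'
"""

# Default quotechar is the double quote, so single quotes remain part of each value.
with_quotes = pd.read_csv(io.StringIO(sample), skipinitialspace=True)
print(repr(with_quotes.loc[0, 'Firstname']))
print((with_quotes['Firstname'] == 'Guy').any())     # False

# quotechar="'" makes the parser treat single quotes as field quoting and drop them.
without_quotes = pd.read_csv(io.StringIO(sample), skipinitialspace=True, quotechar="'")
print(repr(without_quotes.loc[0, 'Firstname']))
print((without_quotes['Firstname'] == 'Guy').any())  # True
```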


>>> import pandas as pd
>>> df = pd.read_csv('test2.csv', skipinitialspace=True, quotechar="'")
>>> df
  Firstname    Lastname            City Province
0       Guy       Ouell        Brossard       QC
1  Michelle     Balonne     Stittsville       ON
2       Ben     Sluzing         Toronto       ON
3  Theodora  Panapoulos  Saint-Constant       QC
4  Kathleen     Mercier       St Johns'       NL
>>> for mis_column, mis_row in df.iterrows():
...      if mis_row['Firstname'] == 'Guy':
...             print('A')
...
A
>>>
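
Once the columns parse cleanly, the row loop is not strictly needed for this kind of check. As an alternative sketch (not part of the answer above, and again using `io.StringIO` with a shortened copy of the sample data in place of the file), a boolean mask selects the matching rows directly:

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file (illustrative data copied from the question).
sample = """Firstname,   Lastname,     City,     Province
'Guy',       'Ouell',   'Brossard','QC'
'Michelle',  'Balonne','Stittsville','ON'
'Ben',       'Sluzing','Toronto','ON'
"""

df = pd.read_csv(io.StringIO(sample), skipinitialspace=True, quotechar="'")

# Boolean mask: True for rows whose Firstname equals 'Guy'.
matches = df[df['Firstname'] == 'Guy']
print(matches['City'].tolist())     # ['Brossard']

# Whole-column access works for every column once the names are clean.
print(df['City'].tolist())          # ['Brossard', 'Stittsville', 'Toronto']
```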