Why does pandas read a CSV from a file but not from a URL?

Date: 2019-03-01 14:16:03

Tags: python pandas csv google-sheets

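I'm trying to read a Google Sheet, exported as CSV, into a pandas DataFrame. My code works fine when it reads the CSV from a file in my directory, but reading the same data from the URL raises a KeyError. Code: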

import numpy as np  # needed for np.nan below
import pandas as pd

sheet='https://docs.google.com/spreadsheets/d/1qGnU-OE4mcVf-Gnc1iINpx2pqH5komEWk1_9shmX6nY/export?format=csv&id=1qGnU-OE4mcVf-Gnc1iINpx2pqH5komEWk1_9shmX6nY'

df1 = pd.read_csv(sheet, engine='python',header=0, delimiter=",", error_bad_lines=False)

to_drop = ['Company Size','Products','SalesRep','BRN potential(y/n)','lat','lon']
df1.drop(to_drop,inplace=True,axis=1) # drop unwanted columns
df = df1.replace(np.nan, '', regex=True) # replace NaN values with empty string

a = []
for x in range(len(df)):
    company     = df.iloc[x,0]
    country     = df.iloc[x,1]
    status      = df.iloc[x,2]
    companyType = df.iloc[x,3]
    address     = df.iloc[x,4]
    url         = df.iloc[x,5]
    email       = df.iloc[x,6]
    phone       = df.iloc[x,7]
    source      = df.iloc[x,8]
    contactedYN = df.iloc[x,9]
    contactDate = df.iloc[x,10]
    notes       = df.iloc[x,11]

    a.append({
        'country':country,
        'company':company, 
        'type'   :companyType, 
        'status' :status, 
        'website':url, 
        'address':address, 
        'phone'  :phone, 
        'email'  :email, 
        'source' :source,
        'contact date' :contactDate,
        'notes'  :notes 
        })

b = pd.DataFrame(a)

b['website'] = b['website'].str.rstrip('/')

print(b.head())

The error message is as follows:

Skipping line 1281: ',' expected after '"'
Skipping line 1782: ',' expected after '"'
Skipping line 1878: ',' expected after '"'
Skipping line 1879: ',' expected after '"'
Skipping line 1880: ',' expected after '"'
Skipping line 33: Expected 1 fields in line 33, saw 2
Skipping line 34: Expected 1 fields in line 34, saw 2
...
Traceback (most recent call last):
  File "csv-practive.py", line 14, in <module>
    df1.drop(to_drop,inplace=True,axis=1) # drop unwanted columns
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.2_2/lib/python3.7/site-packages/pandas/core/frame.py", line 3940, in drop
    errors=errors)
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.2_2/lib/python3.7/site-packages/pandas/core/generic.py", line 3780, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.2_2/lib/python3.7/site-packages/pandas/core/generic.py", line 3812, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.2_2/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4964, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['Company Size' 'Products' 'SalesRep' 'BRN potential(y/n)' 'lat' 'lon'] not found in axis"
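
The KeyError reports that none of the columns in to_drop exist in df1, which suggests the parsed header is not the one expected. A minimal sketch of a check that could run before the drop, using only the standard library; the helper name missing_columns and the sample strings in the usage note are hypothetical, not part of the original script:

```python
import csv
import io


def missing_columns(csv_text, expected):
    """Return the names in `expected` that are absent from the first row of csv_text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]
```

For example, missing_columns("Company,lat,lon\nAcme,1,2\n", ["lat", "lon", "Products"]) reports only "Products" as missing. If every expected name comes back as missing, the payload is probably not the CSV at all. In pandas itself, df1.drop(columns=to_drop, errors="ignore") would avoid the exception, but it would also hide the underlying problem.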

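I tried different encodings (utf-8 / latin-1) and tried downloading the CSV with the requests library first, but I can't figure out why pandas reads the data fine from a file but not from the URL.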

Here is the result of print(df1):

       <!DOCTYPE html>
0                                      <html lang="tr">
1                                                <head>
2                                <meta charset="utf-8">
3       <meta name="google-site-verification" conten...
4       <title>Google E-Tablolar - ücretsiz olarak w...
5                                               <style>
6                                          @font-face {
7                             font-family: 'Open Sans';
8                                   font-style: normal;
9                                     font-weight: 300;
10                                                    }
11                                         @font-face {
12                            font-family: 'Open Sans';
13                                  font-style: normal;
14                                    font-weight: 400;
15                                                    }
16                                             </style>
17                                              <style>
18                    -webkit-animation-duration: 0.1s;
19                     -webkit-animation-name: fontfix;
20                -webkit-animation-iteration-count: 1;
21           -webkit-animation-timing-function: linear;
22                          -webkit-animation-delay: 0;
23                                                    }
24                         @-webkit-keyframes fontfix {
25                                               from {
26                                          opacity: 1;
27                                                    }
28                                                 to {
29                                          opacity: 1;
...                                                 ...
1616            <script nonce="LhV2p2pyOCXcXw51MT6x1Q">
1617                                       (function(){
1618                  gaia_onLoginSubmit = function() {
1619                                              try {
1620                     gaia.loginAutoRedirect.stop();
1621                                    } catch (err) {
1622        // do not prevent form from being submitted
1623                                                  }
1624                                              try {
1625            document.bg.invoke(function(response) {
1626    document.getElementById('bgresponse').value ...
1627                                                });
1628                                    } catch (err) {
1629    document.getElementById('bgresponse').value ...
1630                                                  }
1631                                       return true;
1632                                                  }
1633    document.getElementById('gaia_loginform').on...
1634                                  var signinButton;
1635    signinButton = document.getElementById('next');
1636                gaia_scrollToElement(signinButton);
1637                                                });
1638                                              })();
1639                                          </script>
1640                                          </script>
1641    <script type="text/javascript" nonce="LhV2p2...
1642    'https:\x2F\x2Faccounts.google.com\x2FPassiv...
1643                                          </script>
1644                                            </body>
1645                                            </html>

1 answer:

Answer 0 (score: 0)

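Solved. I thought my own access to the sheet was enough, but it was not. Link sharing was set to "Off", so the request from pandas was answered with the Google sign-in page, which is the HTML shown in print(df1). I changed the setting to "Anyone with the link can edit".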
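
To catch this kind of failure early, the response can be checked for HTML before it is handed to pandas. The following is a rough sketch under stated assumptions, not a definitive implementation: looks_like_html and fetch_csv_buffer are hypothetical helper names, the check only looks at the Content-Type header and the first characters of the body, and the body is assumed to be UTF-8.

```python
import io
import urllib.request


def looks_like_html(text):
    """Return True if the payload starts like an HTML page rather than CSV."""
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def fetch_csv_buffer(url, timeout=30):
    """Download `url` and return its body as a text buffer, refusing HTML responses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        text = resp.read().decode("utf-8", errors="replace")
    # A sign-in or error page arrives as HTML, not as CSV.
    if "text/html" in content_type.lower() or looks_like_html(text):
        raise ValueError(
            "Got an HTML page instead of CSV; check the sheet's link-sharing setting."
        )
    return io.StringIO(text)
```

The returned buffer can be passed straight to the reader, for example df1 = pd.read_csv(fetch_csv_buffer(sheet)), so a sharing problem surfaces as a clear ValueError instead of a KeyError further down the script.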