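I am trying to read a Google Sheet, exported as CSV, into a pandas DataFrame. My code works fine when it reads from a file in a local directory, but reading from the URL raises a KeyError. Code: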
import numpy as np  # needed for np.nan below
import pandas as pd
sheet='https://docs.google.com/spreadsheets/d/1qGnU-OE4mcVf-Gnc1iINpx2pqH5komEWk1_9shmX6nY/export?format=csv&id=1qGnU-OE4mcVf-Gnc1iINpx2pqH5komEWk1_9shmX6nY'
df1 = pd.read_csv(sheet, engine='python',header=0, delimiter=",", error_bad_lines=False)
to_drop = ['Company Size','Products','SalesRep','BRN potential(y/n)','lat','lon']
df1.drop(to_drop,inplace=True,axis=1) # drop unwanted columns
df = df1.replace(np.nan, '', regex=True) # replace NaN values with empty string
a = []
for x in range(len(df)):
    company = df.iloc[x,0]
    country = df.iloc[x,1]
    status = df.iloc[x,2]
    companyType = df.iloc[x,3]
    address = df.iloc[x,4]
    url = df.iloc[x,5]
    email = df.iloc[x,6]
    phone = df.iloc[x,7]
    source = df.iloc[x,8]
    contactedYN = df.iloc[x,9]
    contactDate = df.iloc[x,10]
    notes = df.iloc[x,11]
    a.append({
        'country':country,
        'company':company,
        'type' :companyType,
        'status' :status,
        'website':url,
        'address':address,
        'phone' :phone,
        'email' :email,
        'source' :source,
        'contact date' :contactDate,
        'notes' :notes
    })
b = pd.DataFrame(a)
b['website'] = b['website'].str.rstrip('/')
print(b.head())
The error message is as follows:
Skipping line 1281: ',' expected after '"'
Skipping line 1782: ',' expected after '"'
Skipping line 1878: ',' expected after '"'
Skipping line 1879: ',' expected after '"'
Skipping line 1880: ',' expected after '"'
Skipping line 33: Expected 1 fields in line 33, saw 2
Skipping line 34: Expected 1 fields in line 34, saw 2
...
Traceback (most recent call last):
  File "csv-practive.py", line 14, in <module>
    df1.drop(to_drop,inplace=True,axis=1) # drop unwanted columns
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.2_2/lib/python3.7/site-packages/pandas/core/frame.py", line 3940, in drop
    errors=errors)
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.2_2/lib/python3.7/site-packages/pandas/core/generic.py", line 3780, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.2_2/lib/python3.7/site-packages/pandas/core/generic.py", line 3812, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.2_2/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4964, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['Company Size' 'Products' 'SalesRep' 'BRN potential(y/n)' 'lat' 'lon'] not found in axis"
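I tried different encodings (utf-8 / latin-1), and I also tried downloading the CSV first with the requests library, but I can't figure out why pandas reads the data fine from a file but not from the URL.
This is the output of print(df1):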
<!DOCTYPE html>
0 <html lang="tr">
1 <head>
2 <meta charset="utf-8">
3 <meta name="google-site-verification" conten...
4 <title>Google E-Tablolar - ücretsiz olarak w...
5 <style>
6 @font-face {
7 font-family: 'Open Sans';
8 font-style: normal;
9 font-weight: 300;
10 }
11 @font-face {
12 font-family: 'Open Sans';
13 font-style: normal;
14 font-weight: 400;
15 }
16 </style>
17 <style>
18 -webkit-animation-duration: 0.1s;
19 -webkit-animation-name: fontfix;
20 -webkit-animation-iteration-count: 1;
21 -webkit-animation-timing-function: linear;
22 -webkit-animation-delay: 0;
23 }
24 @-webkit-keyframes fontfix {
25 from {
26 opacity: 1;
27 }
28 to {
29 opacity: 1;
... ...
1616 <script nonce="LhV2p2pyOCXcXw51MT6x1Q">
1617 (function(){
1618 gaia_onLoginSubmit = function() {
1619 try {
1620 gaia.loginAutoRedirect.stop();
1621 } catch (err) {
1622 // do not prevent form from being submitted
1623 }
1624 try {
1625 document.bg.invoke(function(response) {
1626 document.getElementById('bgresponse').value ...
1627 });
1628 } catch (err) {
1629 document.getElementById('bgresponse').value ...
1630 }
1631 return true;
1632 }
1633 document.getElementById('gaia_loginform').on...
1634 var signinButton;
1635 signinButton = document.getElementById('next');
1636 gaia_scrollToElement(signinButton);
1637 });
1638 })();
1639 </script>
1640 </script>
1641 <script type="text/javascript" nonce="LhV2p2...
1642 'https:\x2F\x2Faccounts.google.com\x2FPassiv...
1643 </script>
1644 </body>
1645 </html>
Answer 0 (score: 0)
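Solved. I thought the sheet was accessible, but it was not: link sharing was set to "Off", so the export URL returned Google's sign-in page (the HTML shown in the print(df1) output) instead of CSV data. I just changed the setting to "Anyone with the link can edit".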
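The print(df1) output in the question starts with <!DOCTYPE html> and contains a gaia_loginform script, which suggests that pandas was parsing a sign-in page rather than CSV. Below is a minimal sketch of how one might catch that before parsing. The helper names looks_like_html, fetch_csv_text and load_sheet are hypothetical, the check is only a heuristic on the first characters of the response, and the <SHEET_ID> placeholder in the usage comment stands for your own sheet ID.

```python
import io
import urllib.request


def looks_like_html(body: str) -> bool:
    """Heuristic: True if the text starts like an HTML page rather than CSV."""
    head = body.lstrip("\ufeff").lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def fetch_csv_text(url: str) -> str:
    """Download the URL and return its text, failing loudly on an HTML page."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    if looks_like_html(body):
        raise ValueError(
            "Received an HTML page instead of CSV; "
            "check the sheet's link-sharing setting."
        )
    return body


def load_sheet(url: str):
    """Fetch the CSV export and parse it with pandas (assumed to be installed)."""
    import pandas as pd  # imported lazily so the helpers above need only the stdlib

    return pd.read_csv(io.StringIO(fetch_csv_text(url)))


# Usage (replace <SHEET_ID> with the ID of your own sheet):
# df1 = load_sheet("https://docs.google.com/spreadsheets/d/<SHEET_ID>/export?format=csv")
```

With a check like this, a sheet that is not shared fails with a clear message at download time, instead of surfacing later as a KeyError from df1.drop.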