Question

I am trying to read Excel files into pandas from the following URLs:

url1 = 'https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls'

url2 = 'https://cib.societegenerale.com/fileadmin/indices_feeds/STTI_Historical.xls'

using the code:

pd.read_excel(url1)

However, it does not work and I get an XLRDError.

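After searching on Google, it seems that .xls files served from a URL are sometimes actually stored in a different file format behind the scenes, such as HTML or XML.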
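One way to test that suspicion is to look at the first few bytes of the downloaded file rather than at its extension. The sketch below is only a rough heuristic: the function name `sniff_format` and the returned labels are made up for illustration, and the checks cover just the common cases (the OLE2 signature used by legacy .xls files, the ZIP signature used by .xlsx, a leading `<` for HTML/XML, and tabs for tab-separated text).

```python
def sniff_format(head: bytes) -> str:
    """Guess a file's real format from its first bytes (rough heuristic)."""
    # Legacy binary .xls files are OLE2 compound documents.
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"
    # .xlsx files are ZIP archives (other ZIP-based formats match too).
    if head.startswith(b"PK\x03\x04"):
        return "xlsx"
    # Skip a UTF-8 byte order mark and leading whitespace before the text checks.
    text_start = head.lstrip(b"\xef\xbb\xbf \t\r\n")
    if text_start[:1] == b"<":
        return "html-or-xml"
    if b"\t" in head:
        return "tab-separated text"
    return "unknown"


# Hypothetical first bytes of a file, for illustration only.
print(sniff_format(b"01/03/2000\t1000.0\t\r\n"))
```

For a file on disk, the bytes could come from something like `open(path, "rb").read(64)`. If the result is not "xls" or "xlsx", `pd.read_excel` is probably the wrong reader for that file.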

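When I download the Excel file manually and open it in Excel, I get an error message: "The file format and extension don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it."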

When I do open it, it looks just like a normal Excel file.

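I came across a post online suggesting that I open the file in a text editor to see whether it contains any additional information about the actual file format, but I did not see anything of the sort when I opened it in Notepad++.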

Could someone help me read these "xls" files into a pandas DataFrame correctly?

Answer 1

It seems you can use read_csv:

import pandas as pd

df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls',
                 sep='\t',
                 parse_dates=[0],
                 names=['a','b','c','d','e','f'])
print(df)

Then I check whether the last column f contains any values other than NaN:

print(df[df.f.notnull()])

Empty DataFrame
Columns: [a, b, c, d, e, f]
Index: []

So it contains only NaN, and you can filter out the last column f with the usecols parameter:

import pandas as pd

df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls',
                 sep='\t',
                 parse_dates=[0],
                 names=['a','b','c','d','e','f'],
                 usecols=['a','b','c','d','e'])
print(df)
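The same steps can be tried offline on a small in-memory sample. This is only a sketch: the sample rows are made up, and the assumption that every line ends with a trailing tab (which is what would produce an all-NaN sixth column) is mine, not something confirmed about the real file.

```python
import io

import pandas as pd

# Made-up rows; the trailing tab on each line leaves an empty sixth field.
sample = (
    "01/03/2000\t1000.0\t0.0\t1.0\t2.0\t\n"
    "01/04/2000\t1001.5\t0.1\t1.1\t2.1\t\n"
)
cols = ['a', 'b', 'c', 'd', 'e', 'f']

# Read everything first and confirm that the last column holds only NaN.
df_all = pd.read_csv(io.StringIO(sample), sep='\t', names=cols, parse_dates=['a'])
last_is_empty = bool(df_all['f'].isna().all())

# Then skip the empty column at read time with usecols.
df = pd.read_csv(io.StringIO(sample), sep='\t', names=cols,
                 usecols=cols[:-1], parse_dates=['a'])

# Alternative: drop every all-NaN column after reading.
df_dropped = df_all.dropna(axis=1, how='all')

print(last_is_empty)
print(df)
```

Dropping all-NaN columns after the fact avoids hard-coding which column is empty, while usecols avoids reading it at all.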

Answer 2

In case this helps someone: you can read an Excel file stored on Google Drive directly by URL, without any login requirement. I tried this in Google Colab.

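Upload an Excel file to Google Drive, or use one that is already uploaded
Share the file with anyone who has the link (I don't know whether view-only access is enough, but I tried it with full access)
Copy the link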

You will get something like this.

Share URL: https://drive.google.com/file/d/---some--long--string/view?usp=sharing

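Get the download URL by trying to download the file (copy the URL from there)

It will look something like this (it contains the same Google file ID as above):

Download URL: https://drive.google.com/u/0/uc?id=---some--long--string&export=download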

Now go to Google Colab and paste the following code:

import pandas as pd

fileurl   = r'https://drive.google.com/file/d/---some--long--string/view?usp=sharing'
filedlurl = r'https://drive.google.com/u/0/uc?id=---some--long--string&export=download'

df = pd.read_excel(filedlurl)
df

That's it. The file is in your df.
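Since the share URL and the download URL carry the same file ID, the second can be derived from the first instead of being copied by hand. The helper below is a hypothetical sketch (the name `drive_download_url` is mine); it simply mirrors the two URL shapes shown above and assumes Google Drive keeps serving downloads from that pattern.

```python
import re


def drive_download_url(share_url: str) -> str:
    """Build a download URL from a Google Drive share URL (pattern assumed)."""
    # The file ID is the path segment that follows /file/d/ in the share link.
    match = re.search(r"/file/d/([^/?]+)", share_url)
    if match is None:
        raise ValueError("no Google Drive file ID found in: " + share_url)
    file_id = match.group(1)
    # Same shape as the download URL shown above.
    return f"https://drive.google.com/u/0/uc?id={file_id}&export=download"


# Placeholder ID, for illustration only.
print(drive_download_url(
    "https://drive.google.com/file/d/FILE_ID_PLACEHOLDER/view?usp=sharing"))
```

The result can then be passed to pd.read_excel in place of filedlurl.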

Reading an Excel file from a URL using pandas - XLRDError
