Question

I am trying to pull the results of a SQL query into a Pandas DataFrame using Jupyter Notebook.

import pandas as pd

df = pd.read_sql(sql, cnxn)

cnxn = pyodbc.connect(connection_info) 
cursor = cnxn.cursor()
sql = """SELECT * FROM AdventureWorks2012.Person.Address 
WHERE City = 'Bothell' 
ORDER BY AddressID ASC"""
df = psql.frame_query(sql, cnxn)
cnxn.close()

However, whenever I run the code, it shows:

NameError                                 
Traceback (most recent call last)
<ipython-input-5-4ea4efb152fe> in <module>()
  1 import pandas as pd
  2 
  3 df = pd.read_sql(sql, cnxn)
  4 
  5 cnxn = pyodbc.connect(connection_info)

NameError: name 'sql' is not defined

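I am on a monitored network (a company network, in case anyone asks).

I have a couple of questions:

Do I need to change connection_info to the information for my database?
Does it matter that I am connected to a network that may restrict port connections? The company has set up some of those restrictions.

I am using the latest Anaconda distribution.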

Answer 1

The error you are getting is caused by the order of your code:

1  import pandas as pd
2  df = pd.read_sql(sql, cnxn)  ## You call the variable sql here, but don't assign it until line 6
3 
4  cnxn = pyodbc.connect(connection_info) 
5  cursor = cnxn.cursor()
6  sql = """SELECT * FROM AdventureWorks2012.Person.Address 
7  WHERE City = 'Bothell' 
8  ORDER BY AddressID ASC"""
9  df = psql.frame_query(sql, cnxn)
10 cnxn.close()

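You use the variable sql on line 2, but you don't actually define it until line 6.
You are also missing some library imports (pyodbc is never imported, and psql is never defined), and judging from Beardc's code, you appear to have combined the wrong parts of his two answers.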
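The ordering problem can be reproduced without any database. The following is a minimal sketch; the name query_text is made up for illustration and does not come from the question's code:

```python
# Using a name before it has been assigned raises NameError,
# which is the same failure as line 2 of the question's code.
try:
    rows = [query_text]  # query_text has not been assigned yet
except NameError as exc:
    error_message = str(exc)

# Assigning the name first and using it afterwards works.
query_text = "SELECT 1"
rows = [query_text]
```

Python runs a notebook cell from top to bottom, so every name has to be assigned on an earlier line (or in a previously executed cell) before it is used.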

Try arranging the code as follows:

(Note that this code is untested; the other issues are described below.)

# Import the libraries
import pandas as pd
import pyodbc
# Open the connection (connection_info is a placeholder for your own connection string)
cnxn = pyodbc.connect(connection_info)
# Assign the SQL query to a variable
sql = "SELECT * FROM AdventureWorks2012.Person.Address WHERE City = 'Bothell' ORDER BY AddressID ASC"
# Read the query results into a Pandas DataFrame
df = pd.read_sql(sql, cnxn)
# Close the connection once the data has been loaded
cnxn.close()

To answer your questions:

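Yes, you need to change connection_info to the details of your own database. There is a good example of the text you need to put in there here
This particular problem is not caused by your network restrictions.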
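As a rough illustration of what connection_info might contain, here is a minimal sketch that assembles an ODBC-style connection string. The helper build_connection_info and the server name, user and password are hypothetical placeholders, and the driver name "SQL Server" is an assumption; use whatever driver is actually installed on your machine.

```python
def build_connection_info(driver, server, database, user, password):
    """Assemble a semicolon-separated ODBC connection string."""
    parts = {
        "DRIVER": "{" + driver + "}",  # driver names are wrapped in braces
        "SERVER": server,
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# All values below are placeholders, not real credentials.
connection_info = build_connection_info(
    driver="SQL Server",
    server="my-server.example.com",
    database="AdventureWorks2012",
    user="my_user",
    password="my_password",
)
```

The resulting string is what would be passed to pyodbc.connect(connection_info) in the code above.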

Converting SQL to a Pandas DataFrame using Jupyter Notebook