Converting pd.read_sql_query to pd.DataFrame converts strings to nan

Time: 2019-06-26 14:23:15

Tags: python sql pyodbc

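When I try to convert the result of a SQL query produced by pd.read_sql_query into a DataFrame with pd.DataFrame, my string values are converted to nan.

I tried using dtypes to set the type of each column.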

SQL query:

SQL_Query = pd.read_sql_query('''SELECT [CircuitID], [Status], 
                                        [LatestJiraTicket], [MrcNew] 
                                  FROM CircuitInfoTable 
                                  WHERE ([Status] = 'Active') 
                                     OR ([Status] = 'Pending')
                                     OR ([Status] = 'Planned')''', conn)
# print(SQL_Query)
cdf = pd.DataFrame(SQL_Query, columns=['CID', 'Status', 'JiraTicket', 'MrcNew'])

DataFrame output:

0                                      OH1004-01  ...      NaN
1                                      OH1004-02  ...      NaN
2                                      OH1005-01  ...      NaN
3                                      OH1005-02  ...      NaN
4                                      AL1001-01  ...      NaN
5                                      AL1001-02  ...      NaN
6                                      AL1007-01  ...      NaN
7                                      AL1007-02  ...      NaN
8                                      NC1001-01  ...      NaN
9                                      NC1001-02  ...      NaN
10                                     NC1001-03  ...      NaN
11                                     NC1001-04  ...      NaN
12                                     NC1001-05  ...      NaN
13                                     NC1001-06  ...      NaN
14                          (ommited on purpose)  ...   5200.0
15                                      MO001-02  ...      NaN
16                                      OR020-01  ...   8000.0
17                                      MA004-01  ...   6500.0
18                                      MA004-02  ...   6500.0
19                                      OR004-01  ...  10500.0
20                          (ommited on purpose)  ...   3975.0
21                                      OR007-01  ...   2500.0
22                          (ommited on purpose)  ...   9200.0
23                          (ommited on purpose)  ...  15000.0
24                          (ommited on purpose)  ...   5750.0
25                                     CA1005-02  ...  47400.0
26                                     CA1005-03  ...  47400.0
27                                     CA1005-04  ...  47400.0
28                                     CA1005-05  ...  47400.0
29                                     CA1006-01  ...      0.0

1 answer:

Answer 0 (score: 1)

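Essentially, you are misusing the columns argument of pandas.DataFrame. When the data passed in is already a DataFrame, that argument specifies which columns to select in the resulting output; it does not rename them. Your query returns no columns named CID or JiraTicket, so those two columns come back filled entirely with missing values.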
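The selection behavior can be reproduced without a database. The following is a minimal sketch: the sql_result frame and its values are made up for illustration and only stand in for what pd.read_sql_query would return.

```python
import pandas as pd

# Hypothetical stand-in for the query result; values are invented.
sql_result = pd.DataFrame({
    "CircuitID": ["OH1004-01", "OR020-01"],
    "Status": ["Active", "Pending"],
    "LatestJiraTicket": ["JIRA-1", "JIRA-2"],
    "MrcNew": [None, 8000.0],
})

# columns= selects by label from the existing frame; it does not rename.
cdf = pd.DataFrame(sql_result, columns=["CID", "Status", "JiraTicket", "MrcNew"])

# Labels that do not exist in the source ("CID", "JiraTicket") are all missing.
print(cdf["CID"].isna().all())
print(cdf["JiraTicket"].isna().all())

# Labels that do exist ("Status", "MrcNew") keep their data.
print(cdf["Status"].tolist())
```

This matches the output in the question: the MrcNew column keeps its values because its name matches, while the renamed labels find nothing to select.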

You probably intended to rename the columns. Consider renaming in SQL with column aliases, or renaming in pandas with rename or set_axis:

SQL

SELECT [CircuitID] AS [CID], 
       [Status], 
       [LatestJiraTicket] AS JiraTicket, 
       [MrcNew] 
FROM CircuitInfoTable 
WHERE ([Status] = 'Active') 
   OR ([Status] = 'Pending')
   OR ([Status] = 'Planned')
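To try the alias approach end to end, here is a rough sketch that uses an in-memory SQLite database in place of the pyodbc connection from the question. The table contents are invented, the ORDER BY is added only to make the output deterministic, and the three OR conditions are written as an equivalent IN list.

```python
import sqlite3

import pandas as pd

# Assumption: SQLite stands in for the real database; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CircuitInfoTable "
    "(CircuitID TEXT, Status TEXT, LatestJiraTicket TEXT, MrcNew REAL)"
)
conn.executemany(
    "INSERT INTO CircuitInfoTable VALUES (?, ?, ?, ?)",
    [
        ("OH1004-01", "Active", "JIRA-1", None),
        ("OR020-01", "Pending", "JIRA-2", 8000.0),
        ("ZZ0000-01", "Retired", "JIRA-3", 100.0),
    ],
)
conn.commit()

# The aliases rename the columns before pandas ever sees them.
query = """
SELECT CircuitID AS CID,
       Status,
       LatestJiraTicket AS JiraTicket,
       MrcNew
FROM CircuitInfoTable
WHERE Status IN ('Active', 'Pending', 'Planned')
ORDER BY CID
"""

# read_sql_query already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed.
cdf = pd.read_sql_query(query, conn)
conn.close()

print(cdf)
```

With the aliases in place, the string columns arrive under their new names with their values intact.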

Pandas

cdf = (pd.read_sql_query(...original query...)
         .rename(columns={'CircuitID': 'CID', 'LatestJiraTicket': 'JiraTicket'})
      )

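# set_axis replaces all column labels by position. The inplace argument was
# removed in pandas 2.0; on pandas older than 1.0, keep inplace=False so that
# a new DataFrame is returned.
cdf = (pd.read_sql_query(...original query...)
         .set_axis(['CID', 'Status', 'JiraTicket', 'MrcNew'], axis='columns')
      )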
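For comparison, the two pandas options can be run side by side on a small frame. This is only a sketch: the raw frame and its values are hypothetical, and it assumes pandas 1.0 or later, where set_axis returns a new DataFrame by default.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pd.read_sql_query.
raw = pd.DataFrame({
    "CircuitID": ["OH1004-01", "OR020-01"],
    "Status": ["Active", "Pending"],
    "LatestJiraTicket": ["JIRA-1", "JIRA-2"],
    "MrcNew": [None, 8000.0],
})

# rename maps old labels to new ones; unmentioned columns keep their names.
renamed = raw.rename(columns={"CircuitID": "CID", "LatestJiraTicket": "JiraTicket"})

# set_axis replaces every label by position, so the list must cover all columns.
relabeled = raw.set_axis(["CID", "Status", "JiraTicket", "MrcNew"], axis="columns")

print(renamed.equals(relabeled))
print(list(raw.columns))  # the original frame is left unchanged
```

rename is the safer choice when the column order of the query might change, because it matches by name; set_axis is shorter but depends on position.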