I load the result of a SQL query into a pandas DataFrame in Python, then use pandasql to do a left outer join between two pandas DataFrames. My code is:
import MySQLdb as mdb
from pandasql import sqldf
from collections import OrderedDict
from pandas import DataFrame
import pandas as pd
top_name_gender = [['Nicole','female'],['Jerson','male'],['Kim','female']]
gender = OrderedDict()
gender['first_name'] = []
gender['gender'] = []
for row in top_name_gender:
    gender['first_name'].append(row[0])
    gender['gender'].append(row[1])
gender_df = DataFrame(gender)
customer = OrderedDict()
customer['email'] = []
customer['first_name'] = []
customer['gender'] = []
query_customer = """SELECT
email,
lower(substring_index(first_name,' ',1)) as first_name,
gender
FROM bob_live.customer
limit 10000000000"""
con = mdb.connect(host='db03.phlan', port=3306, user='crm', passwd='.....', db='bob_live')
cur = con.cursor()
cur.execute(query_customer)
for row in cur.fetchall():
    customer['email'].append(row[0])
    customer['first_name'].append(row[1])
    customer['gender'].append(row[2])
customer_df = DataFrame(customer)
query1 = """
select customer_df.*, gender_df.*
from customer_df
left outer join gender_df
on customer_df.first_name = gender_df.first_name"""
joined = sqldf(query1,locals())
joined.text_factory = str
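But I get the following error:
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
I tried adding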
con.text_factory = str
cur.text_factory = str
but it did not change anything.
Any suggestions?
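Answer 0 (score: 3)
It turned out my problem was the text in the DataFrames I was trying to merge. Specifying the correct encoding when calling read_csv was enough, for example encoding='latin-1'.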
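As a rough illustration of that fix, here is a minimal sketch that reads Latin-1 encoded CSV data with an explicit encoding. The sample data, the in-memory buffer, and the column names are assumptions made up for the example; in practice you would pass the path of your own file, and the right encoding depends on how that file was written.

```python
import io

import pandas as pd

# Hypothetical sample data: a name with an accented character,
# stored as Latin-1 bytes (stands in for a real CSV file on disk).
raw = "email,first_name\na@example.com,José\n".encode("latin-1")

# Passing the matching encoding lets pandas decode the bytes into
# proper text strings instead of leaving undecoded 8-bit data.
customer_df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

print(customer_df["first_name"].tolist())
```

Once the columns hold properly decoded text, the DataFrame can be passed to sqldf as in the question without sqlite3 having to guess how to interpret raw byte strings.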