How can I prevent pandas from converting my string values to float?

The columns 'Billing Doc.' and 'Sales Order' contain 10-11 digit numbers that are to be stored in a MySQL table, in columns of data type CHAR(15). When I execute the script below, I see .0 at the end of each number. I want them to be treated as strings/chars in our database.

The 'Billing Doc.' field contains numbers such as 3206790137, 3209056079, 3209763880, 3209763885, 3206790137, which end up stored in the database as 3206790137.0, 3209056079.0, 3209763880.0, 3209763885.0, 3206790137.0. The data type of the billing document column in the database is CHAR(15).

The problem does not appear when I create a simple df by hand and print it; it shows up with the data loaded by this script:
import datetime as DT

import pandas as pd

# gesdb_connection and compare_columns() are not shown in the question.


def insert_billing(df):
    # Replace NaN/NaT with None so they can be mapped to SQL NULL below
    df = df.where((pd.notnull(df)), None)
    for row in df.to_dict(orient="records"):
        bill_item = row['Bill.Item']
        bill_qty = row['Billed Qty']
        bill_doct_date = row['Billi.Doc.Date']
        bill_doc = row['Billing Doc.']
        bill_net_value = row['Billi.Net Value']
        sales_order = row['Sales Order']
        import_date = DT.datetime.now().strftime('%Y-%m-%d')
        query = "INSERT INTO sap_billing(" \
                "bill_item, " \
                "bill_qty, " \
                "bill_doc_date, " \
                "bill_doc, " \
                "bill_net_value, " \
                "sales_order, " \
                "import_date" \
                ") VALUES (" \
                "\"{}\", \"{}\", \"{}\", \"{}\"," \
                "\"{}\", \"{}\", \"{}\"" \
                ") ON DUPLICATE KEY UPDATE " \
                "bill_qty = VALUES(bill_qty), " \
                "bill_doc_date = VALUES(bill_doc_date), " \
                "bill_net_value = VALUES(bill_net_value), " \
                "import_date = VALUES(import_date) " \
                "".format(
                    bill_item,
                    bill_qty,
                    bill_doct_date,
                    bill_doc,
                    bill_net_value,
                    sales_order,
                    import_date
                )
        # Turn the formatted None/NaT placeholders into SQL NULL
        query = query.replace('\"None\"', 'NULL')
        query = query.replace('(None', '(NULL')
        query = query.replace('\"NaT\"', 'NULL')
        query = query.replace('(NaT', '(NULL')
        try:
            q1 = gesdb_connection.execute(query)
        except Exception as e:
            print(bill_item, bill_doc, sales_order, e)


if __name__ == "__main__":
    engine_str = 'mysql+mysqlconnector://root:abc123@localhost/mydb'
    file_name = "tmp/dataload/so_tracking.XLSX"
    df = pd.read_excel(file_name)
    if df.shape[1] == 35 and compare_columns(list(df.columns.values)) == 1:
        insert_billing(df)
    else:
        print("Incorrect column count, column order or column headers.\n")
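The simple DataFrame below prints the values as integers. However, when I read the Excel file and then print it, the column is read as float64.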
import pandas as pd
df = pd.DataFrame({'Sales Order': [1217252835, 1217988754, 1219068439],
                   'Billing Doc.': [3222102723, 3209781889, 3214305818]})
>>> df
   Billing Doc.  Sales Order
0    3222102723   1217252835
1    3209781889   1217988754
2    3214305818   1219068439
Answer 0 (score: 0)
I found the solution myself and am posting it here to document it.
df = pd.read_excel(file_name, converters={'Billing Doc.' : str})
print(df['Billing Doc.'])
695    3251631331
696    3252012614
697           NaN
698    3252272451
699    3252359504
700    3252473894
701           NaN
702           NaN
703           NaN
704    3252652940
705           NaN
706           NaN
707           NaN
708           NaN
Name: Billing Doc., dtype: object
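For anyone who wants to try the effect without an Excel file, here is a minimal sketch that uses pd.read_csv on an in-memory CSV as a stand-in (read_csv accepts a converters argument just as read_excel does). The sample rows are made up, and this only illustrates the idea; it is not the original workbook.

```python
import io

import pandas as pd

# Made-up sample data; the empty cell mimics a row without a billing document.
csv_text = (
    "Billing Doc.,Sales Order\n"
    "3251631331,1217252835\n"
    ",1217988754\n"
    "3252272451,1219068439\n"
)

# Default parsing: the missing cell becomes NaN, so the column is float64.
plain = pd.read_csv(io.StringIO(csv_text))
print(plain["Billing Doc."].dtype)

# With a converter, each non-empty cell is kept as the text that was read.
fixed = pd.read_csv(io.StringIO(csv_text), converters={"Billing Doc.": str})
print(fixed["Billing Doc."].iloc[0])
```

The point is that the float conversion happens at read time, so it is easiest to stop it there rather than trying to undo it afterwards.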
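Answer 1 (score: 0)
Something similar happened to me because the index of the new column did not match the index of the original DataFrame. That produced NaN values, which in turn caused the column to be upcast to float automatically. So it is worth checking whether the indexes match.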
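A small sketch of the index-mismatch effect described above, reusing the numbers from the question's simple DataFrame. The partly overlapping index (0, 1, 5) and the column name "by_position" are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Sales Order": [1217252835, 1217988754, 1219068439]})

# Hypothetical new column whose index only partly overlaps df's index (0, 1, 2).
new_col = pd.Series([3222102723, 3209781889, 3214305818], index=[0, 1, 5])

# Assignment aligns on index labels: row 2 has no match and becomes NaN,
# which forces the whole column to float64.
df["Billing Doc."] = new_col
print(df["Billing Doc."].dtype)

# A quick check for this situation.
print(df.index.equals(new_col.index))

# Assigning the underlying array ignores the index and keeps the integers.
df["by_position"] = new_col.to_numpy()
print(df["by_position"].dtype)
```

Assigning by position is only appropriate when the rows really are in the same order; otherwise the index of the new column should be fixed first.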
Answer 2 (score: -1)
Try this:
df = df.astype(str)
Note that this is very inefficient.
Or convert each value to int before inserting it into the query.
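A rough sketch comparing the two suggestions, assuming the column has already been loaded as float64 with some missing cells. The sample values and the helper name to_digit_string are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the column as it looks after a plain read with gaps.
billing = pd.Series([3206790137, np.nan, 3209763880], name="Billing Doc.")


def to_digit_string(value):
    """Return the value as a digit string, or None for a missing cell."""
    if pd.isna(value):
        return None
    return str(int(value))


# astype(str) on a float column keeps the trailing ".0".
as_text = billing.astype(str).tolist()
print(as_text)

# Converting each value to int first drops the ".0"; missing cells stay None.
cleaned = [to_digit_string(v) for v in billing]
print(cleaned)
```

So on a column that is already float64, astype(str) alone does not remove the .0, whereas going through int per value does.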