Question

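In this project I convert a csv file to an xls file and a txt file to an xls file. The goal is to compare the two xls files and print any differences to a third Excel file.

However, the printed differences include every entry with an integer greater than 999, because any integer in the csv file I converted is treated as a string rather than an integer. Because of the comma in the Excel file converted from the csv, the value 1,200 (in my converted xls file) is treated as different from 1200 (in my converted txt file).

My question is: is there a way to convert the integers that are being interpreted as strings back into integers? Failing that, is there a way to remove all the commas from my xls file? I have tried the usual dataframe.replace method, but it did not work.

Here is my code: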

#import required libraries
import datetime
import xlrd
import pandas as pd

#build the time_handle timestamp string used to name the outputted excel files
time_handle = datetime.datetime.now().strftime("%Y%m%d_%H%M")

#identify XM1 file paths (for both csv origin and excel destination)
XM1_csv = r"filepath"
XM1_excel = r"filepath" + time_handle + ".xlsx"

#identify XM2 file paths (for both txt origin and excel destination)
XM2_txt = r"filepath"
XM2_excel = r"filepath" + time_handle + ".xlsx"

#remove commas from XM1 excel - failed attempts
#XM1_excel = [col.replace(',', '') for col in XM1_excel]
#XM1_excel = XM1_excel.replace(",", "")
#for line in XM1_excel:
        #XM1_excel.write(line.replace(",", ""))

#remove commas from XM1 CSV - failed attempts
#XM1_csv = [col.replace(',', '') for col in XM1_csv]
#XM1_csv = XM1_csv.replace(",", "")
#for line in XM1_csv:
        #XM1_excel.write(line.replace(",", ""))

#convert the csv XM1 file to an excel file, in the same folder
pd.read_csv(XM1_csv).to_excel(XM1_excel)

#convert the txt XM2 file to an excel file in the same folder
pd.read_csv(XM2_txt, sep="|").to_excel(XM2_excel)



#confirm XM1 filepath
filepath_XM1 = XM1_excel

#confirm XM2 filepath
filepath_XM2 = XM2_excel
#read relevant columns from the excel files
df1 = pd.read_excel(filepath_XM2, sheet_name="Sheet1", usecols="H,J,M,U")
df2 = pd.read_excel(filepath_XM1, sheet_name="Sheet1", usecols="C,E,G,K")

#remove all commas from XM1 - failed attempts
#df2 = [col.replace(',', '') for col in df2]
#df2 = df2.replace(",", "")
#for line in df2:
        #df2.write(line.replace(",", ""))

#merge the columns from both excel files into one column each respectively
df4 = df1["Exchange Code"] + df1["Product Type"] + df1["Product Description"] + df1["Quantity"].apply(str)
df5 = df2["Exchange"] + df2["Product Type"] + df2["Product Description"] + df2["Quantity"].apply(str)

#concatenate both columns from each excel file, to make one big column containing all the data
df = pd.concat([df4, df5])

#remove all whitespace from each row of the column of data
df=df.str.strip()
df=["".join(x.split()) for x in df]

#convert the data to a dataframe from a series
df = pd.DataFrame({'Value': df})

#remove any duplicates
df.drop_duplicates(subset=None, keep=False, inplace=True)

#print to the console just as a visual aid
print(df)
#output_path = r"filepath"
#print the erroneous entries to an excel file
df.to_excel("XM1_XM2Comparison" + time_handle + ".xls")

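Also, I realise the XM1 and XM2 file names are a bit confusing in relation to df1 and df2, but I only renamed my files. It does make sense in terms of the files and where they sit in the code!

Thanks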

Answer 1

On the read side of the dataframe you can try a parameter called converters, which lets you specify how each column is converted. For example:

df = pd.read_excel(file, sheet_name=YOUR_SHEET_HERE, converters={'FIELD_NAME': str})

converters is available in both read_csv and read_excel.
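As a rough sketch of how a converter could strip the thousands separators while the file is being read: the Quantity column name comes from the question, but the sample data and the to_int helper are hypothetical, and the example assumes every value in the column is a non-empty whole number.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the csv file from the question
csv_text = 'Exchange,Quantity\nCME,"1,200"\nICE,950\n'


def to_int(value):
    # read_csv passes each raw cell to the converter as a string;
    # drop the thousands separators, then convert to an integer
    return int(value.replace(",", ""))


df = pd.read_csv(io.StringIO(csv_text), converters={"Quantity": to_int})
print(df["Quantity"].tolist())
```

A converter is the more flexible option, since it can apply any cleaning logic to a single column, but it would raise an error on blank or non-numeric cells unless the helper handles them.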

Answer 2

For future reference, I actually solved this with a simple fix. When reading the csv with pd.read_csv, I added the thousands argument, so it looks like this:

pd.read_csv(XM1, thousands = ",").to_excel(XM1_excel)
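To illustrate the difference the thousands argument makes, here is a small sketch that reads the same data with and without it. The in-memory sample data is made up for the example and only stands in for the real csv file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the csv file from the question
csv_text = 'Exchange,Quantity\nCME,"1,200"\nICE,950\n'

# Without thousands, "1,200" cannot be parsed as a number,
# so the whole column is kept as text
as_text = pd.read_csv(io.StringIO(csv_text))

# With thousands=",", the separator is removed during parsing
# and the column is read as integers
as_numbers = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(as_text["Quantity"].tolist())
print(as_numbers["Quantity"].tolist())
```

Once the column is numeric, applying str to it gives "1200" for both files, so matching rows are dropped as duplicates in the comparison step.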

Integers are stored as strings after converting a CSV file to Excel - how do I convert them back?
