Question

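I've been debugging my script and have narrowed the problem down to a few lines of code that I believe are causing it. I'm reading data from 3 csv files, pulling data from a sproc in SQL Server, and exporting the data from both to an Excel file to draw comparisons. The problem I'm getting is that my source files are producing duplicates (one row per source file). I put print statements in the code below to see what's going on.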

#convert district codes to strings
if dfyearfound:
    df2['district_code']=df2['district_code'].apply(lambda x: str(x))
    print df2['district_code'][df2.index[0]]
    df2['district_type_code']=df2['district_type_code'].apply(lambda x: str(x))
    print df2['district_type_code'][df2.index[0]]
if teacheryearfound:
    teacherframe['district_code']=teacherframe['district_code'].apply(lambda x: str(x))
    print teacherframe['district_code'][teacherframe.index[0]]
    teacherframe['district_type_code']=teacherframe['district_type_code'].apply(lambda x: str(x))
    print teacherframe['district_type_code'][teacherframe.index[0]]
if financialyearfound:
    financialframe['district_code']=financialframe['district_code'].apply(lambda x: str(x))
    print financialframe['district_code'][financialframe.index[0]]
    financialframe['district_type_code']=financialframe['district_type_code'].apply(lambda x: str(x))
    print financialframe['district_type_code'][financialframe.index[0]]

The print statements give the following output: 1,1,1,3.0,1212,1
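A plausible explanation for this output (an assumption, since the CSV files themselves aren't shown) is that pandas.read_csv infers numeric dtypes: leading zeros are dropped when a column is parsed as integers, and a column containing any missing value becomes float64, which is where a value like 3.0 comes from. The sketch below reads a small made-up CSV through io.StringIO to show the effect, and shows that passing dtype=str keeps the codes as text.

```python
import io

import pandas as pd

# Hypothetical CSV contents: codes stored with leading zeros, one type code missing.
csv_text = """district_code,district_type_code
0001,01
0012,
1212,03
"""

# Default parsing: pandas infers numeric dtypes, so leading zeros are lost,
# and the column with a missing value becomes float64 (NaN is a float).
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred['district_code'].astype(str).tolist())  # ['1', '12', '1212']
print(inferred['district_type_code'].dtype)             # float64
print(str(inferred['district_type_code'][2]))           # 3.0

# Reading every column as text keeps the codes exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(as_text['district_code'].tolist())                # ['0001', '0012', '1212']
```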

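All dist_codes should be 4 characters long; in the source files they vary from 1 to 4 digits. In the database they are all 4 digits (e.g. 0001, 0012). District type codes are 1 or 2 digits in the source files and always 2 digits in the database (e.g. 01, 03). I'm not sure why the string conversion above isn't working. I was going to write a function to format district_code and district_type_code, but I don't want to hard-code the length, and I can't get the function I wrote to work: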

# function for formatting district codes
def formatDistrictCodes(code):

    dist=code
    dist.zfill(4)

    return dist


formatDistrictCodes(districtformat)
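A minimal corrected version of the function, as a sketch: str.zfill returns a new string rather than modifying dist in place (strings are immutable), so its result has to be returned. The width parameter is an addition that is not in the original; it avoids hard-coding the length. The float handling assumes codes may arrive as values like 3.0, as in the output above.

```python
def formatDistrictCodes(code, width=4):
    """Return `code` as a string left-padded with zeros to at least `width` characters."""
    # Turn whole-number floats such as 3.0 back into ints so the ".0"
    # suffix does not end up in the padded string.
    if isinstance(code, float) and code.is_integer():
        code = int(code)
    # zfill builds and returns a new string; the result must be returned.
    return str(code).zfill(width)


print(formatDistrictCodes(1))        # 0001
print(formatDistrictCodes("12"))     # 0012
print(formatDistrictCodes(3.0, 2))   # 03
print(formatDistrictCodes(1212))     # 1212
```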

Answer 1

I think the crux of the problem is this:

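All dist_codes should be 4 characters long, and in the source files they vary from 1 to 4 digits. In the database they are all 4 digits (e.g. 0001, 0012). District types are 1 or 2 digits, and are all 2 digits in the database (e.g. 01, 03).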

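In Python 2, any integer literal that starts with 0 is interpreted as octal (Python 3 rejects such literals and requires the 0o prefix):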

>>> 016
14
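The octal rule applies to integer literals written in source code, not to text read from a file at runtime. A quick check in Python 3 (the REPL session above is Python 2):

```python
# A leading-zero literal such as 016 is a SyntaxError in Python 3.
try:
    compile("016", "<example>", "eval")
    literal_rejected = False
except SyntaxError:
    literal_rejected = True
print(literal_rejected)   # True

# Octal literals need the 0o prefix.
print(0o16)               # 14

# Text parsed at runtime (e.g. from a CSV) is read as decimal by int(),
# so leading zeros are simply dropped rather than triggering octal.
print(int("016"))         # 16
print(int("0001"))        # 1

# Octal parsing of a string only happens when base 8 is requested.
print(int("016", 8))      # 14
```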

So what you actually want is to take the number, convert it to a string, and left-pad it with zeros to a fixed length of 4.

>>> str(1).zfill(4)
'0001'
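A few more zfill cases, as a sketch: zfill pads whatever string it is given, so if a code has already become a float such as 3.0, the ".0" survives the padding. For integers, a format spec is an equivalent way to zero-pad.

```python
# zfill pads on the left with zeros up to the requested width.
print(str(1).zfill(4))     # 0001
print(str(12).zfill(4))    # 0012

# It pads the text as-is, so a stringified float keeps its ".0".
print(str(3.0).zfill(2))   # 3.0  (already 3 characters, so unchanged)
print(str(3.0).zfill(4))   # 03.0

# For integers, a format spec gives the same zero-padding in one step.
print(format(1, "04d"))    # 0001
print(f"{12:04d}")         # 0012
```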

In your code, this would be:

df2['district_code'] = df2['district_code'].astype(str).str.zfill(4)
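Applied to whole columns, a sketch with a made-up frame standing in for df2 (the values are hypothetical): calling str() on the Series itself would return its printed representation, so the conversion has to be element-wise, which astype(str) followed by the vectorised .str.zfill accessor does.

```python
import pandas as pd

# Hypothetical frame standing in for df2 after read_csv inferred numeric types.
df2 = pd.DataFrame({
    'district_code': [1, 12, 1212],
    'district_type_code': [1.0, 3.0, 12.0],
})

# Integer column: convert each value to text, then zero-pad element-wise.
df2['district_code'] = df2['district_code'].astype(str).str.zfill(4)

# Float column: cast back to int first so ".0" does not end up in the result.
# This cast raises if the column still contains NaN; fill or drop those first.
df2['district_type_code'] = df2['district_type_code'].astype(int).astype(str).str.zfill(2)

print(df2['district_code'].tolist())       # ['0001', '0012', '1212']
print(df2['district_type_code'].tolist())  # ['01', '03', '12']
```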

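Note that this does not enforce the length; it only guarantees a minimum length of 4. Values that already have more than 4 digits are left unchanged.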
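If the database requires exactly 4 digits, one option is to pad and then check the length explicitly. pad_exact below is a hypothetical helper, not part of any library:

```python
def pad_exact(code, width=4):
    """Zero-pad `code` to `width` characters; reject values that are already longer.

    Hypothetical helper: zfill alone only guarantees a minimum length,
    so an explicit check is needed to enforce an exact length.
    """
    text = str(code).zfill(width)
    if len(text) != width:
        raise ValueError(f"{code!r} has more than {width} characters")
    return text


print(pad_exact(7))        # 0007
print(pad_exact("1212"))   # 1212
try:
    pad_exact(12345)
except ValueError as exc:
    print(exc)             # 12345 has more than 4 characters
```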

Conversion problem - Python?
