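Here is my working code, which downloads an Excel file from a website. It takes about 40 seconds.
After running this code, you will notice that the Key1, Key2, and Key3 columns are object dtypes. I cleaned up the dataframe so that Key1 and Key2 contain only alphanumeric values, yet pandas still treats them as object dtype. I need to concatenate Key1 and Key2 (as in MS Excel) to create a separate column called deviceid. I realized I can't join the two columns because they are object dtypes. How do I convert them to strings so that I can create my new column?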
import pandas as pd
import urllib.request
import time
start = time.time()
url = "https://www.misoenergy.org/Library/Repository/Market%20Reports/20170816_da_bcsf.xls"
# Download the workbook and open it with pandas
cnstsfxls = urllib.request.urlopen(url)
xlsf = pd.ExcelFile(cnstsfxls)
# Read Sheet1, skipping the first three rows
dfsf = xlsf.parse("Sheet1", skiprows=3)
# Drop the last row
dfsf.drop(dfsf.index[len(dfsf)-1], inplace=True)
# Drop rows whose device type is unknown
dfsf.drop(dfsf[dfsf['Device Type'] == 'UN'].index, inplace=True)
dfsf.drop(dfsf[dfsf['Device Type'] == 'UNKNOWN'].index, inplace=True)
# Drop the columns that are not needed
dfsf.drop(['Constraint Name', 'Contingency Name', 'Constraint Type', 'Flowgate Name'], axis=1, inplace=True)
end = time.time()
print("The entire process took - ", end-start, " seconds.")
Answer 0 (score: 0)
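I may be missing the point here. But if what you are trying to do is build a column such as deviceid = RCH417, where Key1 = RCH and Key2 = 417, then dfsf['deviceid'] = dfsf['Key1'] + dfsf['Key2'] should work even though both columns are of type object.
Try this: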
# Check value types
dfsf.dtypes
# Add your desired column
dfsf['deviceid'] = dfsf['Key1'] + dfsf['Key2']
# Inspect columns of interest
keep = ['Key1', 'Key2', 'deviceid']
df_keys = dfsf[keep]
print(df_keys.dtypes)
print(df_keys.head())
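If the plain + still fails, one possible cause is that an object column holds a mix of Python strings and numbers (Excel imports often produce this), in which case adding a str to an int raises a TypeError. Here is a minimal sketch of casting both columns with astype(str) before concatenating. The small dataframe is made-up sample data standing in for the downloaded sheet, not the real MISO file.

```python
import pandas as pd

# Hypothetical sample data (not the real file).
# Key2 deliberately mixes an int with strings.
df = pd.DataFrame({
    "Key1": ["RCH", "ABC", "XYZ"],
    "Key2": [417, "12", "7"],
})

# Cast every value to str first, so the elementwise + is always
# string concatenation, whatever types the cells originally held.
df["deviceid"] = df["Key1"].astype(str) + df["Key2"].astype(str)

print(df)
```

Note that astype(str) also turns missing values into text such as "nan", so you may want to handle those rows before building deviceid.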