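I am trying to replace the strings in the Years column of the DataFrame below with the numbers contained in those strings. For example, I want to change ZC025YR to 025. My code is as follows: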
import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile
from pprint import pprint, pformat
my_url = 'http://www.bankofcanada.ca/stats/results/csv'
data = urllib.urlencode({"lookupPage": "lookup_yield_curve.php",
"startRange": "1986-01-01",
"searchRange": "all"})
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)
zipdata = result.read()
zipfile = ZipFile(StringIO(zipdata))
df = pd.read_csv(zipfile.open(zipfile.namelist()[0]))
df = pd.melt(df, id_vars=['Date'])
df.rename(columns={'variable': 'Years'}, inplace=True)
My current DataFrame looks like this:
Date Years value
0 1986-01-01 ZC025YR na
1 1986-01-02 ZC025YR 0.0948511020
2 1986-01-03 ZC025YR 0.0972953210
3 1986-01-06 ZC025YR 0.0965403640
.....
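The Years column shown above is produced by pd.melt, which turns each maturity column of the wide CSV into rows. Here is a minimal sketch on a small hand-built frame. It assumes the real CSV has a Date column plus one column per maturity; the ZC050YR column and its values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical wide table: 'Date' plus one column per maturity (assumed layout)
wide = pd.DataFrame({
    "Date": ["1986-01-01", "1986-01-02"],
    "ZC025YR": ["na", "0.0948511020"],
    "ZC050YR": ["na", "0.0950000000"],  # placeholder values, not real data
})

# melt keeps 'Date' as the identifier and stacks every other column into
# 'variable' (the old column name) and 'value' (the cell contents)
long_df = pd.melt(wide, id_vars=["Date"])
long_df = long_df.rename(columns={"variable": "Years"})
print(long_df)
```

Every former column header ends up as a value in Years, so any header that is not a maturity code will also appear there.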
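However, when I add the following code to restructure my DataFrame, I get ValueError: cannot convert float NaN to integer on the line df['Years'] = df['Years'].str.extract('(\d+)').astype(int). This is strange, because when I look at the Years column and at the data in the CSV file, I don't see any 'NaN' values.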
#Converting the strings in this column into just the number of Years
df['Years'] = df['Years'].str.extract('(\d+)').astype(int)
df['Years'] = df.Years/100
Thanks.
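One likely source of the NaN: str.extract returns NaN for every entry where the pattern finds no match, so a single Years label without digits is enough to make astype(int) fail, even if the CSV never contains the text 'NaN'. A minimal sketch for locating such rows, assuming hypothetical labels (NOTES stands in for any header without digits):

```python
import pandas as pd

# Hypothetical Years labels; 'NOTES' contains no digits
years = pd.Series(["ZC025YR", "ZC050YR", "NOTES"])

# With one capture group, expand=False returns a Series instead of a DataFrame
digits = years.str.extract(r"(\d+)", expand=False)

# Entries with no match become NaN; these are the rows astype(int) cannot convert
bad = years[digits.isna()]
print(bad)
```

Running the same filter on the real df['Years'] should show which labels are responsible.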
Answer (score: 1)
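Try writing a function that converts each string to an integer, and call it through the Series.apply method, as shown below.

Edit: added logic that returns 0 when a string contains no digits (for example, an empty string). Use a different value if you want to handle such entries in the Years column differently.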
import re

def getYear(s):
    # Return the first run of digits in s as an int; fall back to 0 when there is none
    x = re.search(r'(\d+)', s)
    return int(x.group(1)) if x is not None else 0  # or however you want to handle it
Then apply the function as follows:
df['Years'] = df['Years'].apply(getYear)
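A self-contained sketch of the approach above on a small hand-built frame (labels are hypothetical; NOTES stands in for any entry without digits). It also shows a vectorized alternative that is not part of the original answer: str.extract with expand=False, filling unmatched rows with the string '0' before the integer cast. Note that getYear assumes every entry is a string; a real NaN in the column would make re.search raise TypeError.

```python
import re
import pandas as pd

def getYear(s):
    """Return the first run of digits in s as an int, or 0 if there is none."""
    match = re.search(r"(\d+)", s)
    return int(match.group(1)) if match is not None else 0

# Hypothetical Years labels; 'NOTES' has no digits and defaults to 0
df = pd.DataFrame({"Years": ["ZC025YR", "ZC100YR", "NOTES"]})

# Approach from the answer: call the helper once per row
via_apply = df["Years"].apply(getYear)

# Vectorized alternative (not from the answer): extract the digits, fill
# non-matches with the string "0" so the column stays string-typed, then cast
via_extract = df["Years"].str.extract(r"(\d+)", expand=False).fillna("0").astype(int)

print(via_apply.tolist())    # [25, 100, 0]
print(via_extract.tolist())  # [25, 100, 0]
```

Both produce the same integers; the vectorized form avoids a Python-level function call per row, which may matter on a long melted frame.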