In a DataFrame

Time: 2015-06-20 00:55:14

Tags: python pandas

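I am trying to replace the strings in the Years column of the DataFrame below with the number contained in each string. For example, I want to change ZC025YR to 025. My code is as follows: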

import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile
from pprint import pprint, pformat

my_url = 'http://www.bankofcanada.ca/stats/results/csv'
data = urllib.urlencode({"lookupPage": "lookup_yield_curve.php",
                         "startRange": "1986-01-01",
                         "searchRange": "all"})
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)
zipdata = result.read()
zipfile = ZipFile(StringIO(zipdata))

df = pd.read_csv(zipfile.open(zipfile.namelist()[0]))

df = pd.melt(df, id_vars=['Date'])

df.rename(columns={'variable': 'Years'}, inplace=True)

My DataFrame currently looks like this:

              Date     Years          value
0       1986-01-01   ZC025YR             na
1       1986-01-02   ZC025YR   0.0948511020
2       1986-01-03   ZC025YR   0.0972953210
3       1986-01-06   ZC025YR   0.0965403640
.....

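However, when I add the following code to restructure my DataFrame, the line df['Years'] = df['Years'].str.extract('(\d+)').astype(int) fails with ValueError: cannot convert float NaN to integer. This seems strange, because when I look at the Years column and at the data in the CSV file, I don't see any 'NaN' values associated with it.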

#Converting the strings in this column into just the number of Years
df['Years'] = df['Years'].str.extract('(\d+)').astype(int)
df['Years'] = df.Years/100

Thanks

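1 Answer:

Answer 0: (score: 1)

Try creating a new function that converts the string to an integer, and call it through the Series.apply method, as shown below:

Edit: added logic that defaults to 0 for strings that contain no digits (including empty strings). Use a different value if you want to handle such entries in the Years column differently.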

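import re

def getYear(s):
    # Find the first run of digits in the string, e.g. '025' in 'ZC025YR'
    x = re.search(r'(\d+)', s)
    # Fall back to 0 when there are no digits, or handle it however you want
    return int(x.group(1)) if x is not None else 0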

Then use this function as follows:

df['Years'] = df['Years'].apply(getYear)
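
To see where the NaN can come from, here is a minimal sketch on made-up data. It assumes that at least one label in the Years column contains no digits at all; the 'NOTES' label and the sample rows are hypothetical and not taken from the Bank of Canada file. In that case str.extract returns NaN for the label, which is what .astype(int) cannot convert. The str(s) wrapper in getYear is an extra safeguard added here so that non-string entries also fall back to 0.

```python
import re

import pandas as pd

# Hypothetical sample: 'ZC025YR' and 'ZC050YR' mirror the labels in the
# question; 'NOTES' is an invented label that contains no digits.
df = pd.DataFrame({
    'Date': ['1986-01-01', '1986-01-02', '1986-01-01'],
    'Years': ['ZC025YR', 'ZC050YR', 'NOTES'],
})

# str.extract yields NaN wherever the pattern does not match.
extracted = df['Years'].str.extract(r'(\d+)', expand=False)

# List the labels that produced NaN, to find the offending values.
unmatched = df.loc[extracted.isna(), 'Years'].unique().tolist()


def getYear(s):
    # str(s) is an added safeguard so that non-string entries (such as a
    # float NaN) do not raise a TypeError in re.search.
    x = re.search(r'(\d+)', str(s))
    # Default to 0 when no digits are found.
    return int(x.group(1)) if x is not None else 0


years = df['Years'].apply(getYear)

# Same scaling step as in the question.
scaled = years / 100
```

Inspecting unmatched on the real data should show which column labels have no digits, so you can decide whether to drop those rows or give them a default value.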