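I am running some code roughly as follows. I retrieve pairs of stocks from a CSV file (laid out as Row 1: Stock 1, 2; Row 2: Stock 1, 2; and so on, where Stock 1 and Stock 2 differ in each row). I then pull data from Yahoo for these "Pairs" of stocks. I calculate the stocks' returns and check whether the distance (difference in returns) between a pair of stocks breaches some threshold; if it does, I return 1. However, I get the following error, which I am unable to resolve: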
PricePort(tickers)
27 for ticker in tickers:
28 #print ticker
---> 29 x = pd.read_csv('http://chart.yahoo.com/table.csv?s=ttt'.replace('ttt',ticker),usecols=[0,6],index_col=0)
30 x.columns=[ticker]
31 final=pd.merge(final,x,left_index=True,right_index=True)
TypeError: expected a character buffer object
The code is as follows:
from datetime import datetime
import pytz
import csv
import pandas as pd
import pandas.io.data as web
import numpy as np
#Retrieves pairs of stocks (laid out as Row 1-Stock 1,2, Row 2-Stock 1,2 and so on, where Stock 1 and 2 are different in each row) from CSV File
def Dataretriever():
    Pairs = []
    f1=open('C:\Users\Pythoncode\Pairs.csv') #Enter the location of the file
    csvdata= csv.reader(f1)
    for row in csvdata: #reading tickers from the csv file
        Pairs.append(row)
    return Pairs
tickers = Dataretriever() #Obtaining the data
#Taking in data from Yahoo associated with these "Pairs" of Stocks
def PricePort(tickers):
    """
    Returns historical adjusted prices of a portfolio of stocks.
    tickers=pairs
    """
    final=pd.read_csv('http://chart.yahoo.com/table.csv?s=^GSPC',usecols=[0,6],index_col=0)
    final.columns=['^GSPC']
    for ticker in tickers:
        #print ticker
        x = pd.read_csv('http://chart.yahoo.com/table.csv?s=ttt'.replace('ttt',ticker),usecols=[0,6],index_col=0)
        x.columns=[ticker]
        final=pd.merge(final,x,left_index=True,right_index=True)
    return final
#Calculating returns of the stocks
def Returns(tickers):
    l = []
    begdate=(2014,1,1)
    enddate=(2014,6,1)
    p = PricePort(tickers)
    ret = (p.close[1:] - p.close[:-1])/p.close[1:]
    l.append(ret)
    return l
#Basically a class to see if the distance (difference in returns) between a
#pair of stocks breaches some threshold
class ThresholdClass():
    #constructor
    def __init__(self, Pairs):
        self.Pairs = Pairs
    #Calculating the distance (difference in returns) between a pair of stocks
    def Distancefunc(self, tickers):
        k = 0
        l = Returns(tickers)
        summation=[[0 for x in range (k)]for x in range (k)] #2d matrix for the squared distance
        for i in range (k):
            for j in range (i+1,k): # it will be a upper triangular matrix
                for p in range (len(self.PricePort(tickers))-1):
                    summation[i][j]= summation[i][j] + (l[i][p] - l[j][p])**2 #calculating distance
        for i in range (k): #setting the lower half of the matrix 1 (if we see 1 in the answer we will set a higher limit but typically the distance squared is less than 1)
            for j in range (i+1):
                sum[i][j]=1
        return sum
    #This function is used in determining the threshold distance
    def MeanofPairs(self, tickers):
        sum = self.Distancefunc(tickers)
        mean = np.mean(sum)
        return mean
    #This function is used in determining the threshold distance
    def StandardDeviation(self, tickers):
        sum = self.Distancefunc(tickers)
        standard_dev = np.std(sum)
        return standard_dev
    def ThresholdandnewsChecker(self, tickers):
        threshold = self.MeanofPairs(tickers) + 2*self.StandardDeviation(tickers)
        if (self.Distancefunc(tickers) > threshold):
            return 1
Threshold_Class = ThresholdClass(tickers)
Threshold_Class.ThresholdandnewsChecker(tickers,1)
Answer 0 (score: 1)
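The problem is that Dataretriever() returns a list of rows, each of which is itself a list, rather than a list of strings. When you iterate over tickers, the name ticker is therefore bound to a list, not a string. The str.replace method requires both of its arguments to be strings. The following code raises the error because the second argument is a list: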
'http://chart.yahoo.com/table.csv?s=ttt'.replace('ttt', ticker)
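The failure can be reproduced without touching the network. The sketch below is written for Python 3 (the question's code is Python 2, where the same call fails with the "expected a character buffer object" message shown in the traceback). The in-memory CSV text and the ticker symbols in it are made-up stand-ins for Pairs.csv, not data from the question.

```python
import csv
import io

# Hypothetical stand-in for Pairs.csv: two rows, two tickers per row.
sample = io.StringIO("AAPL,MSFT\nKO,PEP\n")

# csv.reader yields one list of strings per row, so this is a list of lists.
rows = list(csv.reader(sample))

url_template = 'http://chart.yahoo.com/table.csv?s=ttt'

try:
    # Same call as in PricePort: the second argument is a list, not a string.
    url_template.replace('ttt', rows[0])
    raised = False
except TypeError:
    raised = True

print(rows[0])   # a whole row, i.e. a pair of tickers
print(raised)    # whether str.replace rejected the list
```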
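The subsequent line x.columns = [ticker] would cause a similar problem. There, ticker needs to be a hashable object (such as a string or an integer), but lists are not hashable.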
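One possible way around both problems is to flatten the pairs into individual ticker strings before building URLs, so that each value passed to str.replace (and later used as a column name) is a string. This is only a sketch under assumptions: unique_tickers, build_urls and is_hashable are hypothetical helper names, the sample pairs are invented, and nothing here checks whether the Yahoo URL from the question still responds.

```python
URL_TEMPLATE = 'http://chart.yahoo.com/table.csv?s=ttt'  # URL taken from the question


def unique_tickers(pairs):
    """Flatten [stock1, stock2] rows into unique ticker strings, keeping first-seen order."""
    seen = set()
    result = []
    for pair in pairs:          # each pair is a list, e.g. ['AAPL', 'MSFT']
        for ticker in pair:     # each ticker is a string
            ticker = ticker.strip()
            if ticker and ticker not in seen:
                seen.add(ticker)
                result.append(ticker)
    return result


def build_urls(pairs):
    """Return (ticker, url) tuples; str.replace now receives a string each time."""
    return [(t, URL_TEMPLATE.replace('ttt', t)) for t in unique_tickers(pairs)]


def is_hashable(obj):
    """Report whether obj can be hashed (strings can, lists cannot)."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False


# Made-up sample data standing in for the contents of Pairs.csv.
pairs = [['AAPL', 'MSFT'], ['KO', 'PEP'], ['AAPL', 'KO']]
urls = build_urls(pairs)
```

In PricePort, the loop could then iterate over unique_tickers(tickers) instead of tickers, which keeps the original pairs available for the later distance calculation.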