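Can anyone help me with the problem below? I have tried it myself and have attached my solution. I used a 2-d list, but I would like a different solution without the 2-d list, one that is more pythonic.
Please suggest if any of you has another way of doing it.
Q) Consider share prices for N companies, given for each month since 1990, in a CSV file. The format of the file is as below, with the first line as the header.
year,month,Company A,Company B,Company C,.............Company N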
1990,Jan,10,15,20,..........,50
1990,Feb,10,15,20,..........,50
2013,Sep,50,10,15,..........,500
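The solution should be in this format. a) List, for each company, the year and month in which the share price was highest.
Here is my answer using a 2-d list.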
def generate_list(file_path):
    '''
    return list of list's containing file data.'''
    data_list=None #local variable
    try:
        file_obj = open(file_path,'r')
        try:
            gen = (line.split(',') for line in file_obj) #generator, to generate one line each time until EOF (End of File)
            for j,line in enumerate(gen):
                if not data_list:
                    #if data_list is None then create list containing n empty lists, where n will be number of columns.
                    data_list = [[] for i in range(len(line))]
                #remove the trailing newline from the last element; rstrip does nothing if the final line has none
                line[-1] = line[-1].rstrip('\r\n')
                #loop to convert numbers from string to float, and leave others as strings only
                for i,l in enumerate(line):
                    if i >=2 and j >= 1:
                        data_list[i].append(float(l))
                    else:
                        data_list[i].append(l)
        except IOError as io_except:
            print(io_except)
        finally:
            file_obj.close()
    except IOError as io_exception:
        print(io_exception)
    return data_list
def generate_result(file_path):
    '''
    return list of tuples containing (max price, year, month,
    company name).
    '''
    data_list = generate_list(file_path)
    re=[] #list to store results in tuple format as follows [(max_price, year, month, company_name), ....]
    if data_list:
        for i,d in enumerate(data_list):
            if i >= 2:
                m = max(data_list[i][1:]) #max_price for the company
                idx = data_list[i].index(m) #getting index of max_price in the list
                yr = data_list[0][idx] #getting year by using index of max_price in list
                mon = data_list[1][idx] #getting month by using index of max_price in list
                com = data_list[i][0] #getting company_name
                re.append((m,yr,mon,com))
    return re
if __name__ == '__main__':
    file_path = 'C:/Document and Settings/RajeshT/Desktop/nothing/imp/New Folder/tst.csv'
    re = generate_result(file_path)
    print('result %s' % (re,))
I have also tried to solve it with a generator, but in that case it gave a result for only one company, i.e. only one column.
p = 'filepath.csv'
f = open(p,'r')
head = f.readline()
gen = ((float(line.split(',')[n]), line.split(',',2)[0:2], head.split(',')[n]) for n in range(2,len(head.split(','))) for i,line in enumerate(f))
x = max((i for i in gen),key=lambda x:x[0])
print x
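A note on why the generator attempt above prints only one tuple: the inner loop over `f` exhausts the file while `n` still has its first value, so the later columns see no lines, and the single `max(...)` call then returns one overall maximum rather than one per company. Below is a minimal Python 3 sketch of a per-column version, assuming the rows fit in memory. The names `SAMPLE` and `best_per_company` are illustrative and not part of the original post, and the inline sample is a three-row, two-company subset of the data given further down.

```python
import csv
import io

# Small inline sample standing in for the CSV file (subset of the posted data).
SAMPLE = """year,month,company 1,company 2
1990,jan,201,245
1990,feb,228,123
1991,march,246,81
"""

def best_per_company(lines):
    """Return [(max_price, year, month, company), ...], one tuple per company."""
    rows = list(csv.reader(lines))      # read once so every column can reuse the rows
    header, body = rows[0], rows[1:]
    return [
        # max() runs once per column n, not once over all columns together
        max(((float(r[n]), r[0], r[1], header[n]) for r in body),
            key=lambda t: t[0])
        for n in range(2, len(header))
    ]

result = best_per_company(io.StringIO(SAMPLE))
print(result)
```

Reading the rows into a list first is the simplest way around the exhausted file iterator; the trade-off is that the whole file is held in memory.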
You can use the input data given below in csv format.
year,month,company 1,company 2,company 3,company 4,company 5
1990,jan,201,245,243,179,133
1990,feb,228,123,124,121,180
1990,march,63,13,158,88,79
1990,april,234,68,187,67,135
1990,may,109,128,46,185,236
1990,june,53,36,202,73,210
1990,july,194,38,48,207,72
1990,august,147,116,149,93,114
1990,september,51,215,15,38,46
1990,october,16,200,115,205,118
1990,november,241,86,58,183,100
1990,december,175,97,143,77,84
1991,jan,190,68,236,202,19
1991,feb,39,209,133,221,161
1991,march,246,81,38,100,122
1991,april,37,137,106,138,26
1991,may,147,48,182,235,47
1991,june,57,20,156,38,245
1991,july,165,153,145,70,157
1991,august,154,16,162,32,21
1991,september,64,160,55,220,138
1991,october,162,72,162,222,179
1991,november,215,207,37,176,30
1991,december,106,153,31,247,69
The expected output is as follows.
[(246.0, '1991', 'march', 'company 1'),
(245.0, '1990', 'jan', 'company 2'),
(243.0, '1990', 'jan', 'company 3'),
(247.0, '1991', 'december', 'company 4'),
(245.0, '1991', 'june', 'company 5')]
Thanks in advance...
Answer 0 (score: 3)
Using collections.OrderedDict and collections.namedtuple:
import csv
from collections import OrderedDict, namedtuple
with open('abc1') as f:
    reader = csv.reader(f)
    tup = namedtuple('tup', ['price', 'year', 'month'])
    d = OrderedDict()
    names = next(reader)[2:]
    for name in names:
        #initialize the dict
        d[name] = tup(0, 'year', 'month')
    for row in reader:
        year, month = row[:2] # Use year, month, *prices = row in py3.x
        for name, price in zip(names, map(int, row[2:])): # map(int, prices) py3.x
            if d[name].price < price:
                d[name] = tup(price, year, month)
print(d)
Output:
OrderedDict([
('company 1', tup(price=246, year='1991', month='march')),
('company 2', tup(price=245, year='1990', month='jan')),
('company 3', tup(price=243, year='1990', month='jan')),
('company 4', tup(price=247, year='1991', month='december')),
('company 5', tup(price=245, year='1991', month='june'))])
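If the list-of-tuples layout from the question is needed, the dictionary can be flattened afterwards. Here is a small sketch, assuming `d` has the shape printed above; it is rebuilt by hand for two companies so that the snippet runs on its own, and `result` is just an illustrative name.

```python
from collections import OrderedDict, namedtuple

tup = namedtuple('tup', ['price', 'year', 'month'])

# Hand-built stand-in for the dictionary `d` shown in the output (two entries only).
d = OrderedDict([
    ('company 1', tup(price=246, year='1991', month='march')),
    ('company 2', tup(price=245, year='1990', month='jan')),
])

# One (price, year, month, company) tuple per company, in column order.
result = [(float(t.price), t.year, t.month, name) for name, t in d.items()]
print(result)
```

The `float(...)` call is only there to match the expected output in the question, which shows prices as floats.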
Answer 1 (score: 1)
I am not entirely sure how you want the output, so for now I just print it to the screen.
import os
import csv
import codecs
## Import data !!!!!!!!!!!! CHANGE TO APPROPRIATE PATH !!!!!!!!!!!!!!!!!
filename= os.path.expanduser("~/Documents/PYTHON/StackTest/tailor_raj/Workbook1.csv")
## Get useable data
data = [row for row in csv.reader(codecs.open(filename, 'rb', encoding="utf_8"))]
## Find Number of rows
row_count= (sum(1 for row in data)) -1
## Find Number of columns
## Since this cannot be explicitly done, I set it to run through the columns on one row until it fails.
## Failure is caught by try/except so the program does not crash
columns_found = False
column_try =1
while columns_found == False:
    column_try +=1
    try:
        identify_column = data[0][column_try]
    except IndexError:
        columns_found=True
## Set column count to discovered column count (1 before it failed)
column_count=column_try-1
## Set which company we are checking (start with the first company listed. Since it starts at 0 the first company is at 2 not 3)
companyIndex = 2
#This will keep all the company bests as single rows of text. I was not sure how you wanted to output them.
companyBest=[]
## Set loop to go through each company
while companyIndex <= (column_count):
    ## For each new company reset the rowIndex and highestShare
    rowIndex=1
    highestShare=rowIndex
    ## Set loop to go through each row
    while rowIndex <=row_count:
        ## Test if data point is above or equal to current max
        ## Currently set to use the most recent high point
        if int(data[highestShare][companyIndex]) <= int(data[rowIndex][companyIndex]):
            highestShare=rowIndex
        ## Move on to next row
        rowIndex+=1
    ## Company best = Company Name + year + month + value
    companyBest.append(str(data[0][companyIndex])+": "+str(data[highestShare][0]) +", "+str(data[highestShare][1])+", "+str(data[highestShare][companyIndex]))
    ## Move on to next company
    companyIndex +=1
for item in companyBest:
    print(item)
Be sure to change your filename path.
The output currently displays as follows:
Company A: 1990, Nov, 1985
Company B: 1990, May, 52873
Company C: 1990, May, 3658
Company D: 1990, Nov, 156498
Company E: 1990, Jul, 987
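As a side note, the column count does not need the try/except probe: every row that `csv.reader` yields is a list, so `len(data[0])` gives it directly. Below is a sketch under the assumption that `data` is the list of rows built above. Here it is a made-up two-row sample with a deliberate tie in the second company, so that the "most recent high point" rule is visible; `company_best` and `best_row` are illustrative names.

```python
import csv
import io

# Inline stand-in for the rows read from the CSV file.
# The tie in "company 2" (245 twice) is made up for illustration.
data = list(csv.reader(io.StringIO(
    "year,month,company 1,company 2\n"
    "1990,jan,201,245\n"
    "1990,feb,228,245\n"
)))

column_count = len(data[0])          # number of columns, read directly from the header
company_best = []
for col in range(2, column_count):
    # Key (price, row index): on equal prices the later row wins,
    # matching the "<=" comparison in the loop above.
    best_row = max(range(1, len(data)), key=lambda r: (int(data[r][col]), r))
    company_best.append("%s: %s, %s, %s" % (
        data[0][col], data[best_row][0], data[best_row][1], data[best_row][col]))

for item in company_best:
    print(item)
```

Note that `column_count` here is the total number of columns, so the loop uses `range(2, column_count)` instead of the `<=` bound in the original loop.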
Answer 2 (score: 1)
Sadly no generators, but the code is small, especially in Python 3:
from operator import itemgetter
from csv import reader
with open('test.csv') as f:
    year, month, *data = zip(*reader(f))
    for pricelist in data:
        name = pricelist[0]
        prices = map(int, pricelist[1:])
        i, price = max(enumerate(prices), key=itemgetter(1))
        print(name, price, year[i+1], month[i+1])
In Python 2.X you can do the same thing, only slightly more clumsily, using the following (and a different print statement):
with open('test.csv') as f:
    columns = zip(*reader(f))
    year, month = columns[:2]
    data = columns[2:]
Okay, I came up with some gruesome generators! It also makes use of lexicographical tuple comparison and reduce to compare consecutive lines:
from functools import reduce # only in Python 3
import csv
def group(year, month, *prices):
    return ((int(p), year, month) for p in prices)
def compare(a, b):
    return map(max, zip(a, group(*b)))
def run(fname):
    with open(fname) as f:
        r = csv.reader(f)
        names = next(r)[2:]
        return zip(names, reduce(compare, r, group(*next(r))))
list(run('test.csv'))
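The trick above rests on how Python orders tuples: element by element, left to right, so putting the price first makes `max` pick the higher price. Here is a small sketch with values taken from the first company's column in the sample data; the `rows` list and the `best` name are illustrative.

```python
from functools import reduce

# Tuples compare element by element, so the price in position 0 decides first.
a = (245, '1990', 'jan')
b = (246, '1991', 'march')
assert max(a, b) == b

# The same idea folded over consecutive rows of one company with reduce.
rows = [(201, '1990', 'jan'), (228, '1990', 'feb'), (246, '1991', 'march'),
        (37, '1991', 'april')]
best = reduce(max, rows)
print(best)
```

On equal prices the comparison falls through to the year and then the month strings, so ties are broken alphabetically, not chronologically.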