I have a tab-delimited text file with two columns, Bill to Name and Date; the dates are stored as Excel serial numbers. Code:
import csv
from collections import defaultdict
d = defaultdict( list )
input_file = "C:\\Users\\Intern\\Documents\\Python.txt"
output_file = "C:\\Users\\Intern\\Documents\\b.csv"
with open( input_file, 'r') as infile:
    reader = csv.reader(infile, delimiter='\t')
    next(reader, None) # skip the header
    for row in reader:
        d[ row[0] ].append( int(row[1]) )
with open( output_file, 'w' ) as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for key, value in d.items():
        if len(value) == 1:
            avg_diff = None # or 0 -- this indicates there was only 1 purchase
        else:
            # This requires your dates to be sorted, ascending, but that just takes
            # wrapping 'value' in 'sorted' if it isn't sorted yet
            avg_diff = mean([v[i] - v[i-1] for i, v in enumerate(value) if i])
        writer.writerow( [key, avg_diff] )
Current error:
TypeError Traceback (most recent call last)
<ipython-input-2-1e819db94549> in <module>()
22 # This requires your dates to be sorted, ascending, but that just takes
23 # wrapping 'value' in 'sorted' if it isn't sorted yet
---> 24 avg_diff = mean([v[i] - v[i-1] for i, v in enumerate(value) if i])
25 writer.writerow( [key, avg_diff] )
<ipython-input-2-1e819db94549> in <listcomp>(.0)
22 # This requires your dates to be sorted, ascending, but that just takes
23 # wrapping 'value' in 'sorted' if it isn't sorted yet
---> 24 avg_diff = mean([v[i] - v[i-1] for i, v in enumerate(value) if i])
25 writer.writerow( [key, avg_diff] )
TypeError: 'float' object is not subscriptable
This is the updated code I am currently stuck on.
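The TypeError above can be reproduced in isolation. A minimal sketch, assuming a small hypothetical list of Excel serial dates: `enumerate(value)` yields `(index, element)` pairs, so `v` is a single number rather than the list, and `v[i]` tries to subscript that number. Indexing the list itself gives the intended differences.

```python
# Hypothetical sample: Excel serial dates for one customer, already sorted.
value = [41529, 41852, 42244]

# In the failing expression, v is one element of the list (a number), not the list.
error_message = None
try:
    broken = [v[i] - v[i - 1] for i, v in enumerate(value) if i]
except TypeError as exc:
    error_message = str(exc)  # numbers cannot be subscripted

# Indexing the list itself yields the gap between each date and the previous one.
diffs = [value[i] - value[i - 1] for i in range(1, len(value))]
print(diffs)
```

The fix is only a change of which name gets indexed; the list still has to be sorted ascending for the gaps to be meaningful.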
Answer 0 (score: 1)
It looks like you just need a simple function to compute an average.
def avg(iterable):
    count = 0
    running_sum = 0
    for item in iterable:
        running_sum += item
        count += 1
    return running_sum / float(count)
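Now you just need the values. If I understand your intent, you want the value at index i minus the value at index i - 1. itertools has a recipe that does almost exactly this, though it would not be hard to write yourself without itertools: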
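from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)  # Python 3: built-in zip is lazy; on Python 2 use itertools.izip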
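This does not give us the differences yet, but a generator expression computes them and passes them straight to our avg function (we were careful to make avg work with any iterable, not just sequences):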
average = avg(n - p for p, n in pairwise(values))
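Putting the two pieces together, here is a minimal Python 3 sketch in which the built-in `zip` stands in for `izip`. The sample serial dates are hypothetical and only there for illustration.

```python
from itertools import tee

def avg(iterable):
    """Average of any iterable of numbers, consumed in a single pass."""
    count = 0
    running_sum = 0
    for item in iterable:
        running_sum += item
        count += 1
    return running_sum / count  # raises ZeroDivisionError on an empty iterable

def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)  # shift the second iterator forward by one item
    return zip(a, b)

# Hypothetical Excel serial dates for one customer; sorted so gaps are non-negative.
values = sorted([42244, 41529, 41852])
average = avg(n - p for p, n in pairwise(values))
print(average)
```

Because `avg` never calls `len`, the differences are never materialised as a list; a customer with a single date produces an empty generator, so that case still needs to be handled before calling `avg`.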
Answer 1 (score: 1)
Instead of max(value) - min(value), it seems (if I understand correctly) you could write:
def mean(x):
    return float(sum(x))/len(x)
...
for key, value in d.items():
    if len(value) == 1:
        avg_diff = None # or 0 -- this indicates there was only 1 purchase
    else:
        # This requires your dates to be sorted, ascending
        sv = sorted(value)
        avg_diff = mean([sv[i] - sv[i-1] for i in range(len(sv)) if i])
    writer.writerow( [key, avg_diff] )
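This gives you the average number of days between consecutive dates for each person.
I think None is the better choice for a single-purchase customer, because 0 is a valid value when two purchases fall on the same day.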
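The distinction between None and 0 can be made concrete. A minimal sketch, assuming a hypothetical helper named `average_gap` (not part of the code above) and using `statistics.mean` from the standard library:

```python
from statistics import mean

def average_gap(dates):
    """Mean gap in days between consecutive dates, or None for fewer than two dates.

    `average_gap` is a hypothetical helper name used only for this illustration.
    """
    ordered = sorted(dates)
    if len(ordered) < 2:
        return None  # a single purchase has no gap to measure
    # Pair each date with the one that follows it and average the differences.
    return mean(later - earlier for earlier, later in zip(ordered, ordered[1:]))

single = average_gap([41929])           # one purchase
same_day = average_gap([41852, 41852])  # two purchases on the same day
several = average_gap([42244, 41529, 41852])
print(single, same_day, several)
```

With this convention a downstream consumer can tell "no gap exists" apart from "the gap is zero days", which a plain 0 would conflate.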
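Answer 2 (score: 0)
As you mentioned in your other post, this code should fix it. It collects all the dates for each name and stores them with that name as a sublist. It then sorts each sublist so the dates are in order, and finally writes the AVERAGE of the differences between the earliest and latest dates. The averaging would be better done in its own function, but I kept things simple(r).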
import csv

index = []
input_file = 'input.csv'
output_file = 'output.csv'

def find_name(index, name):
    """ Binary search to see if the name exists in the index yet. """
    if len(index) == 0:
        return -1
    start = 0
    limit = len(index) - 1
    while start <= limit:
        guess = (start + limit) // 2  # integer division so guess is a valid list index
        if index[guess][0] == name:
            return guess
        elif index[guess][0] < name:
            start = guess + 1
        else:
            limit = guess - 1
    return -1

def add_to_index(index, name, date):
    """ Sorts the existing index, then looks the name up with "find_name".
    If the name is found, the date is appended to that name's sublist.
    If it's not found (find_name returns -1), a new entry is added. """
    index.sort()
    name_index = find_name(index, name)
    if name_index == -1:
        index.append([name, [date]])
    else:
        index[name_index][1].append(date)

# Read through each row of the input file, skipping the header.
# Send each row to the "add_to_index" function.
with open( input_file, 'r', newline='' ) as infile:
    reader = csv.reader(infile, delimiter='\t')
    next(reader, None) # skip the header
    for row in reader:
        add_to_index(index, row[0], row[1])

# Write the output from the index back to the output file,
# one row per name with its average number of days.
with open( output_file, 'w', newline='' ) as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for e in index:
        print(e)
        name = e[0]
        if len(e[1]) == 1: # if only one date, answer is 0
            average_days = 0
        elif len(e[1]) == 2: # if only two dates, answer is the diff
            e[1].sort()
            average_days = int(e[1][-1]) - int(e[1][0])
        else: # if more than two dates, average.
            e[1].sort()
            total = 0
            total_dates = len(e[1])
            print(total_dates)
            count = len(e[1]) - 1
            while count > 0:
                total += int(e[1][count]) - int(e[1][count - 1])
                print(total)
                count -= 1
            # Note: this divides by the number of dates, not the number of gaps (total_dates - 1)
            average_days = total // total_dates
        writer.writerow([name, average_days])
I created a new input file so that one name has more than two dates. It looks like this:
Bill to Name    Date
James Doe       41929
Jane Doe        41852
Adam Adamson    42244
Adam Adamson    41529
Adam Adamson    41852
The output looks like this:
Adam Adamson    238
James Doe       0
Jane Doe        0
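For comparison, here is a minimal sketch of the same grouping done with a dictionary instead of a sorted index and binary search. This swaps in a different technique from the code above, and it divides by the number of gaps rather than the number of dates, so it reports 357.5 for Adam Adamson instead of 238. The in-memory sample text and the helper name `average_gaps` are assumptions for illustration; single-date names get None rather than 0.

```python
import csv
import io
from collections import defaultdict

# The sample input from above, held in memory so the sketch needs no file.
SAMPLE = (
    "Bill to Name\tDate\n"
    "James Doe\t41929\n"
    "Jane Doe\t41852\n"
    "Adam Adamson\t42244\n"
    "Adam Adamson\t41529\n"
    "Adam Adamson\t41852\n"
)

def average_gaps(lines):
    """Map each name to the mean gap between its sorted dates (None if only one date)."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    dates_by_name = defaultdict(list)
    for name, date in reader:
        dates_by_name[name].append(int(date))  # convert so sorting is numeric

    result = {}
    for name, dates in dates_by_name.items():
        dates.sort()
        gaps = [later - earlier for earlier, later in zip(dates, dates[1:])]
        result[name] = sum(gaps) / len(gaps) if gaps else None
    return result

result = average_gaps(io.StringIO(SAMPLE))
for name in sorted(result):
    print(name, result[name])
```

A dictionary lookup avoids re-sorting the index on every row, and converting the dates to int before sorting keeps the order correct even if the serial numbers differ in digit count.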