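Note: I am a Python beginner, so please bear with me!
Edit: I have fixed the error, but I still need help with the question below!
My question is:
1) If I want to put the minimum value and the most common word/number into a table, how do I index into the minimum / most common word/number, extract it, and put it in the right place in my table?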
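Explanation
The code below is supposed to use the function rows2cols(A) to transpose the given nested list A, then loop over the result and, for each column, check with isnumericlist(A) whether it holds numeric values.
If a column does hold numeric values, its strings are converted to floats, and the minimum and the most common word/number are then taken from that list.
The code is as follows: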
A = [['OrderDate', 'Region', 'Rep', 'Item', 'Units', 'Unit Price'],['4-Jul-2014', 'East', 'Richard', 'Pen Set', '62', '4.99'], ['12-Jul-2014', 'East', 'Nick', 'Binder', '29', '1.99'], ['21-Jul-2014', 'Central', 'Morgan', 'Pen Set', '55', '12.49'], ['29-Jul-2014', 'East', 'Susan', 'Binder', '81', '19.99'],['7-Aug-2014', 'Central', 'Matthew', 'Pen Set', '42', '23.95'], ['15-Aug-2014', 'East', 'Richard', 'Pencil', '35', '4.99'], ['24-Aug-2014', 'West', 'James', 'Desk', '3', '275']]
minVal = []
maxVal = []
hist = []
average = []
stanDev = []
mode = []  # most common value per column; appended to in the loop below
headers = A[0]  # the first row holds the column names
rows = A[1:]  # every row after the header
def rows2cols(A):
    if len(A) == 0:
        return []  # this covers the base case of having an empty csv file
    res = [[] for x in headers]  # one empty list per column
    for line in A:
        for col in range(len(line)):
            res[col].append(line[col])
    return res
def convertstringtofloats(A):
    res = []
    for x in A:
        res.append(float(x))
    return res
def isnumericlist(A):
    for x in A:
        try:
            float(x)
        except (TypeError, ValueError):  # not parseable as a number
            return False
    return True
def getMin(A):
    res = A[0]  # start from the first value of the list passed in
    for x in A:
        if x < res:
            res = x
    return res
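def most_common(A):
    counts = {}
    for x in A:
        # count the value itself; tuple(x) would split a string into characters
        counts[x] = counts.get(x, 0) + 1
    maxCount = -1  # avoids shadowing the built-in max()
    maxKey = ""
    for key, value in counts.items():
        if maxCount < value:
            maxCount = value
            maxKey = key
    return maxKey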
def notnumeric(A):
    return "n/a"
cols = rows2cols(rows)
for col in range(len(headers)):
    if isnumericlist(cols[col]):
        B = convertstringtofloats(cols[col])
        minVal.append(getMin(B))
        # getMax, getAvg and getSD are not shown in this post
        maxVal.append(getMax(B))
        average.append(getAvg(B))
        stanDev.append(getSD(B))
    else:
        notnumeric(col)  # returns "n/a", but the result is not stored anywhere
    mode.append(most_common(cols[col]))
tablevalues = [minVal, maxVal, average, stanDev, mode]
Below is the code I use to generate the table, along with a sample table showing how I would like the result to look!
def print_table(table):
    # widest cell in each column, measured on the string form of every value
    longest_cols = [
        max(len(str(row[i])) for row in table) for i in range(len(table[0]))
    ]
    row_format = "|".join(" {:>" + str(longest_col) + "} " for longest_col in longest_cols)
    first = True
    for row in table:
        print(row_format.format(*row))
        if first:
            # separator line under the header row
            print((sum(longest_cols) + len(table[0]) * 3) * "-")
            first = False
table = [
    ["Columns:", "Min", "Max", "Avg", "Std. Dev.", "Most Common Word"],
    ["OrderDate", "n/a", "n/a", "n/a", "n/a", "John"],
    ["Region", 3.3, 6.29, 4.888, 1.333, 4.911],
    ["Rep", 1.3, 3.2, 1.888, 0.333, 1.9],
    ["Item", 1.3, 3.2, 1.888, 0.333, 1.9],
    ["Units", "n/a", "n/a", "n/a", "n/a", "John"],
    ["Unit Price", "n/a", "n/a", "n/a", "n/a", "John"]
]
print_table(table)
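One possible way to get each statistic into the right place is to build one table row per column, rather than keeping a separate list per statistic. The following is only a minimal sketch under some assumptions: the names is_numeric_column and summarize are hypothetical (they are not from the code above), only the minimum and the most common value are computed, and a small made-up sample stands in for A.

```python
from collections import Counter

def is_numeric_column(values):
    """Return True if every string in the column parses as a float."""
    try:
        for v in values:
            float(v)
    except ValueError:
        return False
    return True

def summarize(data):
    """Build one table row per column: [name, min, most common value]."""
    headers, rows = data[0], data[1:]
    columns = list(zip(*rows))  # transpose: rows become columns
    table = [["Column", "Min", "Most Common"]]
    for name, column in zip(headers, columns):
        # most_common(1) returns a list with one (value, count) pair
        most_common = Counter(column).most_common(1)[0][0]
        if is_numeric_column(column):
            smallest = min(float(v) for v in column)
        else:
            smallest = "n/a"  # placeholder for non-numeric columns
        table.append([name, smallest, most_common])
    return table

# Made-up sample data, shaped like A (header row first)
sample = [
    ["Region", "Units"],
    ["East", "62"],
    ["East", "29"],
    ["West", "3"],
    ["East", "29"],
]
summary = summarize(sample)
for row in summary:
    print(row)
```

Because every row of the result starts with the column name and has the same length as the header row, a list built this way should be usable as the argument to a function like print_table.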
Answer 0 (score: 0)
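pandas may help you here. df.describe(include='all') will give you the table you are after. You just need to load the data A into pandas and change the data type of each column as needed.
top is the most common value in the corresponding column, and freq is the number of times that value occurs.
You can even save this table, e.g. df2 = df.describe(include='all').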
import pandas as pd
A = [['OrderDate', 'Region', 'Rep', 'Item', 'Units', 'Unit Price'],
['4-Jul-2014', 'East', 'Richard', 'Pen Set', '62', '4.99'],
['12-Jul-2014', 'East', 'Nick', 'Binder', '29', '1.99'],
['21-Jul-2014', 'Central', 'Morgan', 'Pen Set', '55', '12.49'],
['29-Jul-2014', 'East', 'Susan', 'Binder', '81', '19.99'],
['7-Aug-2014', 'Central', 'Matthew', 'Pen Set', '42', '23.95'],
['15-Aug-2014', 'East', 'Richard', 'Pencil', '35', '4.99'],
['24-Aug-2014', 'West', 'James', 'Desk', '3', '275']]
df = pd.DataFrame(A[1:],columns=A[0])
print(df)
     OrderDate   Region      Rep     Item  Units  Unit Price
0  04-Jul-2014     East  Richard  Pen Set     62        4.99
1  12-Jul-2014     East     Nick   Binder     29        1.99
2  21-Jul-2014  Central   Morgan  Pen Set     55       12.49
3  29-Jul-2014     East    Susan   Binder     81       19.99
4  07-Aug-2014  Central  Matthew  Pen Set     42       23.95
5  15-Aug-2014     East  Richard   Pencil     35        4.99
6  24-Aug-2014     West    James     Desk      3      275.00
df = df.astype(dtype={'OrderDate': 'str', 'Region': 'str',
                      'Rep': 'str', 'Item': 'str', 'Units': 'int', 'Unit Price': 'float'})
df['OrderDate'] = df.OrderDate.apply(
    lambda x: pd.to_datetime(x).strftime('%d-%b-%Y'))
print(df.dtypes)
OrderDate      object
Region         object
Rep            object
Item           object
Units           int32
Unit Price    float64
dtype: object
print(df.describe(include='all'))
          OrderDate  Region      Rep     Item      Units  Unit Price
count             7       7        7        7   7.000000    7.000000
unique            7       3        6        4        NaN         NaN
top     24-Aug-2014    East  Richard  Pen Set        NaN         NaN
freq              1       4        2        3        NaN         NaN
mean            NaN     NaN      NaN      NaN  43.857143   49.057143
std             NaN     NaN      NaN      NaN  25.182193   99.968112
min             NaN     NaN      NaN      NaN   3.000000    1.990000
25%             NaN     NaN      NaN      NaN  32.000000    4.990000
50%             NaN     NaN      NaN      NaN  42.000000   12.490000
75%             NaN     NaN      NaN      NaN  58.500000   21.970000
max             NaN     NaN      NaN      NaN  81.000000  275.000000
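To make the rows of that summary more concrete, here is a small standard-library sketch that recomputes a few of them for the Region and Units columns. It is not how pandas computes them internally; it only illustrates what top, freq, mean, std, min and max stand for. The variable names are chosen for illustration, and the values are copied from A.

```python
from collections import Counter
import statistics

# Region and Units columns copied from A (Units already converted to int)
regions = ["East", "East", "Central", "East", "Central", "East", "West"]
units = [62, 29, 55, 81, 42, 35, 3]

# top / freq: the most frequent value and how many times it occurs
top, freq = Counter(regions).most_common(1)[0]

# statistics.stdev is the sample standard deviation (n - 1 in the
# denominator), which matches the std row in the summary above
unit_stats = {
    "mean": statistics.mean(units),
    "std": statistics.stdev(units),
    "min": min(units),
    "max": max(units),
}

print(top, freq)
print(unit_stats)
```

For Region this gives East with a count of 4, in line with the top and freq rows above.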