Question

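Note: I'm a beginner in Python, so please bear with me!

Edit: I've fixed the error, but I still need help with the question below!

My question is:

1) If I want to put the minimum value and the most common word/number into a table, how do I index into the minimum / most common word/number, extract it, and put it in the right place in my table?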

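Explanation

The code below is supposed to use the function def rows2cols(A): to transpose the given nested list A. It then iterates over the transposed list and, for each column, checks whether the column holds numeric values by calling def isnumericlist(A):.

If the column does contain numeric values, the strings are converted to floats, and then the minimum and the most common word/number are found from that list.

The code is below: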
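As a compact illustration of the two steps just described (transpose, then test each column for numeric content), something like the following could work. It is only a sketch: the two sample rows are a shortened, made-up subset of the data, and the helper names transpose and is_numeric_column are hypothetical, not part of the original code.

```python
# Hypothetical sample: two shortened data rows (no header row).
sample_rows = [
    ['4-Jul-2014', 'East', '62', '4.99'],
    ['12-Jul-2014', 'East', '29', '1.99'],
]

def transpose(rows):
    """Turn a list of rows into a list of columns."""
    return [list(col) for col in zip(*rows)]

def is_numeric_column(col):
    """Return True if every string in col can be parsed by float()."""
    try:
        for x in col:
            float(x)
    except ValueError:
        return False
    return True

columns = transpose(sample_rows)
numeric_flags = [is_numeric_column(c) for c in columns]

# Convert only the columns that passed the numeric check.
floats = [[float(x) for x in c] for c, ok in zip(columns, numeric_flags) if ok]

print(numeric_flags)
print(floats)
```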

A = [['OrderDate', 'Region', 'Rep', 'Item', 'Units', 'Unit Price'],['4-Jul-2014', 'East', 'Richard', 'Pen Set', '62', '4.99'], ['12-Jul-2014', 'East', 'Nick', 'Binder', '29', '1.99'], ['21-Jul-2014', 'Central', 'Morgan', 'Pen Set', '55', '12.49'], ['29-Jul-2014', 'East', 'Susan', 'Binder', '81', '19.99'],['7-Aug-2014', 'Central', 'Matthew', 'Pen Set', '42', '23.95'], ['15-Aug-2014', 'East', 'Richard', 'Pencil', '35', '4.99'], ['24-Aug-2014', 'West', 'James', 'Desk', '3', '275']]

minVal = []
maxVal = []
hist = []
average = []
stanDev = []
mode = [] #most common value of each column; the loop below appends to it
headers = A[0] #this sets the variable "headers" as the first row 
rows = A[1:] #skips the first row

def rows2cols(A):
    if len(A) == 0: 
        return []                      #this covers the base case of having an empty csv file
    res  = [[] for x in headers]       # would create a list of empty lists
    for line in A: 
        for col in range(len(line)): 
            res[col].append(line[col]) 
    return res

def convertstringtofloats(A):
    res = []
    for x in A:
        res.append(float(x))
    return res

def isnumericlist(A):
    for x in A:
        try:
            float(x)                   #only checking that the conversion succeeds
        except (TypeError, ValueError):
            return False
    return True


def getMin(A):
    res = A[0] #start from the first value of the list passed in, not the global B
    for x in A:
        if x < res:
            res = x
    return res

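def most_common(A):
    counts = {}
    for x in A:
        counts[x] = counts.get(x, 0) + 1 #count each value whole; tuple(x) would split a string into characters
    maxCount = -1                        #renamed so the built-in max is not shadowed
    maxKey = ""
    for key,value in counts.items():
        if maxCount < value:
            maxCount = value
            maxKey = key
    return maxKey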

def notnumeric(A):
    return "n/a"

cols = rows2cols(rows)

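for col in range(len(headers)):
    if isnumericlist(cols[col]):
        B = convertstringtofloats(cols[col])
        minVal.append(getMin(B))
        maxVal.append(getMax(B))      #getMax, getAvg and getSD are not shown in this question
        average.append(getAvg(B))
        stanDev.append(getSD(B))
    else:
        #store the placeholder so every stats list keeps one entry per header
        placeholder = notnumeric(cols[col])
        minVal.append(placeholder)
        maxVal.append(placeholder)
        average.append(placeholder)
        stanDev.append(placeholder)

    mode.append(most_common(cols[col]))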

tablevalues = [minVal, maxVal, average, stanDev, mode]

Below is the code I use to generate the table, along with a sample table showing how I would like the result to look!

def print_table(table):
    longest_cols = [
        (max([len(str(row[i])) for row in table]) + 0) for i in range(len(table[0]))
    ]
    row_format = "|".join([" {:>" + str(longest_col) + "} " for longest_col in longest_cols])
    first = True
    for row in table:
        print(row_format.format(*row))
        if first:
            print((sum(longest_cols) + (len(table[0]) - 0) * 3) * "-")
            first = False

table = [
    ["Columns:", "Min", "Max", "Avg", "Std. Dev.", "Most Common Word"],
    ["OrderDate", "n/a", "n/a", "n/a", "n/a", "John"],
    ["Region", 3.3, 6.29, 4.888, 1.333, 4.911],
    ["Rep", 1.3, 3.2, 1.888, 0.333, 1.9],
    ["Item", 1.3, 3.2, 1.888, 0.333, 1.9],
    ["Units","n/a", "n/a", "n/a", "n/a", "John"],
    ["Unit Price","n/a", "n/a", "n/a", "n/a", "John"]
]
print_table(table)
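One possible way to line the computed values up with the table rows is to build one row per column, so that each statistic lands in a fixed position. The sketch below assumes at least two data rows and uses the standard library's statistics module and collections.Counter in place of the hand-written helpers. The shortened sample data and the names summarize_column and build_table are hypothetical.

```python
from collections import Counter
import statistics

# Shortened, made-up sample in the same shape as A (header row first).
data = [
    ['Region', 'Item', 'Units'],
    ['East', 'Pen Set', '62'],
    ['East', 'Binder', '29'],
    ['Central', 'Pen Set', '55'],
]

def summarize_column(name, values):
    """Return one table row: [name, min, max, avg, std dev, most common]."""
    most_common = Counter(values).most_common(1)[0][0]
    try:
        nums = [float(v) for v in values]
    except ValueError:
        # Non-numeric column: the numeric statistics do not apply.
        return [name, "n/a", "n/a", "n/a", "n/a", most_common]
    return [
        name,
        min(nums),
        max(nums),
        round(statistics.mean(nums), 3),
        round(statistics.stdev(nums), 3),  # sample standard deviation
        most_common,
    ]

def build_table(data):
    """Build the header row plus one summary row per column of data."""
    headers, rows = data[0], data[1:]
    columns = zip(*rows)  # transpose rows into columns
    result = [["Columns:", "Min", "Max", "Avg", "Std. Dev.", "Most Common Word"]]
    for name, col in zip(headers, columns):
        result.append(summarize_column(name, list(col)))
    return result

summary_table = build_table(data)
for row in summary_table:
    print(row)
```

Each row of summary_table has the same layout as the sample table above, so it could be passed to print_table as it is.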

Answer 1

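pandas may help you here. df.describe(include='all') will give you the table you need. You just need to read the data A with pandas and change the data type of each column as needed. top is the most common word in the corresponding column, and freq is the number of times that word occurs. You can even save this table as a new DataFrame with df2 = df.describe(include='all').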

import pandas as pd
A = [['OrderDate', 'Region', 'Rep', 'Item', 'Units', 'Unit Price'],
     ['4-Jul-2014', 'East', 'Richard', 'Pen Set', '62', '4.99'], 
     ['12-Jul-2014', 'East', 'Nick', 'Binder', '29', '1.99'], 
     ['21-Jul-2014', 'Central', 'Morgan', 'Pen Set', '55', '12.49'], 
     ['29-Jul-2014', 'East', 'Susan', 'Binder', '81', '19.99'],
     ['7-Aug-2014', 'Central', 'Matthew', 'Pen Set', '42', '23.95'], 
     ['15-Aug-2014', 'East', 'Richard', 'Pencil', '35', '4.99'], 
     ['24-Aug-2014', 'West', 'James', 'Desk', '3', '275']]

df = pd.DataFrame(A[1:],columns=A[0])

print(df)

     OrderDate   Region      Rep     Item  Units  Unit Price
0  04-Jul-2014     East  Richard  Pen Set     62        4.99
1  12-Jul-2014     East     Nick   Binder     29        1.99
2  21-Jul-2014  Central   Morgan  Pen Set     55       12.49
3  29-Jul-2014     East    Susan   Binder     81       19.99
4  07-Aug-2014  Central  Matthew  Pen Set     42       23.95
5  15-Aug-2014     East  Richard   Pencil     35        4.99
6  24-Aug-2014     West    James     Desk      3      275.00

df = df.astype(dtype={'OrderDate':'str', 'Region':'str',
     'Rep':'str', 'Item':'str', 'Units':'int', 'Unit Price':'float'})

df['OrderDate'] = df.OrderDate.apply(
    lambda x: pd.to_datetime(x).strftime('%d-%b-%Y'))

print(df.dtypes)
OrderDate      object
Region         object
Rep            object
Item           object
Units           int32
Unit Price    float64
dtype: object

print(df.describe(include='all'))

          OrderDate  Region      Rep     Item      Units  Unit Price
count             7       7        7        7   7.000000    7.000000
unique            7       3        6        4        NaN         NaN
top     24-Aug-2014    East  Richard  Pen Set        NaN         NaN
freq              1       4        2        3        NaN         NaN
mean            NaN     NaN      NaN      NaN  43.857143   49.057143
std             NaN     NaN      NaN      NaN  25.182193   99.968112
min             NaN     NaN      NaN      NaN   3.000000    1.990000
25%             NaN     NaN      NaN      NaN  32.000000    4.990000
50%             NaN     NaN      NaN      NaN  42.000000   12.490000
75%             NaN     NaN      NaN      NaN  58.500000   21.970000
max             NaN     NaN      NaN      NaN  81.000000  275.000000
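To pull individual values out of that summary and into your own table, you can index it by row label with .loc. The snippet below is a small sketch on a shortened, made-up sample. pd.to_numeric is used here in place of astype purely for brevity, and the variable names are only illustrative.

```python
import pandas as pd

# Shortened, made-up sample in the same shape as A (header row first).
A = [['Region', 'Item', 'Units', 'Unit Price'],
     ['East', 'Pen Set', '62', '4.99'],
     ['East', 'Binder', '29', '1.99'],
     ['Central', 'Pen Set', '55', '12.49']]

df = pd.DataFrame(A[1:], columns=A[0])
for col in ['Units', 'Unit Price']:
    df[col] = pd.to_numeric(df[col])  # strings -> numbers

summary = df.describe(include='all')

# .loc[row_label, column_label] returns a single cell of the summary.
units_min = summary.loc['min', 'Units']
region_top = summary.loc['top', 'Region']

# Selecting a few row labels and transposing gives one row per original column.
per_column = summary.loc[['min', 'max', 'mean', 'std', 'top']].T
print(per_column)
```

Cells that do not apply to a column (for example min for Region) come back as NaN, which you could replace with "n/a" before printing.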

Python: How do I access return values to put them into a table?
