How do I calculate the standard deviation from a text (.txt) file?

Time: 2019-01-07 12:46:35

Tags: python

Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes

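The actual file contains no blank lines or stray whitespace; otherwise it would cause an error. I want to calculate the standard deviation for each category.

I tried using statistics.stdev(), but it didn't work. Can anyone help me? When you have an answer, could you also explain it, so that I can learn from it?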

from csv import DictReader

from collections import defaultdict
from statistics import median

from locale import setlocale
from locale import LC_ALL
from locale import atof

setlocale(LC_ALL, 'Dutch_Netherlands.1252')

median_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
print ("Mediaan : ")
data = defaultdict(list)
with open('bijlage.txt') as f:
    csvreader = DictReader(f, delimiter=';')
    for dic in csvreader:
        for header, value in dic.items():
            data[header].append(value)

for median_name in median_names:
    med = median(map(atof, data[median_name]))
    print('{:<13} {:>10}'.format(median_name, med))

from collections import defaultdict
import csv
import locale
import statistics
from pprint import pprint, pformat

locale.setlocale(locale.LC_ALL, 'Dutch_Netherlands.1252')

avg_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
averages = {avg_name: 0 for avg_name in avg_names}

seller_ratings = defaultdict(list)

num_values = 0
with open('bijlage.txt', newline='') as bestand:
    csvreader = csv.DictReader(bestand, delimiter=';')
    for row in csvreader:
        num_values += 1
        for avg_name in avg_names:
            averages[avg_name] += locale.atof(row[avg_name])

        # Collect the seller rating of every row, grouped by category.
        seller_ratings[row['Category']].append(locale.atof(row['sellerRating']))

for avg_name, total in averages.items():
    averages[avg_name] = total / num_values

print()
print('Averages:')
for avg_name in avg_names:
    rounded = locale.format_string('%.2f', round(averages[avg_name], 2),
                               grouping=True)
    print('  {:<13} {:>10}'.format(avg_name, rounded))

modes = {}
for category, values in seller_ratings.items():
    try:
        modes[category] = statistics.mode(values)
    except statistics.StatisticsError:
        modes[category] = None  # No unique mode.

print()
print('Modes:')
for category, mode in modes.items():
    if mode is None:
        print('  {:<20} {:>10}'.format(category, '-'))
    else:
        rounded = locale.format_string('%.2f', round(mode, 2), grouping=True)
        print('  {:<20} {:>10}'.format(category, rounded))

3 answers:

Answer 0 (score: 2)

An answer to one of your previous questions already described how to get the average, the median, and similar statistics: https://stackoverflow.com/a/54021108/8181134
Using the same approach, but with the .std() function, you can get the standard deviation:

import pandas as pd
df = pd.read_csv('bijlage.csv', delimiter=';', decimal=',')  # 'bijlage.txt' in your case
sellerRating_std = df['sellerRating'].std()
print('Seller rating standard deviation: {}'.format(sellerRating_std))
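
Since the question asks for the standard deviation per category, the same idea can be extended with groupby. The following is a minimal sketch and not part of the original answer: it assumes pandas is installed, and it reads a few rows copied from the question out of an in-memory string (the name SAMPLE is purely illustrative) instead of 'bijlage.txt'.

```python
import io

import pandas as pd

# A few rows copied from the question; in practice, pass the path
# 'bijlage.txt' to read_csv instead of this in-memory sample.
SAMPLE = """Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
"""

# decimal=',' makes pandas parse '0,01' as the float 0.01.
df = pd.read_csv(io.StringIO(SAMPLE), sep=';', decimal=',')

numeric_columns = ['sellerRating', 'Duration', 'ClosePrice', 'OpenPrice']

# Sample standard deviation (ddof=1) of each numeric column, per category.
std_per_category = df.groupby('Category')[numeric_columns].std()
print(std_per_category)
```

A category that has only one row yields NaN here, because the sample standard deviation needs at least two values.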

Answer 1 (score: 0)

First of all, note that median_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice' already creates a tuple: in Python the commas make the tuple, and the parentheses are optional.

Writing it with parentheses only makes the intent more explicit: median_names = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice')

With that in place, you can compute the standard deviation just like you computed the median:

from csv import DictReader

from collections import defaultdict
from statistics import stdev

from locale import setlocale
from locale import LC_ALL
from locale import atof

setlocale(LC_ALL, 'Dutch_Netherlands.1252')

stddev_names = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice')
print ("std dev : ")
data = defaultdict(list)
with open('bijlage.txt') as f:
    csvreader = DictReader(f, delimiter=';')
    for dic in csvreader:
        for header, value in dic.items():
            data[header].append(value)

for name in stddev_names:
    stddev_val = stdev(map(atof, data[name]))
    print('{:<13} {:>10}'.format(name, stddev_val))
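
The loop above gives one standard deviation per column over the whole file. To get it per category, as the question asks, the values can be grouped by the Category column first. This is a minimal sketch under a few assumptions that are not in the original answer: the helper names parse_decimal and stdev_per_category are hypothetical, comma decimals are converted with str.replace rather than locale.atof so the code does not depend on a platform-specific locale name, and a few rows from the question are read from a string instead of 'bijlage.txt'.

```python
import csv
import io
from collections import defaultdict
from statistics import stdev

# A few rows copied from the question; Automotive has a single row on purpose.
SAMPLE = """Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
"""


def parse_decimal(text):
    """Convert a comma-decimal string such as '0,01' to a float."""
    return float(text.replace(',', '.'))


def stdev_per_category(lines, column):
    """Return {category: sample standard deviation of `column`}."""
    grouped = defaultdict(list)
    for row in csv.DictReader(lines, delimiter=';'):
        grouped[row['Category']].append(parse_decimal(row[column]))
    # stdev() raises StatisticsError for fewer than two values,
    # so categories with a single row are reported as None.
    return {
        category: stdev(values) if len(values) > 1 else None
        for category, values in grouped.items()
    }


result = stdev_per_category(io.StringIO(SAMPLE), 'Duration')
for category, value in result.items():
    print('{:<20} {}'.format(category, value))
```

With the real file, pass the open file object instead of io.StringIO(SAMPLE). Note that parse_decimal assumes the numbers contain no thousands separators.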

Answer 2 (score: 0)

To use the statistics module, your first approach (the one used for the median) is the way to go:

setlocale(LC_ALL, 'Dutch_Netherlands.1252')

median_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
print ("Mediaan : ")
data = defaultdict(list)
with open('bijlage.txt') as f:
    csvreader = DictReader(f, delimiter=';')
    for dic in csvreader:
        for header, value in dic.items():
            data[header].append(value)

for median_name in median_names:
    med = median(map(atof, data[median_name]))
    print('{:<13} {:>10}'.format(median_name, med))

This part is unchanged. You only need to compute stdev right after it, because you can reuse the same data dictionary of lists:

from statistics import stdev
print("\nStd Dev (sample)")
for median_name in median_names:
    std = stdev(map(atof, data[median_name]))
    print('{:<13} {:>10}'.format(median_name, std))
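
The heading says "sample" because statistics.stdev divides by n - 1. If the file is treated as the entire population rather than a sample, statistics.pstdev (which divides by n) is the matching function. A small sketch of the difference, using the ten sellerRating values from the rows shown in the question; the variable names are illustrative only:

```python
from statistics import pstdev, stdev

# sellerRating values of the ten rows shown in the question.
seller_ratings = [3249.0] * 7 + [3115.0] * 3

sample_sd = stdev(seller_ratings)       # divides by n - 1
population_sd = pstdev(seller_ratings)  # divides by n

print('sample     : {:.2f}'.format(sample_sd))
print('population : {:.2f}'.format(population_sd))
```

The sample value is always the larger of the two, and the gap shrinks as the number of rows grows.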