Python - Parse human-readable file sizes into bytes

Asked: 2017-03-17 19:31:50

Tags: python

example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

I want to convert all of these strings to bytes. So far I have come up with this:

def parseSize(size):
    if size.endswith(" B"):
        size = int(size.rstrip(" B"))
    elif size.endswith(" KB"):
        size = float(size.rstrip(" KB")) * 1000
    elif size.endswith(" MB"):
        size = float(size.rstrip(" MB")) * 1000000
    elif size.endswith(" GB"):
        size = float(size.rstrip(" GB")) * 10000000000
    elif size.endswith(" TB"):
        size = float(size.rstrip(" TB")) * 10000000000000
    return int(size)

But I don't like it, and I don't think it works. Is there a Python module that could help me? I can only find modules that do the opposite.
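
A quick way to see where the function above goes wrong is to compare its multipliers with the decimal (SI) powers of 1000. The sketch below is only an illustration; the dictionary names are made up for this check. The `KB` and `MB` factors match, but the `GB` and `TB` factors each carry one zero too many, so those results come out ten times too large. (The `rstrip` calls happen to work only because `rstrip` removes a set of characters and digits are not in that set.)

```python
# Multipliers copied from the question's parseSize, keyed by unit.
question_factors = {
    "KB": 1000,
    "MB": 1000000,
    "GB": 10000000000,
    "TB": 10000000000000,
}

# Decimal (SI) factors: successive powers of 1000.
expected_factors = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Collect the units whose factor disagrees with the SI value.
wrong_units = [
    unit for unit in question_factors
    if question_factors[unit] != expected_factors[unit]
]
print(wrong_units)  # → ['GB', 'TB']
```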

5 answers:

Answer 0: (score: 7)

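Here is a slightly prettier version. There is probably no module for this, so just define the function inline. It is very small and readable.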

units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def parseSize(size):
    number, unit = [string.strip() for string in size.split()]
    return int(float(number)*units[unit])


example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

for example_string in example_strings:
    print(parseSize(example_string))

10430
11000000000
343100000
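
One design note on the version above: it goes through `float` and then truncates with `int`, so in principle a rounding error in the float product could cost a byte. A minimal sketch of a variant that avoids this, assuming the same space-separated input format, uses `decimal.Decimal`. The name `parse_size_exact` is hypothetical and not part of the answer.

```python
from decimal import Decimal

# Same decimal (SI) unit table as in the answer above.
units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def parse_size_exact(size):
    # Hypothetical variant: Decimal stores "10.43" exactly rather than
    # as the nearest binary float, so the product is exact as well.
    number, unit = size.split()
    return int(Decimal(number) * units[unit])

print(parse_size_exact("10.43 KB"))  # → 10430
print(parse_size_exact("0.29 KB"))   # → 290
```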

Answer 1: (score: 4)

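I like Denziloe's answer better than everything else that turned up on Google, but it

  • requires a space between the number and the unit
  • doesn't handle lowercase
  • assumes a kb is 1000 rather than 1024, and so on. (Thanks to mlissner for trying to point this out years ago. Maybe our assumptions are too old-school, but I don't see most software catching up to the new assumptions either.)

So I tweaked it into: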

import re

# based on https://stackoverflow.com/a/42865957/2002471
units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

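def parse_size(size):
    size = size.upper()
    # If there is no space between number and unit, insert one before the unit.
    # re.search is needed here: re.match would only look at the start of the string.
    if not re.search(r' ', size):
        size = re.sub(r'([KMGT]?B)', r' \1', size)
    number, unit = [string.strip() for string in size.split()]
    return int(float(number)*units[unit])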

example_strings = ["1024b", "10.43 KB", "11 GB", "343.1 MB", "10.43KB", "11GB", "343.1MB", "10.43 kb", "11 gb", "343.1 mb", "10.43kb", "11gb", "343.1mb"]

for example_string in example_strings:
    print(example_string, parse_size(example_string))

We can verify this by checking the output:

$ python humansize.py 
('1024b', 1024)
('10.43 KB', 10680)
('11 GB', 11811160064)
('343.1 MB', 359766425)
('10.43KB', 10680)
('11GB', 11811160064)
('343.1MB', 359766425)
('10.43 kb', 10680)
('11 gb', 11811160064)
('343.1 mb', 359766425)
('10.43kb', 10680)
('11gb', 11811160064)
('343.1mb', 359766425)
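
The 1000-versus-1024 question raised in the list above can also be settled by the input itself. The following is a minimal sketch, not part of this answer, that assumes `KB`, `MB`, ... mean powers of 1000 and `KiB`, `MiB`, ... mean powers of 1024. The names `SIZE_RE` and `parse_size_iec` are hypothetical.

```python
import re

# Assumption for this sketch: "KB" is 1000 bytes and "KiB" is 1024 bytes.
# Groups: number, optional prefix letter, optional "I" marking a binary unit.
SIZE_RE = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*(?:([KMGT])(I?))?B\s*$')

def parse_size_iec(size):
    m = SIZE_RE.match(size.upper())
    if m is None:
        raise ValueError("invalid size: %r" % (size,))
    number, prefix, binary = m.groups()
    base = 1024 if binary else 1000
    # K -> 1, M -> 2, G -> 3, T -> 4; a plain "B" has exponent 0.
    exponent = "KMGT".index(prefix) + 1 if prefix else 0
    return int(float(number) * base ** exponent)

print(parse_size_iec("1 KB"))   # → 1000
print(parse_size_iec("1 KiB"))  # → 1024
print(parse_size_iec("11gib"))  # → 11811160064
```

Input with no recognizable unit, such as "abc", raises `ValueError` instead of failing with a `KeyError` or an unpacking error.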

Answer 2: (score: 3)

To answer the OP's question, there does seem to be a module for this, humanfriendly:

pip install humanfriendly

Then:

>>> import humanfriendly
>>> user_input = input("Enter a readable file size: ")
Enter a readable file size: 16G
>>> num_bytes = humanfriendly.parse_size(user_input)
>>> print(num_bytes)
16000000000
>>> print("You entered:", humanfriendly.format_size(num_bytes))
You entered: 16 GB
>>> print("You entered:", humanfriendly.format_size(num_bytes, binary=True))
You entered: 14.9 GiB

Answer 3: (score: 0)

Based on chicks' answer, but using only a regular expression to parse the size, and also accepting integer sizes.

import re

UNITS = {None: 1, "B": 1, "KB": 2 ** 10, "MB": 2 ** 20, "GB": 2 ** 30, "TB": 2 ** 40}


def parse_human_size(size):
    """
    >>> examples = [12345, "123214", "1024b", "10.43 KB", "11 GB", "343.1 MB", "10.43KB", "11GB", "343.1MB", "10.43 kb"]
    >>> for s in examples:
    ...     print('[', s, ']', parse_human_size(s))
    """
    if isinstance(size, int):
        return size
    m = re.match(r'^(\d+(?:\.\d+)?)\s*([KMGT]?B)?$', size.upper())
    if m:
        number, unit = m.groups()
        return int(float(number) * UNITS[unit])
    raise ValueError("Invalid human size")

Answer 4: (score: -1)

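The code searches the string for a unit of measure. Once one is found, it extracts the number with another regular expression. With both in hand, it computes the value in bytes. If no unit is specified, it tries to treat the value as bytes, and if the conversion is not possible, the function returns 0.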

import re

def calculate(data):
    # Binary (1024-based) multipliers, keyed by the unit's first letter.
    conversion = {"G": 1073741824, "M": 1048576, "K": 1024, "B": 1}
    # Extract the first number in the string, with or without a decimal part.
    numbers = re.findall(r'[-+]?\d*\.\d+|\d+', data)
    if not numbers:
        return 0  # nothing that can be converted
    # Use the first unit letter found; fall back to bytes when there is none.
    units = re.findall(r'G|M|K|B', data, re.IGNORECASE)
    unit = units[0].upper() if units else "B"
    return int(float(numbers[0]) * conversion[unit])