Converting string object values with M and K to millions and thousands

Time: 2019-12-28 03:15:22

Tags: python python-3.x object jupyter-notebook

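I am new to Python and Jupyter Notebook and am practicing. I have a problem with the values in one of my columns: thousands are written with a "K" suffix and millions with an "M" suffix.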

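I need help with the following:

  1. How to convert the values with 'K' into thousands
  2. How to convert the values with 'M' into millions (both whole and decimal numbers)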

Note: I am currently using Jupyter Notebook with pandas and numpy imported.

I expect the output to be:

450K to 450000, 9.5M to 9500000, 12M to 12000000

Here is a link to the data file; the column in question is "Value":

https://drive.google.com/open?id=1BOUVYiY6iRLbUdimCg7rgwtecfU6QAwS

See the attached image: [image]

2 answers:

Answer 0 (score: 1)

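This answer converts the list named y into two lists. The list strings replaces K with "Thousand" and M with "Million". The list numbers multiplies the number before K by 1000 and the number before M by 1000000.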

y = data.Value.unique()

strings = []
numbers = []

for number in y:
    if number[-1:] == 'K':  # Check whether the last character is K
        strings.append(number[:-1] + " Thousand")  # Drop the K and append " Thousand"
        numbers.append(float(number[:-1]) * 1000)  # Drop the K, convert to float, multiply by 1000
    elif number[-1:] == 'M':  # Check whether the last character is M
        strings.append(number[:-1] + " Million")  # Drop the M and append " Million"
        numbers.append(float(number[:-1]) * 1000000)  # Drop the M, convert to float, multiply by 1000000
    else:  # In case the value has no K or M suffix
        strings.append(number)
        numbers.append(float(number))  # float() also accepts decimals such as "1.5"

Use print(numbers) to print the numeric values.
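The loop above depends on a DataFrame named data that is not shown here. As a rough, self-contained sketch of the same idea, the logic can be wrapped in a function and tried on a few made-up values. The function name convert_suffixed and the sample list are assumptions for illustration, not part of the original answer.

```python
def convert_suffixed(values):
    """Return (strings, numbers) for values such as '450K' or '9.5M'."""
    words = {"K": "Thousand", "M": "Million"}
    factors = {"K": 1_000, "M": 1_000_000}
    strings, numbers = [], []
    for value in values:
        suffix = value[-1:]  # last character, or '' for an empty string
        if suffix in factors:
            # Strip the suffix, then build the worded and numeric forms
            strings.append(value[:-1] + " " + words[suffix])
            numbers.append(float(value[:-1]) * factors[suffix])
        else:
            # No K or M suffix: keep the text and parse it directly
            strings.append(value)
            numbers.append(float(value))
    return strings, numbers


sample = ["450K", "9.5M", "12M", "700"]  # made-up values, not from the dataset
strings, numbers = convert_suffixed(sample)
print(strings)
print(numbers)
```

With a real column, the input would be something like the result of data.Value.unique(). Missing entries (NaN) are not strings, so they would need to be filtered out before calling this function.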

Answer 1 (score: 0)

If you want to convert to numbers, you can adapt the approach from this post instead: How can I consistently convert strings like "3.71B" and "4M" to numbers in Python?

import numpy as np

def text_to_num(text, bad_data_val = 0):
    d = {
        'K': 1000,
        'M': 1000000,
        'B': 1000000000
    }
    if not isinstance(text, str):
        # Non-strings are bad or missing data in the poster's dataset
        return bad_data_val

    elif text[-1] in d:
        # separate out the K, M, or B
        num, magnitude = text[:-1], text[-1]
        return int(float(num) * d[magnitude])
    else:
        return float(text)
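One thing to keep in mind with int(float(num) * d[magnitude]) is that int() truncates toward zero, so a product that binary floating point stores slightly below the intended value would come out one unit too low. A possible variant, sketched here under the assumption that whole-number results are wanted, uses the standard library's decimal module instead. The name text_to_int and the handling of lowercase suffixes and empty strings are additions for illustration, not part of the original answer.

```python
from decimal import Decimal

MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def text_to_int(text, bad_data_val=None):
    """Hypothetical variant of text_to_num that uses exact decimal arithmetic."""
    if not isinstance(text, str) or not text:
        # Non-strings and empty strings are treated as missing data
        return bad_data_val
    suffix = text[-1].upper()  # assumption: accept 'k', 'm', 'b' as well
    if suffix in MULTIPLIERS:
        # Decimal parses '4.35' exactly, so the product is exact
        return int(Decimal(text[:-1]) * MULTIPLIERS[suffix])
    # No suffix: parse the whole text; any fractional part is truncated
    return int(Decimal(text))


for raw in ["450K", "9.5M", "12M", "4.35K", "3.71B"]:
    print(raw, "->", text_to_int(raw))
```

Text that is not a valid number still raises an exception (decimal.InvalidOperation), so really dirty data would need extra handling.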

Processing the poster's FIFA dataset

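The statistics differ depending on what default value we choose for bad or missing data.

This is more noticeable with the Wage column, which has far fewer entries than the Value column.

Example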

print("Starting Values\n", df['Wage'].head())
for default_val in [0, None]:  # Try 0 and None for missing data fields
    print('\nUsing Default Value {}'.format(default_val))
    df['Result'] = df.apply(lambda row: text_to_num(row['Wage'], default_val), axis=1)
    print("Converted values:\n", df['Result'].head())
    print("\nStats {}".format(default_val))
    print(df['Result'].dropna().describe())  # Get stats dropping missing data (i.e. None values)
    print('-'*20)

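Output

Notes:

(1) Using 0 as the default value drags the statistics down (the minimum becomes zero and the mean is lower).

(2) Using None as the default value excludes those entries, which gives more representative statistics.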

Starting Values
 0    565K
1    565K
2    280K
3    510K
4    230K
Name: Wage, dtype: object

Using Default Value 0
Converted values:
 0    565000
1    565000
2    280000
3    510000
4    230000
Name: Result, dtype: int64

Stats 0
count     17981.000000
mean      11546.966242
std       23080.000139
min           0.000000
25%        2000.000000
50%        4000.000000
75%       12000.000000
max      565000.000000
Name: Result, dtype: float64
--------------------

Using Default Value None
Converted values:
 0    565000.0
1    565000.0
2    280000.0
3    510000.0
4    230000.0
Name: Result, dtype: float64

Stats None
count     17733.000000
mean      11708.453166
std       23200.122784
min        1000.000000
25%        2000.000000
50%        4000.000000
75%       12000.000000
max      565000.000000
Name: Result, dtype: float64
--------------------
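The same effect can be reproduced without the dataset. The sketch below uses a small made-up wage list and the standard library's statistics module. The helper to_number and the sample values are assumptions for illustration only, and filtering out None plays the role that dropna() plays in the pandas example.

```python
from statistics import mean

# Made-up sample; None marks a missing entry
wages = ["565K", "280K", None, "1K", None]


def to_number(text, default):
    """Convert '565K' style text to a float; return default for non-strings."""
    if not isinstance(text, str):
        return default
    if text.endswith("K"):
        return float(text[:-1]) * 1000
    return float(text)


with_zero = [to_number(w, 0) for w in wages]     # missing data counted as 0
with_none = [to_number(w, None) for w in wages]  # missing data kept as None
kept = [v for v in with_none if v is not None]   # drop missing entries

print("default 0:    min =", min(with_zero), "mean =", mean(with_zero))
print("default None: min =", min(kept), "mean =", mean(kept))
```

With the zero default, the two missing entries are counted as real wages of 0, which lowers both the minimum and the mean. With None they are excluded before the statistics are computed.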