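I'm new to Python and Jupyter Notebook and am still practicing. I have a problem with the values in one column: thousands are written with a "K" suffix and millions with an "M" suffix.
I need help with how to do the following.
Note: I'm currently using a Jupyter notebook with pandas and numpy imported.
I want the output to be:
450K to 450000, 9.5M to 9500000, 12M to 12000000
Here is the link to the data file; the column in question is "Value":
https://drive.google.com/open?id=1BOUVYiY6iRLbUdimCg7rgwtecfU6QAwS
See the attachment: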
Answer 0: (score: 1)
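This answer converts a list named y into two lists. The list named strings replaces K with "Thousand" and M with "Million". The list named numbers multiplies the number before K by 1000 and the number before M by 1000000.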
y = data.Value.unique()  # Unique strings in the Value column, e.g. '450K' or '9.5M'
strings = []
numbers = []
for number in y:
    if number[-1:] == 'K':  # Check if the last character is K
        strings.append(number[:-1] + " Thousand")  # Drop the K with [:-1] and append " Thousand"
        numbers.append(float(number[:-1]) * 1000)  # Drop the K, convert to float and multiply by 1000
    elif number[-1:] == 'M':  # Check if the last character is M
        strings.append(number[:-1] + " Million")  # Drop the M with [:-1] and append " Million"
        numbers.append(float(number[:-1]) * 1000000)  # Drop the M, convert to float and multiply by 1000000
    else:  # Just in case the value has no K or M suffix
        strings.append(number)
        numbers.append(float(number))  # float() rather than int(), so a value like '0.5' does not raise
Use
print(numbers)
to print the numeric values.
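For a whole column, the same conversion can also be done without an explicit loop. The following is a minimal sketch of a swapped-in technique, not the answer's own method: it uses pandas' `Series.str.extract` with a regular expression to split each entry into a number and an optional suffix. The sample values and the names `values`, `multipliers`, `parts`, and `factor` are assumptions made for illustration, and the pattern assumes every entry is a plain number optionally followed by K or M.

```python
import pandas as pd

# Hypothetical sample standing in for the "Value" column from the question.
values = pd.Series(["450K", "9.5M", "12M", "800"])

# Multiplier for each supported suffix.
multipliers = {"K": 1_000, "M": 1_000_000}

# Split each entry into its numeric part and an optional K/M suffix.
parts = values.str.extract(r"^(?P<number>[\d.]+)(?P<suffix>[KM]?)$")

# Entries without a suffix fall back to a multiplier of 1.
factor = parts["suffix"].map(lambda s: multipliers.get(s, 1)).astype(float)

# Multiply, round away floating-point noise, and store as integers.
converted = (parts["number"].astype(float) * factor).round().astype("int64")
print(converted.tolist())
```

With the real data, `values` would be replaced by `data['Value']`; entries that do not match the pattern (for example, values with a currency prefix) would come out as missing and need separate handling before the integer cast.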
Answer 1: (score: 0)
If you want to convert to numbers, adapt the approach from this post instead: How can I consistently convert strings like "3.71B" and "4M" to numbers in Python?
import numpy as np

def text_to_num(text, bad_data_val=0):
    # Multiplier for each supported suffix
    d = {
        'K': 1000,
        'M': 1000000,
        'B': 1000000000
    }
    if not isinstance(text, str):
        # Non-strings are bad or missing data in the poster's dataset
        return bad_data_val
    elif text[-1] in d:
        # Separate out the K, M, or B
        num, magnitude = text[:-1], text[-1]
        return int(float(num) * d[magnitude])
    else:
        # No suffix: parse the text as a plain number
        return float(text)
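As a quick sanity check, the function can be run on the conversions the question asks for. The function is repeated in compact form so the snippet runs on its own; the sample inputs come from the question, plus one suffix-free string and one missing value that are assumptions added for illustration.

```python
def text_to_num(text, bad_data_val=0):
    """Same logic as the function above, repeated so this snippet is self-contained."""
    d = {'K': 1000, 'M': 1000000, 'B': 1000000000}
    if not isinstance(text, str):
        return bad_data_val          # bad or missing data
    elif text[-1] in d:
        return int(float(text[:-1]) * d[text[-1]])
    else:
        return float(text)           # no suffix

# Examples taken from the question.
for sample in ["450K", "9.5M", "12M"]:
    print(sample, "->", text_to_num(sample))

# Illustrative extra cases: no suffix, and a missing value with each default.
print(text_to_num("800"))
print(text_to_num(None), text_to_num(None, bad_data_val=None))
```

Note that suffixed inputs come back as `int` while suffix-free inputs come back as `float`, so a column with both kinds of entry ends up with mixed numeric types unless it is cast afterwards.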
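Processing the poster's FIFA dataset
The statistics differ depending on what we set as the default value for bad or missing data.
This is more apparent when we process the Wage column, which has far more missing entries than the Value column.
Example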
print("Starting Values\n", df['Wage'].head())
for default_val in [0, None]:  # Try 0 and None for missing data fields
    print('\nUsing Default Value {}'.format(default_val))
    # Convert each row's Wage string, using default_val for bad or missing entries
    df['Result'] = df.apply(lambda row: text_to_num(row['Wage'], default_val), axis=1)
    print("Converted values:\n", df['Result'].head())
    print("\nStats {}".format(default_val))
    print(df['Result'].dropna().describe())  # Get stats dropping missing data (i.e. None values)
    print('-'*20)
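Output
Notes:
(1) Using 0 as the default value drags the statistics down (the minimum becomes zero and the mean is lower).
(2) Using None as the default value means those entries are dropped before the statistics are computed, which gives more representative statistics.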
Starting Values
0 565K
1 565K
2 280K
3 510K
4 230K
Name: Wage, dtype: object
Using Default Value 0
Converted values:
0 565000
1 565000
2 280000
3 510000
4 230000
Name: Result, dtype: int64
Stats 0
count 17981.000000
mean 11546.966242
std 23080.000139
min 0.000000
25% 2000.000000
50% 4000.000000
75% 12000.000000
max 565000.000000
Name: Result, dtype: float64
--------------------
Using Default Value None
Converted values:
0 565000.0
1 565000.0
2 280000.0
3 510000.0
4 230000.0
Name: Result, dtype: float64
Stats None
count 17733.000000
mean 11708.453166
std 23200.122784
min 1000.000000
25% 2000.000000
50% 4000.000000
75% 12000.000000
max 565000.000000
Name: Result, dtype: float64
--------------------
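The effect described in notes (1) and (2) can be reproduced on a handful of numbers. This is a minimal sketch with made-up wages, not the FIFA data: `fillna(0)` stands in for using 0 as the default, and `dropna()` stands in for using None as the default and then dropping it. The names `converted`, `zero_filled`, and `dropped` are assumptions for illustration.

```python
import pandas as pd

# Made-up converted wages; None marks a row whose wage was missing.
converted = pd.Series([565000, 2000, None, 4000], dtype="float64")

zero_filled = converted.fillna(0)  # like using 0 as the default value
dropped = converted.dropna()       # like using None and then dropping it

# Compare the statistics that the default value affects.
print(zero_filled.describe()[["count", "mean", "min"]])
print(dropped.describe()[["count", "mean", "min"]])
```

The zero-filled series counts the missing row as a real wage of 0, so its count is higher, its minimum is 0, and its mean is lower, which mirrors the difference between the two stats blocks above.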