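I'm displaying a data frame in Jupyter Notebook. Its initial data type is float. I want rows 1 & 3 of the printed table to show as integers and rows 2 & 4 as percentages. How do I do that? (I've spent a lot of time looking for a solution without success.)
Here is the code I'm using: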
import numpy as np
import pandas as pd
# `sales` is an existing DataFrame that includes a 'Sales' column
#Creating the table
clms = sales.columns
indx = ['# of Poeple','% of Poeple','# Purchased per Activity','% Purchased per Activity']
basic_stats = pd.DataFrame(data=np.nan,index=indx,columns=clms)
basic_stats.head()
#Calculating the # of people who took part in each activity
for clm in sales.columns:
    basic_stats.loc[indx[0], clm] = int(round(sales[sales[clm]>0][clm].count(),0))
#Calculating the % of people who took part in each activity from the total email list
for clm in sales.columns:
    basic_stats.loc[indx[1], clm] = round((basic_stats.loc[indx[0], clm] / sales['Sales'].count())*100,2)
#Calculating the # of people who took part in each activity AND that bought the product
for clm in sales.columns:
    basic_stats.loc[indx[2], clm] = int(round(sales[(sales[clm] >0) & (sales['Sales']>0)][clm].count()))
#Calculating the % of people who took part in each activity AND that bought the product
for clm in sales.columns:
    basic_stats.loc[indx[3], clm] = round((basic_stats.loc[indx[2], clm] / basic_stats.loc[indx[0], clm])*100,2)
#Present the table
basic_stats
This is the printed table: (image: output table of the 'basic_stats' data frame in Jupyter Notebook)
Answer 0: (score: 0)
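You are already assigning integers to the cells in rows 1 and 3. They print as floats because every column has the data type float64, which comes from the way you initially create the DataFrame. You can check the data types by printing the .dtypes attribute: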
basic_stats = pd.DataFrame(data=np.nan,index=indx,columns=clms)
print(basic_stats.dtypes)
# Prints:
# column1 float64
# column2 float64
# ...
# dtype: object
If you don't pass the data keyword argument to the DataFrame constructor, every column gets the data type object, so a cell can hold any Python object:
basic_stats = pd.DataFrame(index=indx,columns=clms)
print(basic_stats.dtypes)
# Prints:
# column1 object
# column2 object
# ...
# dtype: object
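When a cell's data type is object, its content is formatted with the value's own built-in string conversion, so integers are displayed as integers.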
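To make the difference concrete, here is a minimal sketch that builds the same tiny table both ways and compares the rendered text. The row and column labels are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, used only for this illustration.
rows = ["count", "share"]
cols = ["Email", "Sales"]

# Filled with NaN up front: every column is float64.
as_float = pd.DataFrame(data=np.nan, index=rows, columns=cols)
as_float.loc["count", "Email"] = 42

# No data given: every column is object.
as_object = pd.DataFrame(index=rows, columns=cols)
as_object.loc["count", "Email"] = 42

float_text = as_float.to_string()
object_text = as_object.to_string()
print(float_text)   # the count is rendered with a decimal point
print(object_text)  # the count is rendered as a plain integer
```

The stored value is the same number in both frames; only the column dtype decides how it is rendered.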
To display percentages, you can use a custom class that prints a float the way you want:
class PercentRepr(object):
    """Represents a floating point number as percent"""
    def __init__(self, float_value):
        self.value = float_value
    def __str__(self):
        return "{:.2f}%".format(self.value*100)
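As a quick check of how the class behaves, the sketch below repeats the definition so it runs on its own and formats one value. Note that the class multiplies by 100 itself, so it expects a fraction such as 0.25 rather than an already scaled 25.

```python
class PercentRepr(object):
    """Represents a floating point number as percent (same class as above)."""
    def __init__(self, float_value):
        self.value = float_value
    def __str__(self):
        return "{:.2f}%".format(self.value * 100)

quarter = PercentRepr(0.25)
print(str(quarter))   # 25.00%
print(quarter.value)  # 0.25, the raw fraction is still stored on the object
```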
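Then just use this class for the values in rows 2 and 4 (the percentage rows, at positions 1 and 3 when counting from zero):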
# Assumes numpy (np), pandas (pd), `sales` and PercentRepr are already defined as above
#Creating the table
clms = sales.columns
indx = ['# of Poeple','% of Poeple','# Purchased per Activity','% Purchased per Activity']
basic_stats = pd.DataFrame(index=indx,columns=clms)
basic_stats.head()
#Calculating the # of people who took part in each activity
# (.loc[row, column] writes into the frame directly and avoids chained assignment)
for clm in sales.columns:
    basic_stats.loc[indx[0], clm] = int(round(sales[sales[clm]>0][clm].count(),0))
#Calculating the % of people who took part in each activity from the total email list
for clm in sales.columns:
    basic_stats.loc[indx[1], clm] = PercentRepr(basic_stats.loc[indx[0], clm] / sales['Sales'].count())
#Calculating the # of people who took part in each activity AND that bought the product
for clm in sales.columns:
    basic_stats.loc[indx[2], clm] = int(round(sales[(sales[clm] >0) & (sales['Sales']>0)][clm].count()))
#Calculating the % of people who took part in each activity AND that bought the product
for clm in sales.columns:
    basic_stats.loc[indx[3], clm] = PercentRepr(basic_stats.loc[indx[2], clm] / basic_stats.loc[indx[0], clm])
#Present the table
basic_stats
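Note: this actually changes the data in the data frame! If you want to use the data from rows 2 and 4 for further processing, be aware that those rows no longer contain float objects.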
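If the numbers are still needed afterwards, one option is to leave the numeric frame alone and build a formatted copy just for display. The following is a minimal sketch under stated assumptions: `format_for_display` is a hypothetical helper name, the sample numbers are made up, and the percentage rows are assumed to already hold values scaled to 0-100, as in the question's code.

```python
import pandas as pd

def format_for_display(stats, int_rows, pct_rows):
    """Return a string-formatted copy; `stats` itself is left untouched."""
    shown = stats.astype(object)  # astype returns a new frame
    for label in int_rows:
        shown.loc[label] = [f"{int(v)}" for v in stats.loc[label]]
    for label in pct_rows:
        shown.loc[label] = [f"{v:.2f}%" for v in stats.loc[label]]
    return shown

# Hypothetical numbers standing in for basic_stats.
stats = pd.DataFrame(
    {"Email": [40.0, 80.0], "Webinar": [10.0, 20.0]},
    index=["# of People", "% of People"],
)
shown = format_for_display(stats, ["# of People"], ["% of People"])
print(shown)
```

The original `stats` keeps its float64 columns, so later calculations are unaffected. The sketch does not handle NaN in the integer rows, where `int()` would fail. pandas also provides `DataFrame.style.format`, which accepts a `subset` argument and changes only the rendering in the notebook.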
Answer 1: (score: 0)
Here is one way. It's a bit of a hack, but if it's only for pretty printing, it will work.
import numpy as np
import pandas as pd
# cast to object so a row can hold strings (newer pandas versions warn about or reject writing strings into float columns)
df = pd.DataFrame(np.random.random(20).reshape(4,5)).astype(object)
# first and third rows display as integers
df.loc[0,:] = df.loc[0,:]*100
df.loc[2,:] = df.loc[2,:]*100
df.loc[0,:] = df.loc[0,:].astype(int).astype(str)
df.loc[2,:] = df.loc[2,:].astype(int).astype(str)
# second and fourth rows display as percents (with 2 decimals)
# scale first, then round, so the stored value has at most 2 decimals
df.loc[1,:] = np.round(df.loc[1,:].values.astype(float)*100,2)
df.loc[3,:] = np.round(df.loc[3,:].values.astype(float)*100,2)