Question

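I'm trying to find the median flow of the entire DataFrame. The first part of this is selecting only certain items in the DataFrame.

There were two problems with this: it included parts of the DataFrame that aren't in the states. Also, the median was not a single value; it was computed per row. How can I get the overall median of all the data in the DataFrame?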

Answer 1

Two options:

1) The pandas option:

df.stack().median()

2) The NumPy option:

np.median(df.values)
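
As a minimal sketch of how the two options behave, the example below builds a small made-up DataFrame (the column names and numbers are assumptions, not the asker's data) with one missing value. It also uses np.nanmedian in place of np.median, since np.median returns NaN as soon as the array contains a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical flow data: rows are periods, columns are states
df = pd.DataFrame(
    {
        "Alabama": [1.0, 4.0, 7.0],
        "Alaska": [2.0, np.nan, 8.0],
        "Arizona": [3.0, 6.0, 9.0],
    }
)

# By default DataFrame.median() works column by column and returns a Series
per_column = df.median()

# Option 1: collapse every column into one Series, then take a single median
overall_pandas = df.stack().median()

# Option 2: work on the underlying array; nanmedian ignores missing values
overall_numpy = np.nanmedian(df.to_numpy())

print(per_column)
print(overall_pandas, overall_numpy)
```

If the DataFrame also holds non-numeric columns, it would probably need to be narrowed first, for example with df.select_dtypes("number"), before either option is applied.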

Answer 2

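The DataFrame you pasted is a little messy because of some whitespace. But what you want to do is melt the DataFrame and then call median() on the new melted DataFrame: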

df2 = pd.melt(df, id_vars=['U.S.'])
print(df2['value'].median())

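Your DataFrame may be slightly different, but the concept is the same. Check the comment I left about understanding pd.melt(), especially the value_vars and id_vars arguments.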
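
To make the role of id_vars and value_vars concrete, here is a small sketch on an invented wide table. The contents of the 'U.S.' column and the state columns are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical wide table: one identifier column plus one column per state
df = pd.DataFrame(
    {
        "U.S.": ["Jan", "Feb", "Mar"],
        "Alabama": [10, 20, 30],
        "Alaska": [40, 50, 60],
    }
)

# id_vars is kept as an identifier column;
# value_vars are unpivoted into (variable, value) rows
long_df = pd.melt(df, id_vars=["U.S."], value_vars=["Alabama", "Alaska"])

print(long_df.columns.tolist())
overall_median = long_df["value"].median()
print(overall_median)
```

Leaving out value_vars melts every column that is not listed in id_vars, which is what the one-liner above relies on.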

Here is a very detailed walkthrough of how I cleaned the data and got the right answer:

import pandas as pd

# reading the data in from the clipboard
df = pd.read_clipboard()

# printing it out to see and also the column names
print(df)
print(df.columns)

# melting the DF and then printing the result
df2 = pd.melt(df, id_vars=['U.S.'])
print(df2)

# Creating a new DF so that no nulls are in there for ease of code readability
# using .copy() to avoid the Pandas warning about working on top of a copy
df3 = df2.dropna().copy()

# there were some funky values in the 'value' column. Just getting rid of those
df3.loc[df3.value.isin(['Columbia', 'of']), 'value'] = 99

# printing out the cleaned version and getting the median
print(df3)
print(df3['value'].median())
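
Overwriting the stray strings with 99 puts an artificial number into the data, which can shift the median. A possible alternative, sketched here on a made-up melted frame (the rows are assumptions), is to coerce the column with pd.to_numeric so that non-numeric entries become NaN and are skipped:

```python
import pandas as pd

# Hypothetical melted frame whose 'value' column picked up stray text
df2 = pd.DataFrame(
    {
        "variable": ["Alabama", "Alaska", "District", "District", "Arizona"],
        "value": ["10", "30", "of", "Columbia", "20"],
    }
)

# errors="coerce" turns anything that is not a number into NaN
values = pd.to_numeric(df2["value"], errors="coerce")

# median() skips NaN by default, so the stray text does not affect the result
overall_median = values.median()
print(overall_median)
```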

Finding the median of an entire pandas DataFrame
