Question

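I am printing the frequency of murders in each state for each decade. However, I only want to print the state, the decade, and its victim count. What I have now prints every column, each showing the same frequency. How do I change it so that I have only 3 columns: State, Decade, and Victim Count?

I am currently using the groupby function to group by state and decade, and assigning the result of count() to a variable.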

  import pandas as pd

  xl = pd.ExcelFile('Wyoming.xlsx')
  df = xl.parse('Sheet1')

  df['Decade'] = (df['Year'] // 10) * 10

  counts = df.groupby(['State', 'Decade']).count()

  print(counts)

The result prints every column in the file with the same frequency, whereas I only need 3 columns: State, Decade, Victim Count

Sample Text File

Answer 1

Select the columns you need:

counts = df.loc[:, ['State', 'Decade', 'Victim Count']].groupby(['State', 'Decade']).count()

or

# State and Decade are index levels after the groupby, so only the value column is selected
print(counts.loc[:, ['Victim Count']])
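A minimal sketch of this idea, using a small made-up DataFrame in place of Wyoming.xlsx (the rows and the Weapon column are hypothetical; only State, Year, Decade and Victim Count come from the question). Selecting the column on the grouped object should give the same counts without building an intermediate frame:

```python
import pandas as pd

# Hypothetical sample rows standing in for the Excel sheet
df = pd.DataFrame({
    'State': ['Wyoming', 'Wyoming', 'Wyoming', 'Wyoming'],
    'Year': [1976, 1978, 1979, 1983],
    'Weapon': ['Knife', 'Handgun', 'Rifle', 'Knife'],
    'Victim Count': [1, 1, 2, 1],
})
df['Decade'] = (df['Year'] // 10) * 10

# Pick the single column to count after grouping;
# the result is a Series indexed by (State, Decade)
counts = df.groupby(['State', 'Decade'])['Victim Count'].count()
print(counts)
```

State and Decade still appear in the printed output because they are the index of the result.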

Answer 2

You should call reset_index on the groupby result, then select the columns from the new DataFrame.

Something like:

import pandas as pd

xl = pd.ExcelFile('Wyoming.xlsx')
df = xl.parse('Sheet1')

df['Decade'] = (df['Year'] // 10) * 10

counts = df.groupby(['State', 'Decade']).count()
# reset_index turns State and Decade back into ordinary columns
counts = counts.reset_index()[['State', 'Decade', 'Victim Count']]
print(counts)
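As a variation on the same approach, passing as_index=False to groupby should keep State and Decade as ordinary columns, so no separate reset_index call is needed. The sketch below runs on made-up rows (the data and the Weapon column are hypothetical, not taken from Wyoming.xlsx):

```python
import pandas as pd

# Hypothetical sample rows standing in for the Excel sheet
df = pd.DataFrame({
    'State': ['Wyoming', 'Wyoming', 'Wyoming', 'Wyoming'],
    'Year': [1976, 1978, 1979, 1983],
    'Weapon': ['Knife', 'Handgun', 'Rifle', 'Knife'],
    'Victim Count': [1, 1, 2, 1],
})
df['Decade'] = (df['Year'] // 10) * 10

# as_index=False keeps the grouping keys as regular columns,
# so the result is a flat three-column DataFrame
counts = df.groupby(['State', 'Decade'], as_index=False)['Victim Count'].count()
print(counts)
```

The result has exactly the columns State, Decade and Victim Count, one row per state and decade.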
