Question

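I'm working with census data whose columns give age groups rather than discrete ages. To analyze the data by generation (Millennials, Baby Boomers, etc.), I need to convert it to single years of age. (That is, the data has one column for ages 5 to 9, rather than a column for age 5, a column for age 6, and so on.)

Since I'm only trying to identify trends, it is acceptable to split each age group evenly to get a discrete value for each year.

I want to iterate over each column and create 5 new columns, each holding the original column's values divided by 5 (because each group spans 5 years).

I tried initializing both the new column name and the index of the column I want to split to zero.

Then I wrote a for loop to iterate over each column in the dataframe. Inside it I have a nested for loop, so that the operation is performed 5 times for each column. I increment n (the new column name) on each pass of the inner loop, and increment s (the index of the column being divided) in the outer loop.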

df = pd.DataFrame([[6.8, 6.5], [5.2, 8.9], [6.4, 7.6]], columns= ['Under 5 years', '5 to 9 years'])

## Set up variables. 'n' is for the new column name. 's' is the index of the column to be divided.

n= 0 
s = 0

## For loop to iterate through each column in the dataframe and perform the operation on each column 5 times before moving onto the next column:

for s in df.iteritems():
    for i in range(5):
        df['{}'.format(n)].iloc = df[s].iloc/5
        n+=1 
     s+=1

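I keep getting a TypeError: Under 5 years, dtype: float64)' is an invalid key

I don't know how to fix this error, or whether the rest of the code would work even then.

The desired output would be the following dataframe: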

df = pd.DataFrame([[6.8, 6.5, 1.36, 1.36, 1.36, 1.36, 1.36, 1.3, 1.3, 1.3, 1.3, 1.3], [5.2, 8.9, 1.04, 1.04, 1.04, 1.04, 1.04, 1.78, 1.78, 1.78, 1.78, 1.78], [6.4, 7.6, 1.28, 1.28, 1.28, 1.28, 1.28, 1.52, 1.52, 1.52, 1.52, 1.52]], columns=['Under 5 years', '5 to 9 years', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])

Answer 1

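This line, df['{}'.format(n)].iloc = df[s].iloc/5, is wrong:

s is not a column name but a (column_name, column) pair, so it cannot be used to index the dataframe.
The use of iloc is also wrong, and it is unnecessary here anyway.
Modifying what you are iterating over is very dangerous, so never add columns to df inside the for s in df.iteritems(): loop.

Based on the expected result, what you want is: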
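To make the first point concrete, here is a minimal sketch of what the iteration actually yields. It uses `items()` rather than `iteritems()`, because `iteritems()` was removed in pandas 2.0; the variable names `pairs`, `first`, `name` and `column` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[6.8, 6.5], [5.2, 8.9], [6.4, 7.6]],
                  columns=['Under 5 years', '5 to 9 years'])

# Materialize the iteration so we can inspect one element.
# Each element is a (column_name, Series) pair, not a bare column name.
pairs = list(df.items())
first = pairs[0]
name, column = first

print(type(first).__name__)  # the element is a tuple
print(name)                  # the column label is the first item of the pair

# Indexing with the label works and returns the same Series as the pair holds.
same = df[name].equals(column)
print(same)
```

Passing the whole pair to `df[...]`, as the question's code does with `df[s]`, is what triggers the "is an invalid key" error.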

cols = df.columns.tolist()
for j, s in enumerate(cols):  # safe: cols is a plain list, unaffected by columns added to df
    for i in range(5):
        df[str(i + 5*j)] = df[s]/5
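A self-contained variant of the same idea, for reference. This is only a sketch: it assumes every age group spans exactly 5 years (real census tables often end with an open-ended group such as "85 years and over", which would need separate handling), and the names `years_per_group`, `new_cols` and `result` are illustrative. It collects the new columns in a dict and attaches them once with `pd.concat`, so the dataframe is never modified while its columns are being read.

```python
import pandas as pd

df = pd.DataFrame([[6.8, 6.5], [5.2, 8.9], [6.4, 7.6]],
                  columns=['Under 5 years', '5 to 9 years'])

years_per_group = 5  # assumption: every group covers exactly 5 single years

# Build all single-year columns first, keyed by the age they represent.
new_cols = {}
for j, col in enumerate(df.columns):
    for i in range(years_per_group):
        age = j * years_per_group + i
        new_cols[str(age)] = df[col] / years_per_group

# Attach the new columns to the right of the original ones in one step.
result = pd.concat([df, pd.DataFrame(new_cols)], axis=1)
print(result.round(2))
```

The column order matches the desired output in the question: the two original group columns followed by '0' through '9'.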

Answer 2

Why not just use pd.cut?

In [12]: df = pd.DataFrame({'age':[5,8,10,25,33,22,38]})

In [13]: df
Out[13]:
   age
0    5
1    8
2   10
3   25
4   33
5   22
6   38

In [14]: df['age_group'] = pd.cut(df.age,bins=[5,10,15,20,25,30,35,40])

In [15]: df
Out[15]:
   age age_group
0    5       NaN
1    8   (5, 10]
2   10   (5, 10]
3   25  (20, 25]
4   33  (30, 35]
5   22  (20, 25]
6   38  (35, 40]

In [16]: df.groupby(['age_group']).count()
Out[16]:
           age
age_group
(5, 10]      2
(10, 15]     0
(15, 20]     0
(20, 25]     2
(25, 30]     0
(30, 35]     1
(35, 40]     1
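Note that in the session above the age 5 ends up as NaN, because `pd.cut` builds right-closed intervals by default and (5, 10] excludes 5. If left-closed groups such as "5 to 9" are wanted, one option is `right=False`. The sketch below reuses the same sample ages; `bins` and `counts` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'age': [5, 8, 10, 25, 33, 22, 38]})
bins = [5, 10, 15, 20, 25, 30, 35, 40]

# right=False makes each interval left-closed: [5, 10), [10, 15), ...
# so age 5 falls into the first group instead of becoming NaN.
df['age_group'] = pd.cut(df['age'], bins=bins, right=False)

# Count rows per group, keeping the bins in order (empty bins show 0).
counts = df['age_group'].value_counts(sort=False)
print(df)
print(counts)
```

With left-closed intervals, age 10 moves to the [10, 15) group and age 25 to [25, 30), so the counts differ from the right-closed output shown earlier.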

Iterate over columns, divide the column values, and create new columns with the results
