Question

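So I have a df column of 9-digit IDs. There are no duplicates, and each ID starts with a digit from 1 to 6. Depending on the leading digit of each ID, I want to create a separate column holding the "name" that the first digit represents (e.g. IDs starting with 1 represent Maine, IDs starting with 2 represent California, and so on).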

If there were only two conditions, this would work:

df['id_label'] = ['name_1' if name.startswith('1') else 'everything_else' for name in df['col_1']]

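I couldn't figure out how to write a multi-condition comprehension for what I need, so I tried a loop instead, but it only keeps the id_label column from the last iteration of the loop (i.e. the id_label column contains only 'name_5').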

My question is: how do I create a new column from an old column based on multiple conditional statements?

Answer 1

I think you can convert the column to str with astype, select the first character with str[0], and finally map it with a dict:

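import pandas as pd

df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
print(df)

# map each leading digit (as a string) to its label
d = {'1':'Maine', '2': 'California', '3':'a', '4':'f', '5':'r', '6':'r'}
# convert to str, take the first character, then look it up in d
df['id_label'] = df['col_1'].astype(str).str[0].map(d)
print(df)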
   col_1    id_label
0    133       Maine
1    255  California
2     36           a
3    477           f
4     55           r
5     63           r
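One thing to keep in mind with this approach: map returns NaN for any first digit that is missing from the dict. A minimal sketch of handling that, assuming a made-up ID 789 and a placeholder label 'unknown' (neither comes from the question):

```python
import pandas as pd

# Hypothetical data: 789 starts with '7', which has no entry in the mapping.
df = pd.DataFrame({'col_1': [133, 255, 789]})
d = {'1': 'Maine', '2': 'California'}

first_digit = df['col_1'].astype(str).str[0]
# map() leaves unmapped keys as NaN; fillna() substitutes a placeholder label.
df['id_label'] = first_digit.map(d).fillna('unknown')
print(df)
```

Whether a placeholder or a NaN is preferable depends on how you want to spot unexpected IDs later.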

Answer 2

You can use apply in case you have a lot of if/else conditions:

import pandas as pd

def ifef(col):
    # return a label based on the first digit of the ID
    col = str(col)
    if col.startswith('1'):
        return 'name_1'
    if col.startswith('2'):
        return 'name_2'
    if col.startswith('3'):
        return 'name_3'
    if col.startswith('4'):
        return 'name_4'
    if col.startswith('5'):
        return 'name_5'
    if col.startswith('6'):
        return 'name_5'  # 6 shares the label of 5, as in the output below
    # any other leading digit falls through and returns None

df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
df['id_label'] = df['col_1'].apply(ifef)

   col_1 id_label
0    133   name_1
1    255   name_2
2     36   name_3
3    477   name_4
4     55   name_5
5     63   name_5
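Since 5 and 6 end up with the same label here, the chain can be shortened a little: str.startswith also accepts a tuple of prefixes. A rough sketch of that idea; the function name label_id and the 'unknown' fallback are my own assumptions, not part of the question:

```python
import pandas as pd

def label_id(value):
    # Hypothetical variant of ifef; the name and fallback label are assumptions.
    text = str(value)
    if text.startswith(('5', '6')):  # a tuple matches either prefix
        return 'name_5'
    for digit in '1234':
        if text.startswith(digit):
            return 'name_' + digit
    return 'unknown'  # explicit fallback instead of an implicit None

df = pd.DataFrame({'col_1': [133, 255, 36, 477, 55, 63]})
df['id_label'] = df['col_1'].apply(label_id)
print(df)
```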

If you have a dictionary, you can use:

df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
d = {'1':'M', '2': 'C', '3':'a', '4':'f', '5':'r', '6':'s'}
def ifef(col):
    col = str(col)
    return d[col[0]]

df['id_label'] = df['col_1'].apply(ifef)
print(df)

   col_1 id_label
0    133        M
1    255        C
2     36        a
3    477        f
4     55        r
5     63        s
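Note that d[col[0]] raises a KeyError if an ID starts with a digit that is not in the dict. If that can happen in your data, dict.get with a default is one way around it. A small sketch, assuming a made-up ID 789 and a hypothetical helper name label_or_default:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [133, 255, 789]})  # 789 is a made-up ID with no mapping
d = {'1': 'M', '2': 'C'}

def label_or_default(value, default='unknown'):
    # dict.get returns `default` instead of raising KeyError for a missing key
    return d.get(str(value)[0], default)

df['id_label'] = df['col_1'].apply(label_or_default)
print(df)
```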

Answer 3

You can check this and let me know if it works for your problem.

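import pandas as pd
import numpy as np

df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})

# first digit of each ID, as a string
df['col_2'] = df['col_1'].astype(str).str[0]

# one boolean condition per label; 5 and 6 share a label
condlist = [df['col_2'] == "1",
            df['col_2'] == "2",
            df['col_2'] == "3",
            df['col_2'] == "4",
            ((df['col_2'] == "5") | (df['col_2'] == "6")),
            ]
choicelist = ['Maine','California','India', 'France','5/6']

# 'other' is a placeholder for rows matching no condition; a string default
# keeps the dtype consistent with the string choices (recent NumPy versions
# may reject the implicit integer default of 0)
df['id_label'] = np.select(condlist, choicelist, default='other')

print(df)

#### Output ####

   col_1 col_2    id_label
0    133     1       Maine
1    255     2  California
2     36     3       India
3    477     4      France
4     55     5         5/6
5     63     6         5/6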

PS: Thanks to @ALollz for introducing me to np.select.
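For what it's worth, the grouped condition for 5 and 6 can also be written with isin, which may read better once more digits share a label. A minimal sketch under the same assumptions as above; the ID 789 and the 'other' label are made up to show what happens to a row that matches no condition:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [133, 55, 63, 789]})  # 789 is a made-up unmatched ID
first = df['col_1'].astype(str).str[0]

# isin() groups several first digits under one condition
condlist = [first == '1', first.isin(['5', '6'])]
choicelist = ['Maine', '5/6']

# rows matching none of the conditions receive the default value
df['id_label'] = np.select(condlist, choicelist, default='other')
print(df)
```

np.select picks the first condition that is True for each row, so the order of condlist matters when conditions overlap.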

How to create a new column based on multiple conditions from an existing column in pandas

3 answers: