Question

I have a DataFrame whose index holds names, but the names are written as Last, First, sometimes followed by a *.

The data looks like this:

Index          Sales
Jones, Mike*   500
James, Amy     300

The goal is to end up with this (the index could also be turned into a regular name column):

Index         Sales    Special 
Mike Jones     500       1
Amy James      300       0

How can I also create a new column that is 1 when the name has a * and 0 otherwise?

Answer 1

Assuming Index is the index column:

In [32]: df['Special'] = df.index.str.endswith('*').astype(int)                                                 

In [33]: df.set_index(df.index.str.replace(r'^(\w+),\s+(\w+)\*?', '\\2 \\1', regex=True))                       
Out[33]: 
            Sales  Special
Index                     
Mike Jones    500        1
Amy James     300        0

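Details:

df.index.str.endswith('*').astype(int) - checks whether each index value ends with * and converts the boolean result to an integer (1 if it does, 0 otherwise)
df.index.str.replace(r'^(\w+),\s+(\w+)\*?', '\\2 \\1', regex=True) - captures the last name and the first name with the two (\w+) groups, then rewrites each index value as \\2 \\1 (the second group followed by the first), which also drops the optional trailing *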
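For anyone who wants to run the two steps above end to end, here is a minimal sketch. The sample frame is rebuilt from the question, and naming the index "Index" is an assumption taken from the printed output.

```python
import pandas as pd

# Sample data rebuilt from the question; the index name "Index" is an assumption.
df = pd.DataFrame(
    {"Sales": [500, 300]},
    index=pd.Index(["Jones, Mike*", "James, Amy"], name="Index"),
)

# Flag the starred names first, while the asterisk is still in the index.
df["Special"] = df.index.str.endswith("*").astype(int)

# Swap "Last, First" into "First Last" and drop the optional trailing asterisk.
result = df.set_index(
    df.index.str.replace(r"^(\w+),\s+(\w+)\*?$", r"\2 \1", regex=True)
)
print(result)
```

Note that \w+ only matches single-word names, so hyphenated or multi-part names would probably need a looser pattern.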

Answer 2

# swap the first name and last name by splitting on the comma then using the .str attribute and reversing the list
print(df.index.str.split(',').str[::-1])

Index([[' Mike*', 'Jones'], [' Amy', 'James']], dtype='object')

# convert to series and .join the values in each row, then set as the index
df.set_index(pd.Series(df.index.str.split(',').str[::-1]).apply(lambda x : ' '.join(x)), inplace=True)
print(df)

              Sales
 Mike* Jones    500
 Amy James      300

# create a new column called "Special" and check where the index contains a "*"
# note: escape it as r'\*' because * is a regex metacharacter
df['Special'] = df.index.str.contains(r'\*').astype(int)
print(df)

              Sales  Special
 Mike* Jones    500        1
 Amy James      300        0

# reassign the index after removing the * (regex=False treats '*' as a literal character)
df.index = df.index.str.replace('*', '', regex=False)
print(df)

             Sales  Special
 Mike Jones    500        1
 Amy James     300        0
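The output above still carries the leading space left over from splitting on the comma. A possible variant of the same split-and-reverse idea that strips each part is sketched below; it is an illustration under the question's sample data, not the answerer's exact code.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"Sales": [500, 300]}, index=["Jones, Mike*", "James, Amy"])

# Flag starred names; regex=False makes "*" a literal character.
df["Special"] = df.index.str.contains("*", regex=False).astype(int)

# Remove the asterisk, then split on the comma, strip each part, and reverse the order.
cleaned = df.index.str.replace("*", "", regex=False)
df.index = [
    " ".join(part.strip() for part in reversed(name.split(",")))
    for name in cleaned
]
print(df)
```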

Answer 3

This assumes df is your DataFrame and 'Index' is its index. If 'Index' is just a regular column, drop the reset_index and set_index calls.

ddf = df.reset_index()
ddf['Special'] = ddf['Index'].str.contains(r'\*').astype(int)
ddf['Index'] = ddf['Index'].apply(lambda x : ' '.join(x.split(',')[::-1]).replace('*', '').strip())
ddf.set_index('Index', inplace=True)

ddf is the result:

            Sales  Special
Index                     
Mike Jones    500        1
Amy James     300        0
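For the other case mentioned above, where 'Index' is an ordinary column rather than the index, the same logic might look like this. The column layout is an assumption for illustration.

```python
import pandas as pd

# Assumption: the names live in a regular column called "Index", not in the index.
df = pd.DataFrame({"Index": ["Jones, Mike*", "James, Amy"], "Sales": [500, 300]})

# Flag the starred names before the asterisk is removed.
df["Special"] = df["Index"].str.contains("*", regex=False).astype(int)

# Reverse the comma-separated parts, drop the asterisk, and trim stray whitespace.
df["Index"] = df["Index"].apply(
    lambda x: " ".join(x.split(",")[::-1]).replace("*", "").strip()
)
print(df)
```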

Answer 4

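A quick solution I can see is to use iterrows(). First initialize the Special column to all zeros with df['Special'] = 0. Then iterate over the rows, fixing each index label and setting Special to 1 where needed.

Something like this: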

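for i, _ in list(df.iterrows()):  # snapshot the rows before renaming
    if '*' in i:
        # a single .loc call; chained df.loc[i]['Special'] = 1 assigns to a copy
        df.loc[i, 'Special'] = 1
    last, first = i.replace('*', '').split(',')
    df.rename(index={i: first.strip() + ' ' + last.strip()}, inplace=True)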

Hope this helps.
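If the row-by-row loop gets slow on a larger frame, one alternative is to apply a small per-name helper with Index.map. This is only a sketch: swap_name is a hypothetical helper name, and it assumes every label contains exactly one comma.

```python
import pandas as pd


def swap_name(raw):
    """Turn 'Last, First*' into 'First Last' (hypothetical helper)."""
    last, first = raw.replace("*", "").split(",", 1)
    return f"{first.strip()} {last.strip()}"


# Sample data rebuilt from the question.
df = pd.DataFrame({"Sales": [500, 300]}, index=["Jones, Mike*", "James, Amy"])

# Flag starred names while the asterisk is still present, then rewrite the labels.
df["Special"] = [int("*" in name) for name in df.index]
df.index = df.index.map(swap_name)
print(df)
```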

How do I swap first and last names separated by a comma and add a new column?
