Question

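I have a dataframe with a column that contains lists of items. I want to create 3 columns holding the individual items of each list. If a list has 2 elements, I need the first element in the Age column and the remaining one in the Weight column; if there is just one element, I want it in the Age column. Here is the dataframe: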

| First_Name | Others               |
| ---------- | -------------------- |
| Ken        | ["Batman","28","5.5" |
| Cole       | ["36","6.1"          |
| Eddie      | ["24"                |

I want to get this output:

| First_Name | Others               | Surname | Age | Weight |
| ---------- | -------------------- |-------- | --- | ------ |
| Ken        | ["Batman","28","55"  | Batman  | 28  | 55     |
| Cole       | ["36","60"           |  NaN    | 36  | 60     |
| Eddie      | ["81"                |  NaN    | 81  | NaN    |

I tried this approach, but it did not give me the result I wanted. How can I do this?

Answer 1

With your shown samples, please try the following. It uses the Pandas str.extract function.

df[["Surname","Age","Weight"]] = df['Others'].str.extract(r'\["(?:([^"]*)",")?(\d+)(?:","(\d+))?"',expand=False)

The output is as follows:

  First_Name               Others Surname Age Weight
0        Ken  ["Batman","28","55"  Batman  28     55
1       Cole           ["36","60"      36  60    NaN
2      Eddie                ["81"     NaN  81    NaN

Explanation: Adding a detailed explanation of the regex used above.
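As a rough breakdown of that regex, the same pattern can be written with Python's re.VERBOSE flag so that each part carries a comment. This is only a sketch using the standard re module; the name PATTERN and the samples list are illustrative and not part of the original answer. Note that for a two-item list the optional first group matches the first number, which is why Cole's 36 lands in Surname in the printed output.

```python
import re

# Same pattern as in the str.extract call, spread out with re.VERBOSE.
# PATTERN is a hypothetical name used only for this illustration.
PATTERN = re.compile(r'''
    \["                  # literal opening bracket and the first double quote
    (?:([^"]*)",")?      # optional group 1 (Surname): text up to the next quote, then ","
    (\d+)                # group 2 (Age): one or more digits
    (?:","(\d+))?        # optional group 3 (Weight): "," followed by digits
    "                    # closing double quote of the last matched item
''', re.VERBOSE)

samples = ['["Batman","28","55"', '["36","60"', '["81"']
for text in samples:
    # groups() returns (Surname, Age, Weight); unmatched groups are None
    print(text, "->", PATTERN.search(text).groups())
```

str.extract turns the three capture groups into three columns, with unmatched groups shown as NaN.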

Answer 2

There is no way to guess the column names for lists of unequal size.

Otherwise, when every list has the same length, you can use Series.apply

# assumes every list in "Others" has exactly three items
new_columns = data["Others"].apply(pd.Series, index=["Surname", "Age", "Weight"])
# append new_columns to the original frame using pd.concat
data = pd.concat([data, new_columns], axis=1)
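To make the equal-length case concrete, here is a minimal sketch on made-up data in which every list has exactly three items. The second row ("Wayne") is hypothetical and only there so that both lists have the same length; "Others" is assumed to hold real Python lists, not strings.

```python
import pandas as pd

# Hypothetical data: every list in "Others" has exactly three items.
data = pd.DataFrame({
    "First_Name": ["Ken", "Cole"],
    "Others": [["Batman", "28", "55"], ["Wayne", "36", "60"]],
})

# Extra keyword arguments given to Series.apply are forwarded to the function,
# so each list is turned into pd.Series(list, index=[...]).
new_columns = data["Others"].apply(pd.Series, index=["Surname", "Age", "Weight"])

# Attach the three new columns next to the original ones.
data = pd.concat([data, new_columns], axis=1)
print(data)
```

With lists of different lengths, the same call fails because the number of values no longer matches the three index labels.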

You can also pass a smarter function to apply, like this:

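def func(others):
    index = ["Surname", "Age", "Weight"]
    values = list(others)
    if values and values[0].isnumeric():
        # the first item is numeric, so it is probably 'Age': leave 'Surname' empty
        values = [None] + values
    # pad short lists (e.g. a lone age) so the length matches the index
    values += [None] * (len(index) - len(values))
    return pd.Series(values, index=index)

new_columns = data["Others"].apply(func)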

The idea is to return a pd.Series from the function passed to apply.

Answer 3

Try using the lstrip(), replace(), split(), fillna(), apply() and to_numeric() methods:

import pandas as pd

df[['Surname', 'Age', 'Weight']] = df['Others'].str.lstrip('[').str.split(',', expand=True).replace('"', '', regex=True)

df['Weight'] = df['Weight'].fillna(df['Age'])
df['Age'] = pd.to_numeric(df['Surname'], errors='coerce').fillna(df['Age'])
df['Surname'] = df['Surname'].apply(lambda x: x if x.isalpha() else float("NaN"))

Now if you print df you will get the desired output.
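For reference, the steps above can be run end to end on the sample rows as follows. This is a sketch that assumes "Others" holds strings exactly as printed in the first answer's output (with no closing bracket); the intermediate name parts and the final to_numeric calls are additions for illustration, the latter because the steps otherwise leave Age and Weight holding strings mixed with floats.

```python
import pandas as pd

# Sample rows, assuming "Others" is stored as strings without a closing bracket.
df = pd.DataFrame({
    "First_Name": ["Ken", "Cole", "Eddie"],
    "Others": ['["Batman","28","55"', '["36","60"', '["81"'],
})

# Strip the leading bracket, split on commas, and drop the double quotes.
parts = (
    df["Others"].str.lstrip("[")
    .str.split(",", expand=True)
    .replace('"', "", regex=True)
)
df[["Surname", "Age", "Weight"]] = parts

# Shift values to the right for rows that have no surname.
df["Weight"] = df["Weight"].fillna(df["Age"])
df["Age"] = pd.to_numeric(df["Surname"], errors="coerce").fillna(df["Age"])
df["Surname"] = df["Surname"].apply(lambda x: x if x.isalpha() else float("NaN"))

# Optional extra step (not in the answer): make Age and Weight numeric.
df["Age"] = pd.to_numeric(df["Age"])
df["Weight"] = pd.to_numeric(df["Weight"])
print(df)
```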

How to create new columns from a column containing lists in a dataframe
