Question

Here is my data:

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns = ['names', 'category'])
df
names   category
abc - a   A
def - b   B
ghi - c   C
jkl - d   D

My desired output is:

names     division    category
    abc      a          A
    def      b          B
    ghi      c          C
    jkl      d          D

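There are many ways to do this, but I want to do it with the following logic:

Iterate through each row of the names column, store each value in 'st1', and then: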

first, middle, last = st1.partition(' - ')
df['names'] = first
df['division'] = last

and assign the results to the DataFrame one by one. Please help me get the desired output in Python.

Answer 1

You can do it like this:

df[['names','division']] = df.names.str.split(' - ',expand=True)
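
For reference, here is a self-contained sketch of that one-liner applied to the question's data. The final column reordering is an addition, assumed to be wanted only because the desired output lists division before category; the split itself appends the new column at the end.

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

# Split each 'names' value on ' - ' and spread the pieces over two columns.
df[['names', 'division']] = df['names'].str.split(' - ', expand=True)

# 'division' was appended last; reorder to match the desired layout (an added step).
df = df[['names', 'division', 'category']]
print(df)
```

This avoids an explicit Python loop, which is why it is usually the shorter option when row-by-row logic is not strictly required.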

Answer 2

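Create the DataFrame as before, then iterate over all rows of names and category, split each name on "-", append the pieces to a new list, and convert that list into another DataFrame, like this: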

import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns = ['names', 'category'])

newdata = []
for names, category in zip(df.names, df.category):
    name, division = names.split("-")
    newdata.append([name.strip(), division.strip(), category])

new_df = pd.DataFrame(newdata, columns = ['names', 'division', 'category'])

Printing the new DataFrame gives:

>>> new_df
  names division category
0   abc        a        A
1   def        b        B
2   ghi        c        C
3   jkl        d        D

Answer 3

I am testing GitHub Copilot to see how it solves Stack Overflow questions.

# Solution 1
import pandas as pd
import numpy as np

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])


# Iterate through each rows of column-names, and store each value in 'st1' and then ->
# first, middle, last = st1.partition(' - ')
# df['names'] = first
# df['division'] = last
# and also assigning it to dataframe one by one, please help me to get my desired output in python.


for index, row in df.iterrows():
    st1 = row['names']
    first, middle, last = st1.partition(' - ')
    df.loc[index, 'names'] = first
    df.loc[index, 'division'] = last

# df.loc[row label, column label] addresses a single cell, e.g.:
# df.loc[0, 'names'] = first
# df.loc[0, 'division'] = last

print(df)

Output:

  names category division
0   abc        A        a
1   def        B        b
2   ghi        C        c
3   jkl        D        d
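
A possible variant of the same partition logic, sketched here as an assumption rather than as part of the Copilot output: collect the pieces in plain lists (the hypothetical names `firsts` and `lasts`) and assign each column once, instead of writing to the DataFrame cell by cell inside the loop.

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

firsts, lasts = [], []  # hypothetical helper lists, not from the original answer
for st1 in df['names']:
    # partition returns (text before separator, separator, text after separator)
    first, middle, last = st1.partition(' - ')
    firsts.append(first)
    lasts.append(last)

# Assign whole columns in one step each.
df['names'] = firsts
df['division'] = lasts
print(df)
```

The loop still follows the 'st1' logic from the question; only the assignment step differs.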

Answer 4

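Since you want to iterate over each row of the DataFrame and process the rows individually, this answer uses a little NumPy to do the job. For the splitting itself, .partition() works much like .split() on Python strings (and through the pandas .str accessor), but the NumPy function of the same name, np.partition, is different: it partially sorts an array rather than splitting text.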
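
To illustrate the remark about .partition() in pandas, here is a small sketch using the .str accessor on the question's data. It is an aside, not the approach this answer goes on to use, and the intermediate name `parts` is only illustrative.

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

# Series.str.partition returns a DataFrame with three columns:
# 0 = text before the separator, 1 = the separator, 2 = text after it.
parts = df['names'].str.partition(' - ')

df['names'] = parts[0]
df['division'] = parts[2]
print(df)
```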

Here are the packages you need:

import pandas as pd
import numpy as np

Before iterating over the rows, use .insert() to create a new column named "division" (I used np.nan as the filler, but you can use any value you want):

df.insert(1, 'division', np.nan)

Now you can iterate over the rows with the pandas iterrows() method.

# index is the row label; row is a pandas Series holding that row's values
for index, row in df.iterrows():

    # convert the row from a Series to a plain list
    row = list(row)

    # remove the np.nan placeholder of the 'division' column created above
    row.pop(1)

    # split the 'names' value; yields the values for the 'names' and 'division' columns
    new_row = row[0].split(' - ')

    # append the value from the 'category' column
    new_row = np.append(new_row, row[1])

    # save the new row to the DataFrame (index doubles as a position here
    # because df uses the default 0..n-1 index)
    df.iloc[index] = new_row

Here is the output:

|    | names   | division   | category   |
|---:|:--------|:-----------|:-----------|
|  0 | abc     | a          | A          |
|  1 | def     | b          | B          |
|  2 | ghi     | c          | C          |
|  3 | jkl     | d          | D          |

Iterate through each row of a column and perform an operation
