Question

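I am using a pandas DataFrame for a dataset whose attributes (column names) are English words. After stemming, I end up with multiple columns that have the same name. In the sample data below, accept, acceptable and accepted all become accept after stemming. I want to apply bitwise_or across all columns that share a name and remove the duplicate columns. I tried this code: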

import numpy
from nltk.stem import *
import pandas as pd
ps = PorterStemmer()
dataset = pd.read_csv('sampleData.csv')
stemmed_words = []

for w in list(dataset):
    stemmed_words.append(ps.stem(w))

dataset.columns = stemmed_words
new_word = stemmed_words[0]

for w in stemmed_words:
    if new_word == w:
        numpy.bitwise_or(dataset[new_word], dataset[w])
        del dataset[w]
    else:
        new_word = w

print(dataset)

The problem is that when the for loop executes

del dataset['accept']

it deletes all columns with this name. I do not know in advance how many columns share the same name, and this code raises the exception KeyError: 'accept'
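The behaviour described above can be reproduced in isolation. The following is a minimal sketch on a small made-up frame (`demo` and its values are illustrative, not taken from sampleData.csv). It shows that selecting a duplicated label returns every matching column, that `del` removes all of them at once, and that a later lookup of the same label then fails.

```python
import pandas as pd

# Hypothetical toy frame with a duplicated column label, as produced by stemming.
demo = pd.DataFrame([[0, 1, 1], [1, 0, 0]], columns=["accept", "accept", "access"])

# Selecting a duplicated label returns a DataFrame holding every matching column.
selected = demo["accept"]
print(type(selected).__name__, selected.shape)

# del works on the label, so every column named "accept" is removed together.
del demo["accept"]
print(list(demo.columns))

# Any later access to the label fails, which is the KeyError seen in the loop.
try:
    demo["accept"]
except KeyError as err:
    print("KeyError:", err)
```

Note also that the loop discards the return value of `numpy.bitwise_or`, so even without the KeyError the combined result would never be stored.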

I want to apply bitwise_or across all three accept columns, save the result in a new column named "accept", and delete the old columns.

I hope this will not be downvoted this time.

Here is the sample data:

able  abundance  academy  accept  accept  accept  access  accommodation  accompany Class
   0          0        0       0       0       1       1              0          0     C
   0          0        0       1       0       0       0              0          0     A
   0          0        0       0       1       0       0              0          0     H
   0          0        0       0       0       1       0              1          0     G
   0          0        0       1       0       0       0              0          0     G

The output should be

Class  able  abundance  academy  accept  access  accommodation  accompany
    C     0          0        0       1       1              0          0
    A     0          0        0       1       0              0          0
    H     0          0        0       1       0              0          0
    G     0          0        0       1       0              1          0
    G     0          0        0       1       0              0          0

How do I remove the duplicate-named columns but keep their data?
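One possible approach, offered as a sketch rather than a definitive answer: instead of deleting columns inside a loop, build a new frame with one column per unique name, combining each group of same-named columns with `numpy.bitwise_or.reduce`. The helper name `merge_duplicate_columns` is hypothetical, and the sample frame is typed in by hand from the data shown above. The sketch assumes the duplicated columns hold integer 0/1 values.

```python
import numpy as np
import pandas as pd

def merge_duplicate_columns(df):
    """Return a new frame with one column per unique name (hypothetical helper).

    Columns sharing a name are combined row by row with bitwise OR.
    Assumes duplicated columns contain integers (0/1 flags here).
    """
    merged = {}
    # dict.fromkeys keeps each name once, in first-seen order.
    for name in dict.fromkeys(df.columns):
        # Boolean mask over the column labels selects every column with this name.
        block = df.loc[:, df.columns == name]
        if block.shape[1] == 1:
            merged[name] = block.iloc[:, 0]
        else:
            # OR all same-named columns together along each row.
            merged[name] = np.bitwise_or.reduce(block.to_numpy(), axis=1)
    return pd.DataFrame(merged, index=df.index)

# Sample data from the question, after the column names were stemmed.
columns = ["able", "abundance", "academy", "accept", "accept", "accept",
           "access", "accommodation", "accompany", "Class"]
rows = [
    [0, 0, 0, 0, 0, 1, 1, 0, 0, "C"],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, "A"],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, "H"],
    [0, 0, 0, 0, 0, 1, 0, 1, 0, "G"],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, "G"],
]
dataset = pd.DataFrame(rows, columns=columns)

result = merge_duplicate_columns(dataset)
print(result)
```

Because the helper never deletes by label, it does not need to know in advance how many columns share a name. The columns come out in first-seen order, so Class stays last; reorder them afterwards if the layout in the expected output matters.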

0 answers: