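How do I remove the first 8 characters from the names of specific pandas DataFrame columns without knowing the column names?

Time: 2019-10-12 17:21:48

Tags: python pandas dataframe

I have a pandas DataFrame created from the following objects: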

df = pandas.DataFrame({"imdbPage": emptyWebPageSet,
                       "title": emptySetTitle,
                       "genre1": lst1,
                       "genre2": lst2,
                       "genre3": lst3,
                       "genre4": lst4,
                       "info":infoSet,
                       "Runtime(mins)":movieTime,
                       "releaseData":releaseDateSet,
                       "imdbRating":ratingSet,
                       "numberOfVotes":votesList,
                       "numberOfEpisodes":noOfEpisodesSet,
                       "TotalRunTime(mins)":totalRunTimeSet
                       })
df = pandas.get_dummies(data=df, columns=['genre1', 'genre2', 'genre3', 'genre4'])

The column headers in the output look like this:

output = ["imdbPage", "title", "info", "Runtime(mins)", "releaseData", "imdbRating", "numberOfVotes",
"numberOfEpisodes", "genre1_Action", "genre1_Adventure", "genre1_Animation",
"genre1_Biography", "genre1_Comedy".... etc]

What I want to do is strip every "genre1_", "genre2_" part from the output, but I obviously don't know the exact column names or how many there are, only that they start with "genre1_", "genre2_", "genre3_", or "genre4_".

2 answers:

Answer 0: (score: 1)

Use str.replace

import pandas as pd

output = ["imdbPage", "title", "info", "Runtime(mins)", "releaseData", "imdbRating", "numberOfVotes",
          "numberOfEpisodes", "genre1_Action", "genre1_Adventure", "genre1_Animation", "genre1_Biography",
          "genre1_Comedy"]

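# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
print(pd.Series(data=output).str.replace(r'^genre\d+_', '', regex=True))

Output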

0             imdbPage
1                title
2                 info
3        Runtime(mins)
4          releaseData
5           imdbRating
6        numberOfVotes
7     numberOfEpisodes
8               Action
9            Adventure
10           Animation
11           Biography
12              Comedy
dtype: object
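
The same replacement can be applied directly to the DataFrame's column index rather than to a separate Series. The following is a minimal sketch, assuming a small hypothetical frame (the `title`, `genre1`, and `genre2` values are invented for illustration and are not the asker's data):

```python
import pandas as pd

# Hypothetical miniature frame standing in for the asker's data.
df = pd.DataFrame({
    "title": ["A", "B"],
    "genre1": ["Action", "Comedy"],
    "genre2": ["Drama", "Horror"],
})
df = pd.get_dummies(data=df, columns=["genre1", "genre2"])

# df.columns is an Index, which exposes the same .str accessor as a Series.
# The pattern only matches names that start with "genre<digits>_".
df.columns = df.columns.str.replace(r"^genre\d+_", "", regex=True)

print(list(df.columns))  # ['title', 'Action', 'Comedy', 'Drama', 'Horror']
```

Columns that do not start with the prefix, such as `title` here, pass through unchanged.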

Answer 1: (score: 0)

You can try the following (reference: Here):

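import re

newcols = {}
for col in df.columns:
    # re.sub leaves names without the prefix unchanged; re.match would
    # return None for them and .group(2) would raise AttributeError.
    newcols[col] = re.sub(r"^genre\d+_", "", col)
df.rename(columns=newcols, inplace=True)
print(df)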

Or, more concisely:

df.rename(columns=lambda x: re.sub(r"^genre\d+_", "", x), inplace=True)
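
One thing worth checking after stripping the prefixes: if the same genre appears in more than one genre slot, the renamed columns will no longer be unique. The sketch below illustrates this with a hypothetical two-row frame (the data is invented for illustration); how to merge such duplicates is left open, since it depends on what the columns are used for.

```python
import re
import pandas as pd

# Hypothetical data in which "Action" and "Comedy" occur in both genre slots.
df = pd.DataFrame({
    "title": ["A", "B"],
    "genre1": ["Action", "Comedy"],
    "genre2": ["Comedy", "Action"],
})
df = pd.get_dummies(data=df, columns=["genre1", "genre2"])

# Strip the "genre<digits>_" prefix; names without it are left as they are.
renamed = df.rename(columns=lambda c: re.sub(r"^genre\d+_", "", c))

print(list(renamed.columns))      # ['title', 'Action', 'Comedy', 'Action', 'Comedy']
print(renamed.columns.is_unique)  # False
```

Selecting `renamed["Action"]` on such a frame returns a DataFrame with two columns rather than a single Series, so it may be safer to check `columns.is_unique` before relying on the shortened names.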