Pandas: changing the type of raw data

Time: 2018-01-18 05:45:48

Tags: python pandas

How can I convert raw data into something that is easier to compute with?

    info.teams

    ['Australia', 'Sri Lanka']
    ['Australia', 'Sri Lanka']
    ['Australia', 'Sri Lanka']
    ['India', 'West Indies']
    ['India', 'West Indies']
    ['Bangladesh', 'West Indies']
    ['Australia', 'Sri Lanka']
    ['Bangladesh', 'India']
    ['Australia', 'Sri Lanka']
    ['India', 'West Indies']
    ['India', 'South Africa']
    ['Afghanistan', 'India']
    ['India', 'South Africa']
    ['Australia', 'Sri Lanka']
    ['India', 'Sri Lanka']
    ['India', 'Sri Lanka']
    ['India', 'Sri Lanka']
    ['Australia', 'Sri Lanka']
    ['Bangladesh', 'West Indies']
    ['Afghanistan', 'India']
    ['India', 'South Africa']
    ['Australia', 'Sri Lanka']
    ['Australia', 'Sri Lanka']
    ['Bangladesh', 'West Indies']
    ['India', 'West Indies']
    ['Bangladesh', 'West Indies']
    ['Bangladesh', 'India']
    ['India', 'South Africa']

This is the data type of that column:

info.teams                 1547 non-null object

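Suppose I want to find the matches that two particular teams played against each other, for example ['India', 'Australia']. Right now I have to write code like this: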

#choosing particular teams 
team_1='India' 
team_2='Australia' 
team_12='['+"'"+team_1+"'"+', '+"'"+team_2+"'"+']' 
team_21='['+"'"+team_2+"'"+', '+"'"+team_1+"'"+']' 
df=df[(df['info.teams']==team_12) | (df['info.teams']==team_21)] 

2 answers:

Answer 0 (score: 2)

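If the data is stored as strings, use ast.literal_eval to convert each string to a list, apply pd.Series to expand the lists into columns, and then use isin to select the matching rows, i.e.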

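import ast
import pandas as pd

# parse the string form of each list into a real Python list
df['teams'] = df['teams'].str.strip().apply(ast.literal_eval)

# one column per team, then keep rows where both teams are in the wanted pair
ndf = df['teams'].apply(pd.Series)
ndf[ndf.isin(['India','Sri Lanka']).all(axis=1)]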

        0          1
14  India  Sri Lanka
15  India  Sri Lanka
16  India  Sri Lanka
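
For readers who want to try the steps above end to end, here is a minimal sketch on made-up data. The sample rows and the names `wanted`, `mask` and `selected` are assumptions for illustration only; the column is called `teams` as in this answer, not `info.teams` as in the question.

```python
import ast

import pandas as pd

# Hypothetical sample data: each cell is the *string* form of a list,
# which is what the string comparison in the question implies.
df = pd.DataFrame({
    "teams": [
        "['Australia', 'Sri Lanka']",
        "['India', 'West Indies']",
        "['India', 'Sri Lanka']",
        "['Sri Lanka', 'India']",
    ]
})

# Parse each string into a real Python list.
df["teams"] = df["teams"].str.strip().apply(ast.literal_eval)

# Expand the lists into one column per team (columns 0 and 1).
ndf = df["teams"].apply(pd.Series)

# True for rows where both columns hold one of the wanted teams.
wanted = ["India", "Sri Lanka"]
mask = ndf.isin(wanted).all(axis=1)

# Boolean masks share the index of df, so they can filter it directly.
selected = df[mask]
print(selected)
```

Because `isin` checks each cell on its own, the order of the two teams within a row does not matter here.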

If you want to select the rows from the main dataframe, use the index from ndf, i.e.

idx = ndf[ndf.isin(['India','Sri Lanka']).all(axis=1)].index

df.loc[idx]

            teams
14  [India, Sri Lanka]
15  [India, Sri Lanka]
16  [India, Sri Lanka]
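
A possible alternative that skips the intermediate frame is to compare each row's list with the wanted pair as a set. This is only a sketch: it assumes the cells are already Python lists (for example after `ast.literal_eval`), and the helper name `played_between` is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with cells that are already lists.
df = pd.DataFrame({
    "teams": [
        ["Australia", "Sri Lanka"],
        ["India", "Sri Lanka"],
        ["Sri Lanka", "India"],
        ["India", "West Indies"],
    ]
})


def played_between(teams_column, team_a, team_b):
    """Return a boolean mask: True where a row's teams are exactly the given pair."""
    wanted = {team_a, team_b}
    # Set equality ignores the order of the teams inside each list.
    return teams_column.apply(lambda teams: set(teams) == wanted)


mask = played_between(df["teams"], "India", "Sri Lanka")
matches = df[mask]
print(matches)
```

The set comparison runs a Python function per row, so on very large frames the column-based `isin` approach may be faster.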

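Answer 1 (score: -1)

I'm not sure what you are looking for. But if you want to split the teams into two separate columns, you may want to do something like this: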

   info[['team1','team2']]=pd.DataFrame(info.teams.values.tolist(), index=info.index)

Output:

    teams     team1 team2
  0 [aus,ind]   aus    ind
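
As a rough sketch of how the two new columns could then be used to answer the original question, the example below splits the lists and filters for one pair of teams in either order. The sample rows, the non-default index and the names `a`, `b`, `pair` and `result` are assumptions for illustration; the cells are assumed to be lists already, not strings.

```python
import pandas as pd

# Hypothetical sample data; the index is non-default on purpose.
info = pd.DataFrame(
    {"teams": [["aus", "ind"], ["ind", "sl"], ["ind", "aus"]]},
    index=[10, 11, 12],
)

# Build the two new columns, reusing info's index so the rows line up.
info[["team1", "team2"]] = pd.DataFrame(info["teams"].tolist(), index=info.index)

# Keep matches between a and b, whichever team is listed first.
a, b = "aus", "ind"
pair = ((info["team1"] == a) & (info["team2"] == b)) | (
    (info["team1"] == b) & (info["team2"] == a)
)
result = info[pair]
print(result)
```

Passing `index=info.index` matters when the frame has been filtered or re-indexed earlier, since the assignment aligns the new columns on index labels.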