
Time: 2017-04-16 16:19:19

Tags: python pandas dataframe


    Episode      Number  Rating  Series
    4 Days Out   2.9     9.1     "Breaking Bad" (2008)
    Buyout       5.6     9.0     "Breaking Bad" (2008)
    Pilot        1.1     9.0     "Breaking Bad" (2008)
    Dog Fight    1.12    9.0     "Suits" (2011)
    We're Done   4.7     9.0     "Suits" (2011)
    Privilege    5.6     8.9     "Suits" (2011)
    Pilot        1.1     8.9     "Suits" (2011)

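I want to create a new column named watched for this DataFrame. I will supply the episode numbers (from the 'Number' column) in a list and apply np.where to it, so that the watched column holds 'yes' or 'no' values.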

import numpy as np

watchlist = [1.1, 4.7, 2.9]
df['watched'] = np.where(df['Number'].isin(watchlist), 'no', 'yes')

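This creates a new column with 'no' in the rows for episodes 4.7, 2.9 and 1.1. The problem is that two rows have Number 1.1, and I want 'no' in only one of them, not both. Is there a way to distinguish the two rows whose value in the 'Number' column is 1.1? (They have different values in the 'Series' column but the same value in the 'Episode' column.)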

2 answers:

Answer 0 (score: 1)



      Episode  Number  Rating               Series
0  4 Days Out    2.90     9.1  Breaking Bad (2008)
1      Buyout    5.60     9.0  Breaking Bad (2008)
2       Pilot    1.10     9.0  Breaking Bad (2008)
3   Dog Fight    1.12     9.0         Suits (2011)
4  We're Done    4.70     9.0         Suits (2011)
5   Privilege    5.60     8.9         Suits (2011)
6       Pilot    1.10     8.9         Suits (2011)


[1.1, 4.7, 2.9]

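Assume the watchlist applies only to Breaking Bad. Use np.where to restrict the operation to the rows that match Breaking Bad (2008), then use isin to check whether the values in the Number column are in watchlist: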

df['Breaking Bad Watched'] = df['Number'][np.where(df['Series'] == "Breaking Bad (2008)")[0]].isin(watchlist)


      Episode  Number  Rating               Series Breaking Bad Watched
0  4 Days Out    2.90     9.1  Breaking Bad (2008)                 True
1      Buyout    5.60     9.0  Breaking Bad (2008)                False
2       Pilot    1.10     9.0  Breaking Bad (2008)                 True
3   Dog Fight    1.12     9.0         Suits (2011)                  NaN
4  We're Done    4.70     9.0         Suits (2011)                  NaN
5   Privilege    5.60     8.9         Suits (2011)                  NaN
6       Pilot    1.10     8.9         Suits (2011)                  NaN
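
The same result can probably be reached without positional indexing, which may be safer when the DataFrame does not have the default 0..n-1 index. The following is only a sketch under that assumption: the DataFrame is rebuilt from the table above, and the name `mask` is introduced here for illustration. It applies isin to every row and then uses Series.where to blank out the rows that belong to other series.

```python
import pandas as pd

# Rebuild the example DataFrame from the table above
df = pd.DataFrame({
    'Episode': ['4 Days Out', 'Buyout', 'Pilot', 'Dog Fight',
                "We're Done", 'Privilege', 'Pilot'],
    'Number': [2.9, 5.6, 1.1, 1.12, 4.7, 5.6, 1.1],
    'Rating': [9.1, 9.0, 9.0, 9.0, 9.0, 8.9, 8.9],
    'Series': ['Breaking Bad (2008)'] * 3 + ['Suits (2011)'] * 4,
})
watchlist = [1.1, 4.7, 2.9]

# Boolean mask selecting only the Breaking Bad rows (hypothetical helper name)
mask = df['Series'] == 'Breaking Bad (2008)'

# isin runs on every row; where() keeps the result only where mask is True
# and leaves NaN elsewhere
df['Breaking Bad Watched'] = df['Number'].isin(watchlist).where(mask)
print(df)
```

Because the mask is aligned on the index labels rather than on positions, this variant should not depend on how the rows are numbered.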

Then use map to convert True/False to Yes/No:

d = {True: 'Yes', False: 'No'}
df['Breaking Bad Watched'] = df['Breaking Bad Watched'].map(d)

      Episode  Number  Rating               Series Breaking Bad Watched
0  4 Days Out    2.90     9.1  Breaking Bad (2008)                  Yes
1      Buyout    5.60     9.0  Breaking Bad (2008)                   No
2       Pilot    1.10     9.0  Breaking Bad (2008)                  Yes
3   Dog Fight    1.12     9.0         Suits (2011)                  NaN
4  We're Done    4.70     9.0         Suits (2011)                  NaN
5   Privilege    5.60     8.9         Suits (2011)                  NaN
6       Pilot    1.10     8.9         Suits (2011)                  NaN
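
As a small isolated illustration of why the Suits rows stay NaN: map with a dictionary leaves any value that is not a key of the dictionary as NaN. The Series `flags` below is made up for this sketch and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing booleans with a missing value
flags = pd.Series([True, False, True, np.nan], dtype=object)

# Values that are not keys of the dictionary (here NaN) are mapped to NaN
labels = flags.map({True: 'Yes', False: 'No'})
print(labels.tolist())
```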

------------------------ For a watchlist dictionary ------------------------


watchlist = {'Breaking Bad (2008)': [1.1, 4.7, 2.9], 'Suits (2011)': [4.7, 5.6]}


# Save names of new columns into new_col_list
new_col_list = []

# .items() replaces Python 2's .iteritems()
for series, wlist in watchlist.items():
    col = '{} Watched'.format(series)
    new_col_list.append(col)
    # Check Number against this series' list, only on the matching rows
    print(series, wlist)
    df[col] = df['Number'][np.where(df['Series'] == series)[0]].isin(wlist)


      Episode  Number  Rating               Series  \
0  4 Days Out    2.90     9.1  Breaking Bad (2008)   
1      Buyout    5.60     9.0  Breaking Bad (2008)   
2       Pilot    1.10     9.0  Breaking Bad (2008)   
3   Dog Fight    1.12     9.0         Suits (2011)   
4  We're Done    4.70     9.0         Suits (2011)   
5   Privilege    5.60     8.9         Suits (2011)   
6       Pilot    1.10     8.9         Suits (2011)   

  Breaking Bad (2008) Watched Suits (2011) Watched  
0                        True                  NaN  
1                       False                  NaN  
2                        True                  NaN  
3                         NaN                False  
4                         NaN                 True  
5                         NaN                 True  
6                         NaN                False  

new_col_list = ['Breaking Bad (2008) Watched', 'Suits (2011) Watched']


df['Watched'] = pd.concat([df['Breaking Bad (2008) Watched'].dropna(), df['Suits (2011) Watched'].dropna()])
# Remove old Columns
df.drop(['Breaking Bad (2008) Watched','Suits (2011) Watched'], axis=1, inplace=True)


# Equivalent general form using new_col_list (run this instead of the explicit version above)
df['Watched'] = pd.concat([df[col].dropna() for col in new_col_list])
# Remove old Name Columns
df.drop(new_col_list, axis=1, inplace=True)

# Convert True False to Yes No
d = {True: 'Yes', False: 'No'}
df['Watched'] = df['Watched'].map(d)
# Final Output:
      Episode  Number  Rating               Series Watched
0  4 Days Out    2.90     9.1  Breaking Bad (2008)     Yes
1      Buyout    5.60     9.0  Breaking Bad (2008)      No
2       Pilot    1.10     9.0  Breaking Bad (2008)     Yes
3   Dog Fight    1.12     9.0         Suits (2011)      No
4  We're Done    4.70     9.0         Suits (2011)     Yes
5   Privilege    5.60     8.9         Suits (2011)     Yes
6       Pilot    1.10     8.9         Suits (2011)      No
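
The intermediate per-series columns can likely be avoided altogether. The sketch below is an alternative, not the method used in this answer: it flattens the watchlist dictionary into a small table of (Series, Number) pairs and left-merges it onto the DataFrame with indicator=True. The names `pairs` and `merged` are introduced here for illustration, and the approach assumes the pairs are unique and that the float episode numbers match exactly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Episode': ['4 Days Out', 'Buyout', 'Pilot', 'Dog Fight',
                "We're Done", 'Privilege', 'Pilot'],
    'Number': [2.9, 5.6, 1.1, 1.12, 4.7, 5.6, 1.1],
    'Rating': [9.1, 9.0, 9.0, 9.0, 9.0, 8.9, 8.9],
    'Series': ['Breaking Bad (2008)'] * 3 + ['Suits (2011)'] * 4,
})
watchlist = {'Breaking Bad (2008)': [1.1, 4.7, 2.9],
             'Suits (2011)': [4.7, 5.6]}

# One row per watched (Series, Number) pair
pairs = pd.DataFrame(
    [(series, number) for series, numbers in watchlist.items() for number in numbers],
    columns=['Series', 'Number'],
)

# A left merge keeps the original row order; '_merge' is 'both' for watched rows
merged = df.merge(pairs, on=['Series', 'Number'], how='left', indicator=True)
df['Watched'] = np.where(merged['_merge'] == 'both', 'Yes', 'No')
print(df)
```

Matching on both columns at once is what separates the two Pilot rows, since each shares the number 1.1 but belongs to a different series.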





Answer 1 (score: 1)



df['Watched'] = 'No'

      Episode  Number  Rating               Series Watched
0  4 Days Out    2.90     9.1  Breaking Bad (2008)      No
1      Buyout    5.60     9.0  Breaking Bad (2008)      No
2       Pilot    1.10     9.0  Breaking Bad (2008)      No
3   Dog Fight    1.12     9.0         Suits (2011)      No
4  We're Done    4.70     9.0         Suits (2011)      No
5   Privilege    5.60     8.9         Suits (2011)      No
6       Pilot    1.10     8.9         Suits (2011)      No


for key, values in watchlist.items():  # .items() replaces Python 2's .iteritems()
    df.loc[(df['Number'].isin(values)) & (df['Series'] == key), 'Watched'] = 'yes'


      Episode  Number  Rating               Series Watched
0  4 Days Out    2.90     9.1  Breaking Bad (2008)     yes
1      Buyout    5.60     9.0  Breaking Bad (2008)      No
2       Pilot    1.10     9.0  Breaking Bad (2008)     yes
3   Dog Fight    1.12     9.0         Suits (2011)      No
4  We're Done    4.70     9.0         Suits (2011)     yes
5   Privilege    5.60     8.9         Suits (2011)     yes
6       Pilot    1.10     8.9         Suits (2011)      No


Total time this answer = 0.00800013542175 s
Total time accepted answer = 2.624944121596675 s
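
Since the question asked about np.where specifically, the loop above can probably be combined with it by accumulating one boolean mask over all series and labelling the rows in a single step. This is a sketch under the same example data, and the name `mask` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Episode': ['4 Days Out', 'Buyout', 'Pilot', 'Dog Fight',
                "We're Done", 'Privilege', 'Pilot'],
    'Number': [2.9, 5.6, 1.1, 1.12, 4.7, 5.6, 1.1],
    'Rating': [9.1, 9.0, 9.0, 9.0, 9.0, 8.9, 8.9],
    'Series': ['Breaking Bad (2008)'] * 3 + ['Suits (2011)'] * 4,
})
watchlist = {'Breaking Bad (2008)': [1.1, 4.7, 2.9],
             'Suits (2011)': [4.7, 5.6]}

# Start with "nothing watched" and OR in the matches for each series
mask = pd.Series(False, index=df.index)
for key, values in watchlist.items():
    mask |= (df['Series'] == key) & df['Number'].isin(values)

# Label every row in one pass
df['Watched'] = np.where(mask, 'Yes', 'No')
print(df)
```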