Hi, I have a DataFrame of this form:
Episode Number Rating Series
4 Days Out 2.9 9.1 "Breaking Bad" (2008)
Buyout 5.6 9.0 "Breaking Bad" (2008)
Pilot 1.1 9.0 "Breaking Bad" (2008)
Dog Fight 1.12 9.0 "Suits" (2011)
We're Done 4.7 9.0 "Suits" (2011)
Privilege 5.6 8.9 "Suits" (2011)
Pilot 1.1 8.9 "Suits" (2011)
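I want to create a new column called watched for this DataFrame. I will supply the episode numbers (from the 'Number' column) in a list and apply np.where to it, so that the watched column holds either 'yes' or 'no'.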
watchlist=[1.1, 4.7, 2.9]
df['watched'] = np.where(df['Number'].isin(watchlist), 'no', 'yes')
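This creates a new column with 'no' in the rows for episodes 4.7, 2.9 and 1.1. The problem is that I want 'no' in only one of the two 1.1 rows, not both. Is there a way to distinguish the two rows whose value in the 'Number' column is 1.1? (They have different values in the 'Series' column but the same value in the 'Episode' column.)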
Answer 0 (score: 1)
For a single watchlist
You can combine a selective isin with np.where: choose which series to check, and use a different watchlist for each series. For your DataFrame df:
Episode Number Rating Series
0 4 Days Out 2.90 9.1 Breaking Bad (2008)
1 Buyout 5.60 9.0 Breaking Bad (2008)
2 Pilot 1.10 9.0 Breaking Bad (2008)
3 Dog Fight 1.12 9.0 Suits (2011)
4 We're Done 4.70 9.0 Suits (2011)
5 Privilege 5.60 8.9 Suits (2011)
6 Pilot 1.10 8.9 Suits (2011)
and watchlist:
[1.1, 4.7, 2.9]
Suppose the watchlist applies only to Breaking Bad. Use np.where to select only the rows that match Breaking Bad (2008), then use isin to check whether the values in the Number column are in watchlist:
df['Breaking Bad Watched'] = df['Number'][np.where(df['Series'] == "Breaking Bad (2008)")[0]].isin(watchlist)
This gives:
Episode Number Rating Series Breaking Bad Watched
0 4 Days Out 2.90 9.1 Breaking Bad (2008) True
1 Buyout 5.60 9.0 Breaking Bad (2008) False
2 Pilot 1.10 9.0 Breaking Bad (2008) True
3 Dog Fight 1.12 9.0 Suits (2011) NaN
4 We're Done 4.70 9.0 Suits (2011) NaN
5 Privilege 5.60 8.9 Suits (2011) NaN
6 Pilot 1.10 8.9 Suits (2011) NaN
Then use map to convert True / False to Yes / No:
d = {True: 'Yes', False: 'No'}
df['Breaking Bad Watched'] = df['Breaking Bad Watched'].map(d)
Episode Number Rating Series Breaking Bad Watched
0 4 Days Out 2.90 9.1 Breaking Bad (2008) Yes
1 Buyout 5.60 9.0 Breaking Bad (2008) No
2 Pilot 1.10 9.0 Breaking Bad (2008) Yes
3 Dog Fight 1.12 9.0 Suits (2011) NaN
4 We're Done 4.70 9.0 Suits (2011) NaN
5 Privilege 5.60 8.9 Suits (2011) NaN
6 Pilot 1.10 8.9 Suits (2011) NaN
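The two steps above (selecting the series, then checking the number) can also be folded into one boolean mask. The following is a minimal sketch rather than the answer's exact method: it rebuilds the example DataFrame from the question and assumes that rows belonging to other series should read 'No' instead of NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'Episode': ['4 Days Out', 'Buyout', 'Pilot', 'Dog Fight',
                "We're Done", 'Privilege', 'Pilot'],
    'Number': [2.9, 5.6, 1.1, 1.12, 4.7, 5.6, 1.1],
    'Rating': [9.1, 9.0, 9.0, 9.0, 9.0, 8.9, 8.9],
    'Series': ['Breaking Bad (2008)'] * 3 + ['Suits (2011)'] * 4,
})
watchlist = [1.1, 4.7, 2.9]

# True only where the row belongs to the series AND its number is listed.
mask = (df['Series'] == 'Breaking Bad (2008)') & df['Number'].isin(watchlist)

# Assumption: rows of other series get 'No' here rather than NaN.
df['Breaking Bad Watched'] = np.where(mask, 'Yes', 'No')
print(df)
```

Because the mask already covers every row, no map step is needed afterwards. If NaN for the other series matters to you, the two-step version above keeps it.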
For a dictionary of watchlists
If you have a watchlist in which the episode numbers are specified separately for each series:
watchlist = {'Breaking Bad (2008)': [1.1, 4.7, 2.9], 'Suits (2011)': [4.7, 5.6]}
You can iterate over it as follows:
# Collect the names of the new columns in new_col_list
new_col_list = []
# On Python 2, use watchlist.iteritems() instead of items()
for series, wlist in watchlist.items():
    # Remember the name of the column created for this series
    new_col_list.append('{} Watched'.format(series))
    print(series, wlist)
    # Check isin only on the rows of this series; all other rows become NaN
    df['{} Watched'.format(series)] = df['Number'][np.where(df['Series'] == series)[0]].isin(wlist)
This gives you:
Episode Number Rating Series \
0 4 Days Out 2.90 9.1 Breaking Bad (2008)
1 Buyout 5.60 9.0 Breaking Bad (2008)
2 Pilot 1.10 9.0 Breaking Bad (2008)
3 Dog Fight 1.12 9.0 Suits (2011)
4 We're Done 4.70 9.0 Suits (2011)
5 Privilege 5.60 8.9 Suits (2011)
6 Pilot 1.10 8.9 Suits (2011)
Breaking Bad (2008) Watched Suits (2011) Watched
0 True NaN
1 False NaN
2 True NaN
3 NaN False
4 NaN True
5 NaN True
6 NaN False
new_col_list = ['Breaking Bad (2008) Watched', 'Suits (2011) Watched']
[1] If there are only a few names, write them out by hand: use pd.concat to join the two watched columns, then drop the originals:
df['Watched'] = pd.concat([df['Breaking Bad (2008) Watched'].dropna(), df['Suits (2011) Watched'].dropna()])
# Remove old Columns
df.drop(['Breaking Bad (2008) Watched','Suits (2011) Watched'], axis=1, inplace=True)
[2] If you have a list of column names, use a simple list comprehension to pass the columns to pd.concat, iterating over the column names in new_col_list:
df['Watched'] = pd.concat([df['{}'.format(i)].dropna() for i in new_col_list])
# Remove old Name Columns
df.drop(new_col_list, axis=1, inplace=True)
# Convert True False to Yes No
d = {True: 'Yes', False: 'No'}
df['Watched'] = df['Watched'].map(d)
# Final Output:
df:
Episode Number Rating Series Watched
0 4 Days Out 2.90 9.1 Breaking Bad (2008) Yes
1 Buyout 5.60 9.0 Breaking Bad (2008) No
2 Pilot 1.10 9.0 Breaking Bad (2008) Yes
3 Dog Fight 1.12 9.0 Suits (2011) No
4 We're Done 4.70 9.0 Suits (2011) Yes
5 Privilege 5.60 8.9 Suits (2011) Yes
6 Pilot 1.10 8.9 Suits (2011) No
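For reference, the dictionary steps above can be put together into one self-contained script. This is a sketch under a few assumptions: Python 3 (so items() rather than iteritems()), the example data rebuilt by hand, and .iloc used to make the positional row selection explicit.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'Episode': ['4 Days Out', 'Buyout', 'Pilot', 'Dog Fight',
                "We're Done", 'Privilege', 'Pilot'],
    'Number': [2.9, 5.6, 1.1, 1.12, 4.7, 5.6, 1.1],
    'Rating': [9.1, 9.0, 9.0, 9.0, 9.0, 8.9, 8.9],
    'Series': ['Breaking Bad (2008)'] * 3 + ['Suits (2011)'] * 4,
})
watchlist = {'Breaking Bad (2008)': [1.1, 4.7, 2.9],
             'Suits (2011)': [4.7, 5.6]}

# One temporary column per series; rows of other series become NaN.
new_col_list = []
for series, wlist in watchlist.items():
    col = '{} Watched'.format(series)
    new_col_list.append(col)
    rows = np.where(df['Series'] == series)[0]      # positional row numbers
    df[col] = df['Number'].iloc[rows].isin(wlist)   # aligned back by index

# Stack the non-NaN parts into one column, then drop the temporaries.
df['Watched'] = pd.concat([df[c].dropna() for c in new_col_list])
df = df.drop(columns=new_col_list)

# Convert True / False to Yes / No.
df['Watched'] = df['Watched'].map({True: 'Yes', False: 'No'})
print(df)
```

The concat step relies on each row belonging to exactly one series in the dictionary. A row whose series is missing from watchlist would end up as NaN in Watched.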
Sources
Source for isin:
[1] How to check if a value is in the list in selection from pandas data frame? http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html
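Answer 1 (score: 1)
There is a simple and efficient way to do this (2.5 times faster than the current answer). For your DataFrame df and the watchlist dictionary, you can use df.loc with multiple conditions.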
First, create a placeholder column:
df['Watched'] = 'No'
Episode Number Rating Series Watched
0 4 Days Out 2.90 9.1 Breaking Bad (2008) No
1 Buyout 5.60 9.0 Breaking Bad (2008) No
2 Pilot 1.10 9.0 Breaking Bad (2008) No
3 Dog Fight 1.12 9.0 Suits (2011) No
4 We're Done 4.70 9.0 Suits (2011) No
5 Privilege 5.60 8.9 Suits (2011) No
6 Pilot 1.10 8.9 Suits (2011) No
Then iterate over the watchlist:
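# On Python 2, use watchlist.iteritems() instead of items()
for key, values in watchlist.items():
    # Mark rows where both the series and the episode number match
    df.loc[(df['Number'].isin(values)) & (df['Series'] == key), 'Watched'] = 'yes'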
This gives df:
Episode Number Rating Series Watched
0 4 Days Out 2.90 9.1 Breaking Bad (2008) yes
1 Buyout 5.60 9.0 Breaking Bad (2008) No
2 Pilot 1.10 9.0 Breaking Bad (2008) yes
3 Dog Fight 1.12 9.0 Suits (2011) No
4 We're Done 4.70 9.0 Suits (2011) yes
5 Privilege 5.60 8.9 Suits (2011) yes
6 Pilot 1.10 8.9 Suits (2011) No
No extra columns, concatenation, or column dropping are needed.
Total time this answer = 0.00800013542175 s
Total time accepted answer = 2.624944121596675 s
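To try the df.loc approach and time it on your own machine, something like the sketch below may help. The helper names make_df and mark_watched are made up for this illustration, Python 3 is assumed, and the timing it prints will differ from the figures quoted above.

```python
import timeit

import pandas as pd


def make_df():
    """Rebuild the example DataFrame from the question."""
    return pd.DataFrame({
        'Episode': ['4 Days Out', 'Buyout', 'Pilot', 'Dog Fight',
                    "We're Done", 'Privilege', 'Pilot'],
        'Number': [2.9, 5.6, 1.1, 1.12, 4.7, 5.6, 1.1],
        'Rating': [9.1, 9.0, 9.0, 9.0, 9.0, 8.9, 8.9],
        'Series': ['Breaking Bad (2008)'] * 3 + ['Suits (2011)'] * 4,
    })


def mark_watched(df, watchlist):
    """Return a copy of df with a 'Watched' column filled via df.loc."""
    out = df.copy()
    out['Watched'] = 'No'  # placeholder value for every row
    for key, values in watchlist.items():
        # Both conditions must hold: same series AND a listed episode number.
        hit = out['Number'].isin(values) & (out['Series'] == key)
        out.loc[hit, 'Watched'] = 'yes'
    return out


watchlist = {'Breaking Bad (2008)': [1.1, 4.7, 2.9],
             'Suits (2011)': [4.7, 5.6]}

result = mark_watched(make_df(), watchlist)
print(result)

# Rough timing over 100 runs, including building the DataFrame each time.
elapsed = timeit.timeit(lambda: mark_watched(make_df(), watchlist), number=100)
print('100 runs: {:.4f} s'.format(elapsed))
```

Working on a copy keeps the original DataFrame untouched, which makes repeated timing runs start from the same state.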