import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4,2,5,6,7,1,8,9,2], 'city':[1,2,3,4,2,5,6,7,1,8,9,2]})
# The following code, creates a boolean filter,
filter = df.city==2
# Assigns True to all rows where filter is True
df.loc[filter,'selected']= True
我需要的是对代码进行更改,以便将True分配给给定的n个行。
实际数据帧有超过300万行。有时候,我想要 df.loc [filter,'selected'] =仅适用于100行[实际行可以大于或小于100]。
答案 0 :(得分:1)
我认为您首先需要按列表中定义的值isin
进行过滤,然后对前2个值使用GroupBy.head
:
cities= [2,3]
df = df1[df1.city.isin(cities)].groupby('city').head(2)
print (df)
col1 city
1 2 2
2 3 3
4 2 2
如果需要在新列中分配True
:
cities= [2,3]
idx = df1[df1.city.isin(cities)].groupby('city').head(2).index
df1.loc[idx, 'selected'] = True
print (df1)
col1 city selected
0 1 1 NaN
1 2 2 True
2 3 3 True
3 4 4 NaN
4 2 2 True
5 5 5 NaN
6 6 6 NaN
7 7 7 NaN
8 1 1 NaN
9 8 8 NaN
10 9 9 NaN
11 2 2 NaN
答案 1 :(得分:1)
定义要检查的元素列表,并将其传递到city
列,并使用True
和False
布尔值创建一个新列..
>>> check
[2, 3]
>>> df['Citis'] = df.city.isin(check)
>>> df
col1 city Citis
0 1 1 False
1 2 2 True
2 3 3 True
3 4 4 False
4 2 2 True
5 5 5 False
6 6 6 False
7 7 7 False
8 1 1 False
9 8 8 False
10 9 9 False
11 2 2 True
OR
>>> df['Citis'] = df['city'].apply(lambda x: x in check)
>>> df
col1 city Citis
0 1 1 False
1 2 2 True
2 3 3 True
3 4 4 False
4 2 2 True
5 5 5 False
6 6 6 False
7 7 7 False
8 1 1 False
9 8 8 False
10 9 9 False
11 2 2 True
事实上,您确实需要从头开始(假设要读取5个值)
df['Citis'] = df.city.isin(check).head(5)
OR
df['Citis'] = df['city'].apply(lambda x: x in check).head(5)