Pandas: filter a DataFrame and assign a value to the first n rows

Date: 2018-11-06 06:32:47

Tags: python pandas

import pandas as pd

df = pd.DataFrame({'col1':[1,2,3,4,2,5,6,7,1,8,9,2], 'city':[1,2,3,4,2,5,6,7,1,8,9,2]})

# Create a boolean filter that is True where city equals 2
filter = df.city == 2

# Assign True to all rows where the filter is True
df.loc[filter, 'selected'] = True

What I need is a change to this code so that True is assigned to only a given number n of the matching rows.

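The actual DataFrame has more than 3 million rows. Sometimes I want df.loc[filter, 'selected'] = True to apply to only 100 rows [the actual number of matching rows can be more or fewer than 100].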

2 Answers:

Answer 0: (score: 1)

I think you first need to filter by the values defined in a list with isin, then use GroupBy.head to take the first 2 rows of each city:

# df1 is the DataFrame from the question (named df there)
cities = [2, 3]
df = df1[df1.city.isin(cities)].groupby('city').head(2)
print(df)
   col1  city
1     2     2
2     3     3
4     2     2

If you need to assign True in a new column instead:

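# df1 is the DataFrame from the question (named df there)
cities = [2, 3]
# Index labels of the first 2 matching rows per city
idx = df1[df1.city.isin(cities)].groupby('city').head(2).index

df1.loc[idx, 'selected'] = True
print(df1)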
    col1  city selected
0      1     1      NaN
1      2     2     True
2      3     3     True
3      4     4      NaN
4      2     2     True
5      5     5      NaN
6      6     6      NaN
7      7     7      NaN
8      1     1      NaN
9      8     8      NaN
10     9     9      NaN
11     2     2      NaN
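
For the single-condition case in the question, the same idea can be sketched without groupby: collect the index labels of the first n matching rows and assign through df.loc. This is only a minimal sketch, assuming the index labels are unique; the names mask and n are illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 2, 5, 6, 7, 1, 8, 9, 2],
    'city': [1, 2, 3, 4, 2, 5, 6, 7, 1, 8, 9, 2],
})

n = 2  # illustrative limit; the question mentions roughly 100
mask = df['city'] == 2

# Index labels of the first n rows where the mask is True
# (assumes the index labels are unique).
idx = df.index[mask][:n]

# Only those rows get True; every other row stays NaN.
df.loc[idx, 'selected'] = True
print(df)
```

If fewer than n rows match, the slice simply returns all of them, so no extra length check should be needed.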

Answer 1: (score: 1)

Define a list of the elements to check, pass it to isin on the city column, and create a new column of True/False booleans:

>>> check  
[2, 3]
>>> df['Citis'] = df.city.isin(check)
>>> df
    col1  city  Citis
0      1     1  False
1      2     2   True
2      3     3   True
3      4     4  False
4      2     2   True
5      5     5  False
6      6     6  False
7      7     7  False
8      1     1  False
9      8     8  False
10     9     9  False
11     2     2   True

OR

>>> df['Citis'] = df['city'].apply(lambda x: x in check)
>>> df
    col1  city  Citis
0      1     1  False
1      2     2   True
2      3     3   True
3      4     4  False
4      2     2   True
5      5     5  False
6      6     6  False
7      7     7  False
8      1     1  False
9      8     8  False
10     9     9  False
11     2     2   True
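
Both variants should produce the same boolean column. isin is vectorized, so it is likely the better fit for a frame with millions of rows, while apply calls the lambda once per row. A small check on the question's data (the names via_isin and via_apply are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 2, 5, 6, 7, 1, 8, 9, 2],
    'city': [1, 2, 3, 4, 2, 5, 6, 7, 1, 8, 9, 2],
})
check = [2, 3]

# Vectorized membership test
via_isin = df['city'].isin(check)

# Row-by-row membership test with a Python lambda
via_apply = df['city'].apply(lambda x: x in check)

# Compare the two results element by element
same = bool((via_isin == via_apply).all())
print(same)
```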

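If you only need the result for the first rows (say, the first 5 values), take head of the boolean series; rows beyond the first 5 are left as NaN after index alignment: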

df['Citis'] = df.city.isin(check).head(5)

OR 

df['Citis'] = df['city'].apply(lambda x: x in check).head(5)
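
The head(5) variants limit how many rows of the frame are inspected, not how many matches are flagged. If only the first n matching rows should be marked, one possible sketch uses a running count of matches. The names mask, first_n and n, and the column name selected, are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 2, 5, 6, 7, 1, 8, 9, 2],
    'city': [1, 2, 3, 4, 2, 5, 6, 7, 1, 8, 9, 2],
})
check = [2, 3]
n = 3  # illustrative limit on the number of matches to flag

mask = df['city'].isin(check)

# cumsum() gives a running count of matches; keep a match only
# while that count has not yet exceeded n.
first_n = mask & (mask.cumsum() <= n)

df['selected'] = first_n
print(df)
```

This yields a plain boolean column (False instead of NaN for unselected rows) and does not rely on unique index labels.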