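So I created two Series of 100 elements each and OR'ed them together. First, though, I sorted the first Series, which means the indexes no longer line up. I expected an error, or at least a bad result. Instead I got a third Series with 126 elements, which was quite a surprise.
Note the 4 "Richardson" rows in the billy_or_peter listing: there are 4 values, two True and two False.
I thought some kind of Cartesian product might be producing 200 rows, but I am seeing 126 rows, which is odd.
Any ideas?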
# loc and iloc also accept conditional statements to filter rows of data;
# using loc with the logic test below returns only the rows where the result is True
only_billys = df.loc[df["first_name"] == "Billy", :]
print(only_billys)
only_peters = df.loc[df["first_name"] == "Peter", :]
print(only_peters)
print()
only_richardsons = df.loc["Richardson", :]
print(only_richardsons)
print()
isBilly = (df["first_name"] == "Billy").sort_index()
print(isBilly.describe())
print()
isPeter = (df["first_name"] == "Peter")
print(isPeter.describe())
print()
billy_or_peter = isPeter | isBilly
print(billy_or_peter.describe())
print(billy_or_peter)
Output
(only_billys)
id first_name Phone Number Time zone
last_name
Clark 20 Billy 62-(213)345-2549 Asia/Makassar
Andrews 23 Billy 86-(859)746-5367 Asia/Chongqing
Price 59 Billy 86-(878)547-7739 Asia/Shanghai
(only_peters)
id first_name Phone Number Time zone
last_name
Richardson 1 Peter 7-(789)867-9023 Europe/Moscow
(only_richardsons)
id first_name Phone Number Time zone
last_name
Richardson 1 Peter 7-(789)867-9023 Europe/Moscow
Richardson 25 Donald 62-(259)282-5871 Asia/Jakarta
(isBilly.describe() - sorted index)
count 100
unique 2
top False
freq 97
Name: first_name, dtype: object
(isPeter.describe())
count 100
unique 2
top False
freq 99
Name: first_name, dtype: object
(billy_or_peter.describe() - 126 rows???)
count 126
unique 2
top False
freq 121
Name: first_name, dtype: object
(billy_or_peter listing - notice 4 Richardsons where before there were only 2)
last_name
Adams False
Allen False
Andrews True
Austin False
Baker False
Banks False
Bell False
Berry False
Bishop False
Black False
Brooks False
Brown False
Bryant False
Bryant False
Bryant False
Bryant False
Burke False
Butler False
Butler False
Butler False
Butler False
Carroll False
Chapman False
Chavez False
Clark True
Collins False
Cook False
Day False
Day False
Day False
...
Price True
Reid False
Reyes False
Rice False
*Richardson True
*Richardson True
*Richardson False
*Richardson False
Riley False
Roberts False
Robertson False
Robinson False
Rogers False
Scott False
Shaw False
Shaw False
Shaw False
Shaw False
Simmons False
Snyder False
Sullivan False
Torres False
Tucker False
Vasquez False
Wagner False
Walker False
Washington False
Watkins False
Wells False
Williamson False
Name: first_name, Length: 126, dtype: bool
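Answer 0 (score: 1)
The misalignment is not the problem: pandas aligns the two Series on their index before applying |. Your problem is the duplicated index labels. To align them, pandas performs an outer join on the matching index labels, so 2 Richardsons in one Series and 2 Richardsons in the other produce 4 rows in your output.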
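The same effect can be reproduced on a tiny boolean example. This is a minimal sketch, not the asker's data: the three name pairs appear in the question's output, but the three-row Series itself is an illustrative assumption. It also checks the row count by multiplying, per label, the number of occurrences on each side, which works here because both Series carry exactly the same labels.

```python
import pandas as pd

# Toy data: these three names appear in the question's output; the
# three-row Series itself is an illustrative assumption.
first_name = pd.Series(
    ["Peter", "Billy", "Donald"],
    index=pd.Index(["Richardson", "Clark", "Richardson"], name="last_name"),
    name="first_name",
)

is_billy = (first_name == "Billy").sort_index()  # order: Clark, Richardson, Richardson
is_peter = first_name == "Peter"                 # original order

# The two indexes are not equal, so pandas aligns them (outer join on the
# labels) before applying |. Duplicated labels are paired in every combination.
billy_or_peter = is_peter | is_billy

# Each shared label contributes (occurrences on the left) * (occurrences on the right) rows.
left_counts = is_peter.index.value_counts()
right_counts = is_billy.index.value_counts()
expected_len = int((left_counts * right_counts).sum())

print(len(billy_or_peter), expected_len)
print(billy_or_peter.sort_index())
```

Here Clark contributes 1 x 1 = 1 row and Richardson 2 x 2 = 4 rows, so the result has 5 rows rather than 3, with two True and two False Richardsons just as in the question. By the same arithmetic, a 100-row index would give 126 rows if, for example, 74 surnames were unique and 13 appeared twice each (74 + 13 x 4 = 126); that particular breakdown is a guess consistent with the listing, not something the question states.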
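To see this more clearly, look at what happens when you add strings whose indexes are duplicated and unaligned. For index 1 we get 6 (2 x 3) rows from the Cartesian product: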
import pandas as pd
df1 = pd.DataFrame(list('abcd'), index=[1,1,2,3])
df2 = pd.DataFrame(list('1243'), index=[1,1,3,1])
df1+df2
0
1 a1
1 a2
1 a3
1 b1
1 b2
1 b3
2 NaN
3 d4
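If the goal is one result row per original row, two possible ways around the duplicate-label join are sketched below. Neither comes from the answer itself; the toy frame and the names mask, flat and combined are assumptions made for illustration. The first builds the whole mask in one step so no alignment is needed; the second moves last_name out of the index so that alignment happens on a unique RangeIndex.

```python
import pandas as pd

# Toy frame (illustrative assumption; only these three names come from the question).
df = pd.DataFrame(
    {"first_name": ["Peter", "Billy", "Donald"]},
    index=pd.Index(["Richardson", "Clark", "Richardson"], name="last_name"),
)

# Quick diagnostic: label-based alignment is only one-to-one when this is True.
print(df.index.is_unique)

# Option 1: build the combined mask in a single step on the same index,
# so there is nothing to align.
mask = df["first_name"].isin(["Billy", "Peter"])
print(df.loc[mask])

# Option 2: align on a unique key instead. reset_index() turns last_name into
# a column and leaves a unique RangeIndex, so reordering one mask is harmless.
flat = df.reset_index()
is_billy = (flat["first_name"] == "Billy").sort_index(ascending=False)  # reordered on purpose
is_peter = flat["first_name"] == "Peter"
combined = is_peter | is_billy
print(len(combined))
```

With a unique index, the alignment before | pairs each row with exactly one partner, so combined has the same length as the original frame.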