Question

As a newcomer to Python and pandas, I tried:

df_rows = np.where('y' in df['x'])[0]
for i in df_rows:
    print df.iloc[i]

which returned no rows, but

df_rows = np.where(df['x'].str.contains('y'))[0]
for i in df_rows:
    print df.iloc[i]

does work and returns the rows where df['x'] contains 'y'.

What am I missing? Why does the first form fail? (Python 2.7)

Answer 1

These are different operations:

in asks whether 'y' is equal to any whole element of the container. (Note: on a Series this may not even behave as expected, since pandas tests the index labels rather than the values.)
The .str.contains method checks each string element to see whether it contains 'y'.

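The first can only return True or False (Python's data model says so and enforces it). The second is a regular method and returns a Series containing True or False values (because regular methods can return whatever they like).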
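
To make the data-model point concrete, here is a minimal sketch in plain Python. The class Loud and its contains method are hypothetical names made up for this illustration; they are not part of pandas. It shows that whatever __contains__ returns, the in operator collapses it to a single bool, while an ordinary method can hand back any object unchanged.

```python
class Loud(object):
    """Hypothetical container whose __contains__ returns a non-bool value."""

    def __contains__(self, item):
        # Return a list rather than a bool; `in` will still collapse it.
        return [item, item]

    def contains(self, item):
        # A regular method may return any object unchanged.
        return [item, item]


box = Loud()
via_operator = 'y' in box        # coerced to a single bool by the operator
via_method = box.contains('y')   # returned as-is, here a list

print(via_operator)
print(via_method)
```

This is why an element-wise answer can only come from a method such as .str.contains and never from the in operator.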


Answer 2

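Pandas needs specific syntax for things to work. Looking for the str 'y' with the in operator checks for membership of the string 'y' in the pandas Series, not for 'y' inside each string.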

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'x': ['hiya', 'howdy', 'hello']})
>>> df
       x
0   hiya
1  howdy
2  hello
>>> df_rows = np.where('y' in df['x'])[0]
>>> df_rows
array([], dtype=int64)
>>> df_rows = np.where(df['x'].str.contains('y'))[0]
>>> df_rows
array([0, 1], dtype=int64)

Try this and notice that it returns a single bool rather than three (as we might at first expect, since there are three items in the Series):

>>> 'y' in df['x']
False
>>> 'hiya' in df['x']
False
>>> 'hiya' in df['x'].values
True
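
The reason 'hiya' in df['x'] is False is that, on a pandas Series, in tests the index labels (here 0, 1, 2) rather than the values. A small sketch, reusing the same df as above; the variable names are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'x': ['hiya', 'howdy', 'hello']})

# `in` on a Series looks at the index labels, much like `in` on a dict looks at keys.
label_hit = 0 in df['x']
value_miss = 'hiya' in df['x']

# To test membership among the values, go through .values (as shown above).
value_hit = 'hiya' in df['x'].values

# To get one bool for "does any item contain the substring", reduce the mask.
any_match = df['x'].str.contains('y').any()

print(label_hit, value_miss, value_hit, any_match)
```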

You always need to ask yourself: "Am I looking for an item in the Series, or for a string within the items of the Series?"

For an item in the Series, use isin:

df['x'].isin(['hello'])
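
As a short sketch of what that returns, assuming the same three-row df as above: isin compares whole values against a list and yields one bool per row, which can then be used to select rows.

```python
import pandas as pd

df = pd.DataFrame({'x': ['hiya', 'howdy', 'hello']})

# One bool per row: True where the whole value equals one of the listed items.
mask = df['x'].isin(['hello', 'howdy'])

# Use the boolean mask to keep only the matching rows.
matches = df[mask]
print(matches)
```

Note that isin matches whole values only, so isin(['y']) would select nothing here.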

For a string within the items, use .str.{whatever} (or .apply with a lambda that tests each item, such as lambda s: 'y' in s):

>>> df['x'].str.contains('y')
0     True
1     True
2    False
Name: x, dtype: bool
>>> df['x'].apply(lambda s: 'y' in s)
0     True
1     True
2    False
Name: x, dtype: bool
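
Since the goal in the question was to get the matching rows, here is a hedged sketch of two ways to do that from the boolean result. It assumes the same df as above; the variable names are illustrative. The np.where route mirrors the question, and plain boolean indexing is a commonly used shorter alternative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': ['hiya', 'howdy', 'hello']})

# Element-wise substring test: one bool per row.
mask = df['x'].str.contains('y')

# Route 1 (as in the question): turn the mask into integer positions.
positions = np.where(mask)[0]
by_position = df.iloc[positions]

# Route 2: index the DataFrame with the mask directly.
by_mask = df[mask]

print(by_mask)
```

Both routes select the same rows here, so the explicit loop over positions is not needed.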

pandas / numpy np.where(df['x'].str.contains('y')) vs np.where('y' in df['x'])
