Question

I have

   Apple f2 m  Apple f2 t  Apple f3 m  Apple f3 t
0           3           4           5           3
1          12           7           4           7
2           5           9           7           5
3           3           3           4           8
4           7           1           2           6

I want to run a t-test on the columns matching str = 'Apple f* m' against the columns matching str = 'Apple f* t'.

I tried

ttest_ind(df.loc[:,df.columns.str.contains('Apple R* m')], df.loc[:,df.columns.str.contains('Apple R* t')])

However, it does not treat the * in my pattern as a wildcard.

Thank you for helping me solve this or pointing me in the right direction.

Answer 1

For future reference: by default, pandas.Series.str.contains sets the regex parameter to True, which means the pattern is interpreted as a regular expression.
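A minimal sketch of what that default means in practice. The index values below are made up for illustration (they are not from the question); the same pattern is applied once as a regular expression and once as a literal substring via regex=False.

```python
import pandas as pd

# Hypothetical column names, chosen so that one of them literally contains ".*"
names = pd.Index(["a1a", "a.*a", "a1b"])

# regex=True is the default: "a.*a" means "a", then anything, then "a"
as_regex = names.str.contains("a.*a")

# regex=False: "a.*a" is searched for as a plain substring
as_literal = names.str.contains("a.*a", regex=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [False, True, False]
```

So a shell-style pattern such as 'Apple f* m' is not rejected; it is simply read with regex semantics, where * has a different meaning.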

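To match zero or more of any character, we can simply use .* (see Alan Moore):

.* just means "0 or more of any character"

It breaks down into two parts:

. (a "dot") matches any character
* means "0 or more instances of the preceding regex token"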
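To see why the original pattern fails, here is a small sketch using only the standard re module on the column names from the question. In 'Apple f* m' the * applies to the preceding f, so the pattern asks for zero or more f characters followed directly by " m", and the digit in between prevents a match.

```python
import re

columns = ["Apple f2 m", "Apple f2 t", "Apple f3 m", "Apple f3 t"]

# Shell-style wildcard read as a regex: "f*" is "zero or more f", not "f then anything"
glob_style = re.compile(r"Apple f* m")

# Regex wildcard: ".*" is "zero or more of any character"
regex_style = re.compile(r"Apple f.* m")

glob_matches = [c for c in columns if glob_style.search(c)]
regex_matches = [c for c in columns if regex_style.search(c)]

print(glob_matches)   # []
print(regex_matches)  # ['Apple f2 m', 'Apple f3 m']
```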

Here is a link to regex101, where you can test the regular expression:

https://regex101.com/r/QNjkch/1

Finally, we can simplify your code. Consider this simple example:

import pandas as pd
df = pd.DataFrame(columns=["a1a","a2a","a1b"])

mask = df.columns.str.contains('a.*a')

df.loc[:,mask] # selects mask
df.loc[:,~mask] # selects inverted (by using ~) mask

Answer 2

As a supplement to Anton vBR's answer, this is how you would use str.contains.

Regex details: \s+ matches one or more whitespace characters, and \d+ matches one or more digits.

i = df.columns.str.contains(r'Apple\s+f\d+\s+m')
j = df.columns.str.contains(r'Apple\s+f\d+\s+t')

df.iloc[:, i]
   Apple f2 m  Apple f3 m
0           3           5
1          12           4
2           5           7
3           3           4
4           7           2

df.iloc[:, j]
   Apple f2 t  Apple f3 t
0           4           3
1           7           7
2           9           5
3           3           8
4           1           6
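Neither answer shows the t-test itself, so here is a hedged sketch of how the two selections might be passed to scipy.stats.ttest_ind. The question does not say whether the m and t columns should be compared pair by pair or pooled into two groups, so both readings are shown as assumptions; equal_var is left at its default (True, i.e. Student's t-test).

```python
import pandas as pd
from scipy.stats import ttest_ind

# Data copied from the question
df = pd.DataFrame({
    "Apple f2 m": [3, 12, 5, 3, 7],
    "Apple f2 t": [4, 7, 9, 3, 1],
    "Apple f3 m": [5, 4, 7, 4, 2],
    "Apple f3 t": [3, 7, 5, 8, 6],
})

# Select the two groups of columns by regex
m = df.loc[:, df.columns.str.contains(r"Apple\s+f\d+\s+m")].to_numpy()
t = df.loc[:, df.columns.str.contains(r"Apple\s+f\d+\s+t")].to_numpy()

# Assumption 1: compare column by column (f2 m vs f2 t, f3 m vs f3 t).
# With the default axis=0 this yields one statistic and p-value per column pair.
per_column = ttest_ind(m, t)

# Assumption 2: pool all m values against all t values as two flat samples.
pooled = ttest_ind(m.ravel(), t.ravel())

print(per_column.statistic, per_column.pvalue)
print(pooled.statistic, pooled.pvalue)
```

The column-by-column form relies on both selections having the same number of columns in the same order, which holds for the frame in the question.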

Select columns whose names match a str, using a wildcard, for a t-test (Python)
