Question

I'm stuck on my code for returning all matches that fall within a given range. A sample of my data:

        comment
0       [intj74, you're, whipping, people, is, a, grea...
1       [home, near, kcil2, meniaga, who, intj47, a, l...
2       [thematic, budget, kasi, smooth, sweep]
3       [budget, 2, intj69, most, people, think, of, e...

The result I want is (the given range is intj1 to intj75):

         comment
0        [intj74]   
1        [intj47]    
2        [nan]   
3        [intj69]

My code is:

df.comment = df.comment.apply(lambda x: [t for t in x if t=='intj74'])
df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]

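I'm not sure how to use a regular expression so that t == 'range' tests for a whole range instead of a single value. Or are there any other ideas?

Thanks in advance,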

Pandas Python Newbie

Answer 1

You can replace [t for t in x if t=='intj74'] with, for example,

[t for t in x if re.match('intj[0-9]+$', t)]

or even

[t for t in x if re.match('intj[0-9]+$', t)] or [np.nan]

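which also handles the case where nothing matches (so there is no need to check for it explicitly with df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]). The "trick" here is that an empty list evaluates to False, so the or operator returns its right operand.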
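Note that the pattern above accepts any number after intj, not only 1 to 75. Below is a minimal sketch of one way to enforce the range, assuming the column holds lists of strings as in the question. The helper name in_range and the sample DataFrame (including the extra out-of-range token intj76) are made up for illustration.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's data (lists of tokens).
df = pd.DataFrame({
    "comment": [
        ["intj74", "you're", "whipping", "people"],
        ["home", "near", "kcil2", "meniaga", "who", "intj47"],
        ["thematic", "budget", "kasi", "smooth", "sweep"],
        ["budget", "2", "intj69", "intj76", "most", "people"],
    ]
})

PATTERN = re.compile(r"intj([0-9]+)")


def in_range(token, low=1, high=75):
    """Return True if token is 'intj<N>' with low <= N <= high."""
    m = PATTERN.fullmatch(token)
    return m is not None and low <= int(m.group(1)) <= high


# An empty list is falsy, so `or` falls back to [np.nan].
df["comment"] = df["comment"].apply(
    lambda tokens: [t for t in tokens if in_range(t)] or [np.nan]
)
print(df)
```

Comparing the captured number as an integer avoids having to encode 1 to 75 in the regular expression itself, and makes it easy to change the bounds later.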

Answer 2

I'm new to pandas too. You probably initialized your DataFrame differently. Anyway, here is what I have:

import pandas as pd

data = {
    'comment': [
        "intj74, you're, whipping, people, is, a",
        "home, near, kcil2, meniaga, who, intj47, a",
        "thematic, budget, kasi, smooth, sweep",
        "budget, 2, intj69, most, people, think, of"
    ]
}
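# Build the DataFrame from the dict above (each comment is one string here)
df = pd.DataFrame(data)
# Extract the first intj<number> tag from each comment; NaN if there is none
print(df.comment.str.extract(r'(intj\d+)'))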
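This approach can probably be extended to respect the requested range by capturing the number as well and masking values outside it. The following is only a sketch, assuming comments are plain strings as in this answer. The bounds 1 and 75 come from the question, while the column name tag and the out-of-range sample value intj99 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "comment": [
        "intj74, you're, whipping, people, is, a",
        "home, near, kcil2, meniaga, who, intj47, a",
        "thematic, budget, kasi, smooth, sweep",
        "budget, 2, intj99, most, people, think, of",
    ]
})

# Capture the first intj tag (column 0) and its number (column 1);
# rows without a match get NaN in both columns.
found = df["comment"].str.extract(r"\b(intj(\d+))\b")
number = pd.to_numeric(found[1])  # NaN where nothing matched

# Keep the tag only when the number lies in 1..75 (NaN compares as False).
df["tag"] = found[0].where(number.between(1, 75))
print(df["tag"])
```

Keep in mind that str.extract only looks at the first match in each string, so a comment whose first tag is out of range but whose later tag is in range would end up as NaN with this sketch.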

How do I use a regular expression to get matches within a given range?
