I'm stuck on my code for returning all matches that fall within a given range. My data sample is:
comment
0 [intj74, you're, whipping, people, is, a, grea...
1 [home, near, kcil2, meniaga, who, intj47, a, l...
2 [thematic, budget, kasi, smooth, sweep]
3 [budget, 2, intj69, most, people, think, of, e...
The result I want is (the given range is intj1 to intj75):
comment
0 [intj74]
1 [intj47]
2 [nan]
3 [intj69]
My code is:
df.comment = df.comment.apply(lambda x: [t for t in x if t=='intj74'])
df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]
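I'm not sure how to use a regular expression so that t == 'intj74' matches a whole range instead of a single value. Or are there any other ideas?
Thanks in advance,
Pandas Python Newbie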
Answer 0 (score: 1)
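You can replace [t for t in x if t=='intj74']
with, for example,
[t for t in x if re.match('intj[0-9]+$', t)]
or even
[t for t in x if re.match('intj[0-9]+$', t)] or [np.nan]
which also handles the case where nothing matches (so there is no need to check for it explicitly with df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]).
The "trick" here is that an empty list evaluates to False, so the or operator returns its right operand.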
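Note that the pattern above accepts any number after intj, not only 1 to 75. If the range itself matters, one option is to capture the digits and compare them as an integer. The following is only a sketch: the helper name in_range_tokens and its low/high parameters are made up for illustration, and math.nan stands in for np.nan (both are the same float NaN value) so the snippet runs without extra packages.

```python
import math
import re

# Hypothetical helper, not part of pandas: keep tokens of the form intj<N>
# where low <= N <= high, and fall back to [nan] when nothing qualifies.
TOKEN = re.compile(r'intj(\d+)')

def in_range_tokens(tokens, low=1, high=75):
    matches = []
    for t in tokens:
        m = TOKEN.fullmatch(t)  # the whole token must be intj plus digits
        if m and low <= int(m.group(1)) <= high:
            matches.append(t)
    # An empty list is falsy, so `or` returns the right operand.
    return matches or [math.nan]

print(in_range_tokens(['intj74', "you're", 'whipping']))
print(in_range_tokens(['budget', '2', 'intj69', 'intj76']))
print(in_range_tokens(['thematic', 'budget', 'kasi']))
```

With the DataFrame from the question, this would presumably be applied as df.comment = df.comment.apply(in_range_tokens).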
Answer 1 (score: 0)
I'm also new to pandas. You may have initialized your DataFrame differently. Anyway, this is what I have:
import pandas as pd
data = {
'comment': [
"intj74, you're, whipping, people, is, a",
"home, near, kcil2, meniaga, who, intj47, a",
"thematic, budget, kasi, smooth, sweep",
"budget, 2, intj69, most, people, think, of"
]
}
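# Build the DataFrame from the sample data (each comment is a plain string here)
df = pd.DataFrame(data)
# Extract the first intj<digits> token from each comment; NaN where none is found
print(df.comment.str.extract(r'(intj\d+)'))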