Splitting a pandas Series with a regular expression is not working. I want to split the Series on a delimiter without removing the delimiter.
import pandas as pd

df2 = pd.Series(['Series of Class A', 'Series of Class B part of Class C', 'Class D', 'Class'])
seperator = 'Class'
data = df2.str.split(r'.(?=' + seperator + ')', n=2, expand=True)

The result is:

           0                1        2
0  Series of          Class A     None
1  Series of  Class B part of  Class C
2    Class D             None     None
3      Class             None     None

I want to do the same thing with rsplit. I tried:

data = df2.str.rsplit(r'.(?=' + seperator + ')', n=2, expand=True)

I expected the same result from rsplit.
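Answer 0 (score: 1)
Unfortunately, pd.Series.str.rsplit does not work as documented (v0.25, stable/v1+). The project's GitHub issue tracker has had an open bug since November 2019 reporting that rsplit does not work with regular expression patterns (v0.24.2 and 0.25.2). Internally, the method calls Python's built-in str.rsplit, which does not support regular expressions.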
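To make the root cause concrete, here is a minimal sketch using only the Python standard library (no pandas involved). It shows that the built-in str.rsplit treats the pattern as a literal separator, while re.split interprets it as a regular expression. The variable names are illustrative only.

```python
import re

text = 'Series of Class B part of Class C'
pattern = r'.(?=Class)'

# str.rsplit looks for the literal characters ".(?=Class)", which never occur,
# so the string comes back unsplit.
literal = text.rsplit(pattern, 2)
print(literal)   # ['Series of Class B part of Class C']

# re.split treats the same pattern as a regex: any character followed by "Class".
as_regex = re.split(pattern, text)
print(as_regex)  # ['Series of', 'Class B part of', 'Class C']
```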
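Fortunately, the reporter (Jamespreed) added a home-made replacement function:

import re
# _na_map is a private pandas helper (pandas.core.strings in v0.25);
# it may not exist in other versions.
from pandas.core.strings import _na_map

def str_rsplit(arr, pat=None, n=None):
    if pat is None or len(pat) == 1:
        # Whitespace or a single-character separator: plain str.rsplit is enough.
        if n is None or n == 0:
            n = -1
        f = lambda x: x.rsplit(pat, n)
    else:
        # Longer patterns are treated as regular expressions.
        if n is None or n == -1:
            n = 0
        regex = re.compile(pat)

        def f(x):
            s = regex.split(x)
            # Keep the last n pieces; the leading pieces are merged back below.
            a, b = s[:-n], s[-n:]
            if not a:
                return b
            # Walk through the leading pieces to find where they end in x.
            ix = 0
            for a_ in a:
                ix = x.find(a_, ix) + len(a_)
            x_ = [x[:ix]]
            return x_ + b

    res = _na_map(f, arr)
    return res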
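For readers who would rather not depend on pandas internals, the same idea can be sketched in plain Python. This is an alternative approach, not the function from the issue tracker: it collects the regex matches with re.finditer and cuts the string only at the last n of them. The helper name regex_rsplit and its parameters are hypothetical.

```python
import re

def regex_rsplit(text, pattern, n=-1):
    """Split text on a regex, using at most n splits counted from the right.

    A non-positive n means "split at every match".
    """
    matches = list(re.finditer(pattern, text))
    if n is not None and n > 0:
        matches = matches[-n:]  # only the right-most n separators are used
    parts, start = [], 0
    for m in matches:
        parts.append(text[start:m.start()])
        start = m.end()
    parts.append(text[start:])
    return parts

rows = ['Series of Class A', 'Series of Class B part of Class C', 'Class D', 'Class']
result = [regex_rsplit(s, r'.(?=Class)', n=2) for s in rows]
print(result)
# [['Series of', 'Class A'], ['Series of', 'Class B part of', 'Class C'], ['Class D'], ['Class']]

# With n=1 only the right-most separator is used:
print(regex_rsplit(rows[1], r'.(?=Class)', n=1))
# ['Series of Class B part of', 'Class C']
```

A helper like this could be applied to each element of a Series (for example through Series.map), and the resulting lists expanded into columns afterwards.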