Question

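import pandas as pd

values = [5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23,
          24, 25, 26, 27, 41, 42, 44, 45, 46, 47]
s = pd.Series(values)
# Start a new group whenever the gap to the previous value exceeds 1,
# then join each group's values into one comma-separated string.
s1 = s.groupby(s.diff().gt(1).cumsum()).apply(lambda x: ','.join(x.astype(str)))
print(s1)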

0                           5,6,7,8,9
1                         11,12,13,14
2    17,18,19,20,21,22,23,24,25,26,27
3                               41,42
4                         44,45,46,47
dtype: object
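The grouping key s.diff().gt(1).cumsum() is what splits the values into runs. The following is a small illustrative sketch (the intermediate names step, new_run and key are ours, not from the question) that prints the key so you can see each run receive its own label:

```python
import pandas as pd

values = [5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23,
          24, 25, 26, 27, 41, 42, 44, 45, 46, 47]
s = pd.Series(values)

# Difference to the previous element; the first element has no predecessor, so it is NaN.
step = s.diff()
# True wherever the gap to the previous value is larger than 1, i.e. a new run starts.
# NaN compares as False, so the first element does not start a new label.
new_run = step.gt(1)
# The cumulative sum of the booleans gives each run its own integer label 0, 1, 2, ...
key = new_run.cumsum()

print(key.tolist())
```

Because the label only increases where a gap appears, all values of one consecutive run share the same label, which is exactly what groupby needs.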

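I am trying to find the min and max of each row of this grouped result. I have tried several approaches, but I haven't gotten it right.

I believe the values have to be converted to int before the max and min can be found, but I don't know how to do that. Every time I try to access the Series, I get strings.

I want the output in the following format, printing min and max inside a for loop: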

for num in s1:
    min_value = ...  # ? the minimum of this row
    max_value = ...  # ? the maximum of this row
    print(min_value, max_value)

Answer 1

I suggest creating lists instead of joining strings, and then using min and max:

s1 = s.groupby(s.diff().gt(1).cumsum()).apply(list)
print (s1)
0                                 [5, 6, 7, 8, 9]
1                                [11, 12, 13, 14]
2    [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
3                                        [41, 42]
4                                [44, 45, 46, 47]
dtype: object

for num in s1:
    min_value = min(num)
    max_value = max(num)
    print(min_value, max_value)
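If you would rather keep the per-run bounds in one table instead of only printing them, the list-based approach can be extended as below. This is a sketch under the assumption that a DataFrame with min and max columns is useful to you; the names runs and bounds are illustrative only.

```python
import pandas as pd

values = [5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23,
          24, 25, 26, 27, 41, 42, 44, 45, 46, 47]
s = pd.Series(values)

# One list of ints per consecutive run, using the same grouping key as in the question.
runs = s.groupby(s.diff().gt(1).cumsum()).apply(list)

# Apply the built-in min/max to each list; the numbers are never turned into strings.
bounds = pd.DataFrame({'min': runs.map(min), 'max': runs.map(max)})

for min_value, max_value in zip(bounds['min'], bounds['max']):
    print(min_value, max_value)
```

Keeping the values as ints avoids the string round trip entirely, which is the root of the "everything turns into strings" problem in the question.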

Or, better, reuse the groupby object: first join to strings for display, then aggregate min and max directly:

g = s.groupby(s.diff().gt(1).cumsum())
s1 = g.apply(lambda x: ','.join(x.astype(str)))
print (s1)
0                           5,6,7,8,9
1                         11,12,13,14
2    17,18,19,20,21,22,23,24,25,26,27
3                               41,42
4                         44,45,46,47
dtype: object

s1 = g.agg(['min', 'max'])
print (s1)
   min  max
0    5    9
1   11   14
2   17   27
3   41   42
4   44   47
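To get the exact loop output the question asks for from the aggregated table, you can iterate over its rows. A minimal sketch, assuming the string names 'min' and 'max' (recent pandas versions warn when the built-in min/max functions are passed to agg and suggest the strings instead):

```python
import pandas as pd

values = [5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23,
          24, 25, 26, 27, 41, 42, 44, 45, 46, 47]
s = pd.Series(values)
g = s.groupby(s.diff().gt(1).cumsum())

# String names make pandas use its own groupby min/max implementations.
bounds = g.agg(['min', 'max'])

# itertuples yields (index, min, max) for each row, matching the for-loop format in the question.
for _, min_value, max_value in bounds.itertuples():
    print(min_value, max_value)
```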

But if you need to work with the joined strings, you can split them, convert the parts to int, and finally get min and max:

s1 = s.groupby(s.diff().gt(1).cumsum()).apply(lambda x: ','.join(x.astype(str)))
print (s1)
0                           5,6,7,8,9
1                         11,12,13,14
2    17,18,19,20,21,22,23,24,25,26,27
3                               41,42
4                         44,45,46,47
dtype: object

for line in s1:
    a = [int(x) for x in line.split(',')]
    min_value = min(a)
    max_value = max(a)
    print(min_value, max_value)
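The same split-and-convert step can also be done with pandas string methods instead of an explicit loop over the lines. This is a sketch that rebuilds s1 by hand from the output shown above, so it runs on its own; as_ints, min_values and max_values are illustrative names.

```python
import pandas as pd

# Joined-string Series with the same contents as s1 in the question.
s1 = pd.Series(['5,6,7,8,9', '11,12,13,14', '17,18,19,20,21,22,23,24,25,26,27',
                '41,42', '44,45,46,47'])

# str.split turns each string into a list of substrings; map converts each list to ints.
as_ints = s1.str.split(',').map(lambda parts: [int(p) for p in parts])

min_values = as_ints.map(min)
max_values = as_ints.map(max)

for min_value, max_value in zip(min_values, max_values):
    print(min_value, max_value)
```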

Answer 2

A suggestion:

with Pool(initializer=_initializer, initargs=(config, cfg)) as p:
        logger.info('Getting conversion queue')
        cursor.execute(const.SQL_S_FILES,{})
        file_ids = cursor.fetchall()
        if file_ids:
            logger.info('Queueing %d files', len(file_ids))
            p.starmap(process, file_ids, chunksize=1)
        else:
            logger.info('Empty queue, rechecking in %ds', SLEEP)

Answer 3

Once you have s1:

s2=s1.str.split(',',expand=True).apply(pd.to_numeric)
s2.max(1)
Out[29]: 
0     9.0
1    14.0
2    27.0
3    42.0
4    47.0
dtype: float64
s2.min(1)
Out[30]: 
0     5.0
1    11.0
2    17.0
3    41.0
4    44.0
dtype: float64

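The results are floats because expand=True pads the shorter rows with missing values (NaN), which forces those columns to float. If you prefer int, you can add astype(int) at the end, e.g. s2.max(1).astype(int).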
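Here is a self-contained sketch of that approach with the int conversion applied. It rebuilds s1 by hand from the output in the question and assumes the default skipna=True, so the row-wise min/max ignore the NaN padding and the results can safely be cast to int.

```python
import pandas as pd

s1 = pd.Series(['5,6,7,8,9', '11,12,13,14', '17,18,19,20,21,22,23,24,25,26,27',
                '41,42', '44,45,46,47'])

# expand=True produces one column per position; shorter rows are padded with missing values.
s2 = s1.str.split(',', expand=True).apply(pd.to_numeric)

# Row-wise reductions skip NaN by default, so every result is a real number
# and the float results can be cast back to int.
row_max = s2.max(axis=1).astype(int)
row_min = s2.min(axis=1).astype(int)

for min_value, max_value in zip(row_min, row_max):
    print(min_value, max_value)
```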

Answer 4

You can use the apply function to do this:

min_max = s1.apply(lambda x: (min(map(int, x.split(','))), 
                              max(map(int, x.split(',')))))

for min_, max_ in min_max:
    print(min_, max_)
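The lambda above splits and converts each string twice, once for min and once for max. A variant that parses each string only once could look like this; bounds is a hypothetical helper name, and s1 is rebuilt by hand so the sketch runs on its own.

```python
import pandas as pd

s1 = pd.Series(['5,6,7,8,9', '11,12,13,14', '17,18,19,20,21,22,23,24,25,26,27',
                '41,42', '44,45,46,47'])


def bounds(text):
    """Return (min, max) of a comma-separated string of integers, parsing it only once."""
    numbers = [int(p) for p in text.split(',')]
    return min(numbers), max(numbers)


# apply with a function returning a tuple gives a Series of (min, max) tuples.
min_max = s1.apply(bounds)

for min_, max_ in min_max:
    print(min_, max_)
```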

Execution time:

In [10]: timeit s1.apply(lambda x: (min(map(int, x.split(','))), max(map(int, x.split(',')))))
109 µs ± 445 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
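The measurement above uses IPython's timeit magic. To compare approaches in a plain Python script, the standard-library timeit module can be used; this is a sketch that times the apply-on-strings answer against the groupby aggregation, and the timings it prints depend on your machine and pandas version.

```python
import timeit

import pandas as pd

values = [5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23,
          24, 25, 26, 27, 41, 42, 44, 45, 46, 47]
s = pd.Series(values)
g = s.groupby(s.diff().gt(1).cumsum())
s1 = g.apply(lambda x: ','.join(x.astype(str)))

candidates = {
    'apply on joined strings': lambda: s1.apply(
        lambda x: (min(map(int, x.split(','))), max(map(int, x.split(','))))),
    'groupby agg': lambda: g.agg(['min', 'max']),
}

for name, func in candidates.items():
    # Best of 3 repeats, each running the call 200 times; reported per call.
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    print(f'{name}: {best * 1e6:.1f} µs per call')
```

Note that the two candidates do not start from the same input: the first assumes s1 already exists, while the second works on the original numbers, so the comparison is only meaningful if you would build s1 anyway.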

Getting the max and min values from a set of strings
