Simple way to identify concurrent events in a pandas data frame

Time: 2019-06-30 06:47:48

Tags: python pandas concurrency

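I am looking for a simple way to add a column to a data frame that indicates whether a given part has been purchased for at least two years in a row.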

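Here is a sample data frame:

  PART_UNIT FiscalYear  PO_UNIT_PRICE
0         A  2015/2016             10
1         A  2016/2017             12
2         A  2018/2019             11
3         B  2015/2016             45
4         B  2017/2018             54

I am looking for something similar to the function I would use to add a standard deviation column:

df['std'] = df.groupby(['PART_UNIT'])['PO_UNIT_PRICE'].transform(np.std)

The goal is a result like this:

  PART_UNIT FiscalYear  PO_UNIT_PRICE  Concurrent
0         A  2015/2016             10           1
1         A  2016/2017             12           1
2         A  2018/2019             11           1
3         B  2015/2016             45           0
4         B  2017/2018             54           0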

As you can see, the column is 0 for part "B" because it was not purchased two years in a row.

1 answer:

Answer 0: (score: 1)

import pandas as pd

df = pd.DataFrame(
    {
        'PART_UNIT': ['A', 'A', 'A', 'B', 'B'],
        'FiscalYear': ['2015/2016', '2016/2017', '2018/2019', '2015/2016', '2017/2018'],
        'PO_UNIT_PRICE': [10, 12, 11, 45, 54]
    }
)

print(df)


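def two_years_in_a_row(fiscal_years):
    # Return 1 if any fiscal year starts in the year the previous one ended, else 0.
    # Sorting first makes the check independent of row order within the group.
    tmp = sorted(fiscal_years)
    for idx, year in enumerate(tmp):
        if idx > 0:
            # '2015/2016' followed by '2016/2017' counts as consecutive.
            if tmp[idx - 1].split('/')[1] == year.split('/')[0]:
                return 1
    return 0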


print('----------------------------------------')

df['concurrent'] = df.groupby(['PART_UNIT'])['FiscalYear'].transform(two_years_in_a_row)

print(df)

Output

  PART_UNIT FiscalYear  PO_UNIT_PRICE
0         A  2015/2016             10
1         A  2016/2017             12
2         A  2018/2019             11
3         B  2015/2016             45
4         B  2017/2018             54
----------------------------------------
  PART_UNIT FiscalYear  PO_UNIT_PRICE  concurrent
0         A  2015/2016             10           1
1         A  2016/2017             12           1
2         A  2018/2019             11           1
3         B  2015/2016             45           0
4         B  2017/2018             54           0
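
For larger data frames, the same flag can probably be computed without a Python-level loop. The following is only a sketch of one alternative, not the answer's method: it assumes every FiscalYear value has the form 'YYYY/YYYY', and the names start_year, tmp and consecutive are made up for illustration. It takes the starting year of each fiscal year, sorts within each part, and checks whether any two neighbouring purchases are exactly one year apart.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'PART_UNIT': ['A', 'A', 'A', 'B', 'B'],
        'FiscalYear': ['2015/2016', '2016/2017', '2018/2019', '2015/2016', '2017/2018'],
        'PO_UNIT_PRICE': [10, 12, 11, 45, 54]
    }
)

# Assumption: FiscalYear is always 'YYYY/YYYY'; keep the starting year as an integer.
start_year = df['FiscalYear'].str.split('/').str[0].astype(int)

# Sort by part and starting year so neighbouring rows are neighbouring years.
tmp = df.assign(start_year=start_year).sort_values(['PART_UNIT', 'start_year'])

# True where a row's starting year is exactly one more than the previous row's
# starting year within the same part.
consecutive = tmp.groupby('PART_UNIT')['start_year'].diff().eq(1)

# Broadcast "any consecutive pair" to every row of the part. The assignment
# aligns on the index, so the original row order of df is preserved.
df['concurrent'] = consecutive.groupby(tmp['PART_UNIT']).transform('any').astype(int)

print(df)
```

On the sample data this flags every row of part A and no row of part B, matching the output above. Because the rows are sorted before comparing, the result does not depend on the order in which purchases appear in the data frame.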