References to map() and reduce() functions inside self in PySpark

Asked: 2018-07-14 22:20:41

Tags: python apache-spark mapreduce pyspark

As stated in this question, I should avoid calling self inside a map function. Expanding on this, I have two questions. Let's use the same code as stated there:

class C0(object):
    def func0(self, arg):  # added self
        ...

    def func1(self, rdd):  # added self
        func = self.func0
        result = rdd.map(lambda x: func(x))

  1. Is

         result = rdd.map(lambda x: func(x))

     the same thing as

         result = rdd.map(func)?

     Especially since I was previously using result = rdd.map(func). (See the first sketch after this list.)

  2. Let's say func0 calls another method of the same class:

         class C0(object):
             def func2(self, arg):
                 ...

             def func0(self, arg):  # added self
                 self.func2(arg)
                 ...

             def func1(self, rdd):  # added self
                 func = self.func0
                 result = rdd.map(lambda x: func(x))

     How does Spark handle this? Should I do the same kind of local binding (as with func = self.func0) for func2 inside func0? (See the second sketch after this list.)
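To make question 1 concrete, here is a minimal sketch of the comparison I have in mind. The doubling body for func0, the sample data and the local[*] master are placeholders I picked just so the example runs; they are not part of the code above.

from pyspark import SparkContext


class C0(object):
    def func0(self, arg):
        return arg * 2  # placeholder body standing in for the "..." above

    def func1(self, rdd):
        func = self.func0
        # the two forms question 1 asks about
        direct = rdd.map(func)                # pass func itself
        wrapped = rdd.map(lambda x: func(x))  # wrap func in a lambda
        return direct, wrapped


if __name__ == "__main__":
    sc = SparkContext("local[*]", "map-func-vs-lambda")
    direct, wrapped = C0().func1(sc.parallelize([1, 2, 3]))
    print(direct.collect())   # [2, 4, 6]
    print(wrapped.collect())  # [2, 4, 6]
    sc.stop()

Both calls produce the same output locally; what I cannot tell is whether Spark serializes the two closures differently.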
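For question 2, this is the variant I mean, again with placeholder bodies (the +1 and *2 logic is mine, not from the code above):

from pyspark import SparkContext


class C0(object):
    def func2(self, arg):
        return arg + 1  # placeholder body

    def func0(self, arg):
        # func0 still references self here -- this is what question 2 is about
        return self.func2(arg) * 2

    def func1(self, rdd):
        func = self.func0
        return rdd.map(lambda x: func(x))


if __name__ == "__main__":
    sc = SparkContext("local[*]", "nested-self-call")
    print(C0().func1(sc.parallelize([1, 2, 3])).collect())  # [4, 6, 8]
    sc.stop()

This runs locally, but I do not know whether the self.func2 call inside func0 forces the whole instance onto the executors, or whether func2 should also be bound to a local variable inside func0, the way func0 is bound in func1.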

0 Answers:

There are no answers.