Unable to repartition a DStream

Time: 2017-07-02 13:10:54

Tags: spark-streaming apache-spark-2.0

dstream_1.foreachRDD(r => r.repartition(500))

I am unable to repartition the DStream with the code above. My input has 128 partitions, which is the number of Kafka partitions, and because of a join the data has to be shuffle-read and shuffle-written, so I want to increase parallelism by increasing the number of partitions. But the partition count stays the same. Why is that?

1 Answer:

Answer 0 (score: 1)

Like map and filter, repartition is a transformation in Spark, which means three things (see the short sketch after this list):

  • It returns another, immutable RDD
  • It is lazy
  • It needs to be materialized by some action
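
A minimal sketch of those three properties, assuming a running SparkContext named sc (the variable names are illustrative):

val data = sc.parallelize(1 to 1000, numSlices = 4)

// 1. repartition returns a NEW immutable RDD; `data` itself is unchanged.
val wide = data.repartition(16)
println(data.getNumPartitions) // 4
println(wide.getNumPartitions) // 16

// 2. & 3. The repartition call above is lazy: no shuffle has run yet.
// Only an action on `wide` actually materializes the repartitioning:
wide.count()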

Consider this code:

dstream_1.foreachRDD(r => r.repartition(500))

Using repartition as a side effect inside foreachRDD does not work: the resulting RDD is never used, so the repartitioning never takes place.
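
For contrast, if all of the downstream work happened inside foreachRDD, chaining it off the repartitioned RDD there would also trigger the shuffle. This variant is an illustration, not part of the original answer:

dstream_1.foreachRDD { rdd =>
  val repartitioned = rdd.repartition(500)
  // The result is actually used here, so the shuffle really runs:
  repartitioned.foreachPartition { partition =>
    partition.foreach(record => println(record))
  }
}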

We should "chain" this transformation with the other operations in the job. In this case, a simple way to achieve that is to use transform instead:

val repartitionedDStream = dstream_1.transform(rdd => rdd.repartition(500))
... use repartitionedDStream further on ...
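
For completeness, a hedged sketch of how this could be wired into the asker's join scenario; ssc, dstream_1, and dstream_2 are assumed to already exist (e.g. key/value DStreams created from Kafka), and the names are hypothetical:

val repartitionedDStream = dstream_1.transform(rdd => rdd.repartition(500))

// The repartitioned RDDs feed the join, so the shuffle actually happens
// and the join runs with the higher parallelism:
val joined = repartitionedDStream.join(dstream_2)
joined.foreachRDD { rdd =>
  println(s"partitions in this batch: ${rdd.getNumPartitions}")
}

ssc.start()
ssc.awaitTermination()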