2015-10-30更新
基于Roland Kuhn Awnser:
Akka Streams正在使用Actors之间的异步消息传递 实现流处理阶段。传递数据 异步边界有一个你在这里看到的开销:你的 计算似乎只需要大约160ns(来自于 流媒体解决方案需要的单线程测量 每个元素大约1μs,这由消息传递决定。
另一个误解是说“流”意味着并行性:in 你的代码所有计算在一个Actor(地图)中顺序运行 阶段),因此对原始人没有任何好处 单线程解决方案。
为了从Akka Streams提供的并行性中受益 需要有多个处理阶段,每个阶段执行
的任务每个元素1μs,另见文档。
我做了一些改变。我的代码现在看起来像:
object MultiThread {
implicit val actorSystem = ActorSystem("Sys")
implicit val materializer = ActorMaterializer()
var counter = 0
var oldProgess = 0
//RunnableFlow: in -> flow -> sink
val in = Source(() => Iterator.continually((1254785478l, "name", 48, 23.09f)))
val flow = Flow[(Long, String, Int, Float)].map(p => SharedFunctions.transform2(SharedFunctions.transform(p)))
val tupleToEvent = Flow[(Long, String, Int, Float)].map(SharedFunctions.transform)
val eventToFactorial = Flow[Event].map(SharedFunctions.transform2)
val eventChef: Flow[(Long, String, Int, Float), Int, Unit] = Flow() { implicit builder =>
import FlowGraph.Implicits._
val dispatchTuple = builder.add(Balance[(Long, String, Int, Float)](4))
val mergeEvents = builder.add(Merge[Int](4))
dispatchTuple.out(0) ~> tupleToEvent ~> eventToFactorial ~> mergeEvents.in(0)
dispatchTuple.out(1) ~> tupleToEvent ~> eventToFactorial ~> mergeEvents.in(1)
dispatchTuple.out(2) ~> tupleToEvent ~> eventToFactorial ~> mergeEvents.in(2)
dispatchTuple.out(3) ~> tupleToEvent ~> eventToFactorial ~> mergeEvents.in(3)
(dispatchTuple.in, mergeEvents.out)
}
val sink = Sink.foreach[Int]{
v => counter += 1
oldProgess = SharedFunctions.printProgress(oldProgess, SharedFunctions.maxEventCount, counter,
DateTime.now.getMillis - SharedFunctions.startTime.getMillis)
if(counter == SharedFunctions.maxEventCount) endAkka()
}
def endAkka() = {
val duration = new Duration(SharedFunctions.startTime, DateTime.now)
println("Time: " + duration.getMillis + " || Data: " + counter)
actorSystem.shutdown
actorSystem.awaitTermination
System.exit(-1)
}
def main(args: Array[String]) {
println("MultiThread started: " + SharedFunctions.startTime)
in.via(flow).runWith(sink)
// in.via(eventChef).runWith(sink)
}
}
我不确定我是否得到了完全错误的东西,但是我使用akka-streams的实现仍然慢得多(现在甚至比以前更慢)但我发现的是:如果我通过做一些划分来增加工作量使用akka-streams实现更快。所以,如果我做对了(否则纠正我),我的例子似乎有太多的开销。所以如果代码必须做大量工作,你只能从akka-streams中获益?
我在scala和amp;中都比较新。阿卡流。我写了一个小测试项目,它创建了一些事件,直到一个计数器达到一个特定的数字。对于每个事件,正在计算事件的一个字段的阶乘。我实施了两次。一次使用akka-stream,一次使用akka-stream(单线程)并比较运行时。
我没想到:当我创建单个事件时,两个程序的运行时几乎相同。但是如果我创建了70,000,000个事件,那么没有akka-streams的实现会更快。以下是我的结果(以下数据基于24次测量):
使用akka-streams 的单一事件: 444(+ -13)ms
没有akka-streams的70Mio事件: 11778(+ -70)ms
所以我的问题是:发生了什么事?为什么我的akka-stream实现速度较慢?
这里是我的代码:
使用Akka实施
object MultiThread {
implicit val actorSystem = ActorSystem("Sys")
implicit val materializer = ActorMaterializer()
var counter = 0
var oldProgess = 0
//RunnableFlow: in -> flow -> sink
val in = Source(() => Iterator.continually((1254785478l, "name", 48, 23.09f)))
val flow = Flow[(Long, String, Int, Float)].map(p => SharedFunctions.transform2(SharedFunctions.transform(p)))
val sink = Sink.foreach[Int]{
v => counter += 1
oldProgess = SharedFunctions.printProgress(oldProgess, SharedFunctions.maxEventCount, counter,
DateTime.now.getMillis - SharedFunctions.startTime.getMillis)
if(counter == SharedFunctions.maxEventCount) endAkka()
}
def endAkka() = {
val duration = new Duration(SharedFunctions.startTime, DateTime.now)
println("Time: " + duration.getMillis + " || Data: " + counter)
actorSystem.shutdown
actorSystem.awaitTermination
System.exit(-1)
}
def main(args: Array[String]) {
import scala.concurrent.ExecutionContext.Implicits.global
println("MultiThread started: " + SharedFunctions.startTime)
in.via(flow).runWith(sink).onComplete(_ => endAkka())
}
}
没有Akka的实施
对象SingleThread {
def main(args: Array[String]) {
println("SingleThread started at: " + SharedFunctions.startTime)
println("0%")
val i = createEvent(0)
val duration = new Duration(SharedFunctions.startTime, DateTime.now());
println("Time: " + duration.getMillis + " || Data: " + i)
}
def createEventWorker(oldProgress: Int, count: Int, randDate: Long, name: String, age: Int, myFloat: Float): Int = {
if (count == SharedFunctions.maxEventCount) count
else {
val e = SharedFunctions.transform((randDate, name, age, myFloat))
SharedFunctions.transform2(e)
val p = SharedFunctions.printProgress(oldProgress, SharedFunctions.maxEventCount, count,
DateTime.now.getMillis - SharedFunctions.startTime.getMillis)
createEventWorker(p, count + 1, 1254785478l, "name", 48, 23.09f)
}
}
def createEvent(count: Int): Int = {
createEventWorker(0, count, 1254785478l, "name", 48, 23.09f)
}
}
SharedFunctions
object SharedFunctions {
val maxEventCount = 70000000
val startTime = DateTime.now
def transform(t : (Long, String, Int, Float)) : Event = new Event(t._1 ,t._2,t._3,t._4)
def transform2(e : Event) : Int = factorial(e.getAgeYrs)
def calculatePercentage(totalValue: Long, currentValue: Long) = Math.round((currentValue * 100) / totalValue)
def printProgress(oldProgress : Int, fileSize: Long, currentSize: Int, t: Long) = {
val cProgress = calculatePercentage(fileSize, currentSize)
if (oldProgress != cProgress) println(s"$oldProgress% | $t ms")
cProgress
}
private def factorialWorker(n1: Int, n2: Int): Int = {
if (n1 == 0) n2
else factorialWorker(n1 -1, n2*n1)
}
def factorial (n : Int): Int = {
factorialWorker(n, 1)
}
}
实施活动
/**
* Autogenerated by Avro
*
* DO NOT EDIT DIRECTLY
*/
@SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class Event extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"Event\",\"namespace\":\"week2P2\",\"fields\":[{\"name\":\"timestampMS\",\"type\":\"long\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"ageYrs\",\"type\":\"int\"},{\"name\":\"sizeCm\",\"type\":\"float\"}]}");
public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
@Deprecated public long timestampMS;
@Deprecated public CharSequence name;
@Deprecated public int ageYrs;
@Deprecated public float sizeCm;
/**
* Default constructor. Note that this does not initialize fields
* to their default values from the schema. If that is desired then
* one should use <code>newBuilder()</code>.
*/
public Event() {}
/**
* All-args constructor.
*/
public Event(Long timestampMS, CharSequence name, Integer ageYrs, Float sizeCm) {
this.timestampMS = timestampMS;
this.name = name;
this.ageYrs = ageYrs;
this.sizeCm = sizeCm;
}
public org.apache.avro.Schema getSchema() { return SCHEMA$; }
// Used by DatumWriter. Applications should not call.
public Object get(int field$) {
switch (field$) {
case 0: return timestampMS;
case 1: return name;
case 2: return ageYrs;
case 3: return sizeCm;
default: throw new org.apache.avro.AvroRuntimeException("Bad index");
}
}
// Used by DatumReader. Applications should not call.
@SuppressWarnings(value="unchecked")
public void put(int field$, Object value$) {
switch (field$) {
case 0: timestampMS = (Long)value$; break;
case 1: name = (CharSequence)value$; break;
case 2: ageYrs = (Integer)value$; break;
case 3: sizeCm = (Float)value$; break;
default: throw new org.apache.avro.AvroRuntimeException("Bad index");
}
}
/**
* Gets the value of the 'timestampMS' field.
*/
public Long getTimestampMS() {
return timestampMS;
}
/**
* Sets the value of the 'timestampMS' field.
* @param value the value to set.
*/
public void setTimestampMS(Long value) {
this.timestampMS = value;
}
/**
* Gets the value of the 'name' field.
*/
public CharSequence getName() {
return name;
}
/**
* Sets the value of the 'name' field.
* @param value the value to set.
*/
public void setName(CharSequence value) {
this.name = value;
}
/**
* Gets the value of the 'ageYrs' field.
*/
public Integer getAgeYrs() {
return ageYrs;
}
/**
* Sets the value of the 'ageYrs' field.
* @param value the value to set.
*/
public void setAgeYrs(Integer value) {
this.ageYrs = value;
}
/**
* Gets the value of the 'sizeCm' field.
*/
public Float getSizeCm() {
return sizeCm;
}
/**
* Sets the value of the 'sizeCm' field.
* @param value the value to set.
*/
public void setSizeCm(Float value) {
this.sizeCm = value;
}
/** Creates a new Event RecordBuilder */
public static Event.Builder newBuilder() {
return new Event.Builder();
}
/** Creates a new Event RecordBuilder by copying an existing Builder */
public static Event.Builder newBuilder(Event.Builder other) {
return new Event.Builder(other);
}
/** Creates a new Event RecordBuilder by copying an existing Event instance */
public static Event.Builder newBuilder(Event other) {
return new Event.Builder(other);
}
/**
* RecordBuilder for Event instances.
*/
public static class Builder extends org.apache.avro.specific.SpecificRecordBuilderBase<Event>
implements org.apache.avro.data.RecordBuilder<Event> {
private long timestampMS;
private CharSequence name;
private int ageYrs;
private float sizeCm;
/** Creates a new Builder */
private Builder() {
super(Event.SCHEMA$);
}
/** Creates a Builder by copying an existing Builder */
private Builder(Event.Builder other) {
super(other);
if (isValidValue(fields()[0], other.timestampMS)) {
this.timestampMS = data().deepCopy(fields()[0].schema(), other.timestampMS);
fieldSetFlags()[0] = true;
}
if (isValidValue(fields()[1], other.name)) {
this.name = data().deepCopy(fields()[1].schema(), other.name);
fieldSetFlags()[1] = true;
}
if (isValidValue(fields()[2], other.ageYrs)) {
this.ageYrs = data().deepCopy(fields()[2].schema(), other.ageYrs);
fieldSetFlags()[2] = true;
}
if (isValidValue(fields()[3], other.sizeCm)) {
this.sizeCm = data().deepCopy(fields()[3].schema(), other.sizeCm);
fieldSetFlags()[3] = true;
}
}
/** Creates a Builder by copying an existing Event instance */
private Builder(Event other) {
super(Event.SCHEMA$);
if (isValidValue(fields()[0], other.timestampMS)) {
this.timestampMS = data().deepCopy(fields()[0].schema(), other.timestampMS);
fieldSetFlags()[0] = true;
}
if (isValidValue(fields()[1], other.name)) {
this.name = data().deepCopy(fields()[1].schema(), other.name);
fieldSetFlags()[1] = true;
}
if (isValidValue(fields()[2], other.ageYrs)) {
this.ageYrs = data().deepCopy(fields()[2].schema(), other.ageYrs);
fieldSetFlags()[2] = true;
}
if (isValidValue(fields()[3], other.sizeCm)) {
this.sizeCm = data().deepCopy(fields()[3].schema(), other.sizeCm);
fieldSetFlags()[3] = true;
}
}
/** Gets the value of the 'timestampMS' field */
public Long getTimestampMS() {
return timestampMS;
}
/** Sets the value of the 'timestampMS' field */
public Event.Builder setTimestampMS(long value) {
validate(fields()[0], value);
this.timestampMS = value;
fieldSetFlags()[0] = true;
return this;
}
/** Checks whether the 'timestampMS' field has been set */
public boolean hasTimestampMS() {
return fieldSetFlags()[0];
}
/** Clears the value of the 'timestampMS' field */
public Event.Builder clearTimestampMS() {
fieldSetFlags()[0] = false;
return this;
}
/** Gets the value of the 'name' field */
public CharSequence getName() {
return name;
}
/** Sets the value of the 'name' field */
public Event.Builder setName(CharSequence value) {
validate(fields()[1], value);
this.name = value;
fieldSetFlags()[1] = true;
return this;
}
/** Checks whether the 'name' field has been set */
public boolean hasName() {
return fieldSetFlags()[1];
}
/** Clears the value of the 'name' field */
public Event.Builder clearName() {
name = null;
fieldSetFlags()[1] = false;
return this;
}
/** Gets the value of the 'ageYrs' field */
public Integer getAgeYrs() {
return ageYrs;
}
/** Sets the value of the 'ageYrs' field */
public Event.Builder setAgeYrs(int value) {
validate(fields()[2], value);
this.ageYrs = value;
fieldSetFlags()[2] = true;
return this;
}
/** Checks whether the 'ageYrs' field has been set */
public boolean hasAgeYrs() {
return fieldSetFlags()[2];
}
/** Clears the value of the 'ageYrs' field */
public Event.Builder clearAgeYrs() {
fieldSetFlags()[2] = false;
return this;
}
/** Gets the value of the 'sizeCm' field */
public Float getSizeCm() {
return sizeCm;
}
/** Sets the value of the 'sizeCm' field */
public Event.Builder setSizeCm(float value) {
validate(fields()[3], value);
this.sizeCm = value;
fieldSetFlags()[3] = true;
return this;
}
/** Checks whether the 'sizeCm' field has been set */
public boolean hasSizeCm() {
return fieldSetFlags()[3];
}
/** Clears the value of the 'sizeCm' field */
public Event.Builder clearSizeCm() {
fieldSetFlags()[3] = false;
return this;
}
@Override
public Event build() {
try {
Event record = new Event();
record.timestampMS = fieldSetFlags()[0] ? this.timestampMS : (Long) defaultValue(fields()[0]);
record.name = fieldSetFlags()[1] ? this.name : (CharSequence) defaultValue(fields()[1]);
record.ageYrs = fieldSetFlags()[2] ? this.ageYrs : (Integer) defaultValue(fields()[2]);
record.sizeCm = fieldSetFlags()[3] ? this.sizeCm : (Float) defaultValue(fields()[3]);
return record;
} catch (Exception e) {
throw new org.apache.avro.AvroRuntimeException(e);
}
}
}
}
答案 0 :(得分:29)
Akka Streams在Actors之间使用异步消息传递来实现流处理阶段。在异步边界上传递数据有一个开销,你在这里看到:你的计算似乎只需要大约160ns(来自单线程测量),而流解决方案每个元素需要大约1μs,这由消息传递决定。
另一个误解是说“流”意味着并行性:在你的代码中,所有计算都在一个Actor(map
阶段)中顺序运行,因此对原始单线程解决方案没有任何好处。 / p>
为了从Akka Streams提供的并行性中受益,您需要有多个处理阶段,每个阶段执行每个元素>1μs的任务,另请参阅the docs。
答案 1 :(得分:12)
除了Roland的解释,我完全赞同,应该理解akka Streams不仅仅是一个并发的编程框架。 Streams还提供背压,这意味着只有Source
需要在Sink
处理事件时才会生成事件。这种需求通信在每个处理步骤中增加了一些开销。
因此,您的单线程和多线程比较不是“苹果对苹果”。
如果你想要原始的多线程执行性能,那么Futures / Actors是一个更好的方法。