ResultTask¶
ResultTask
is a scheduler:Task.md[Task] that <
<DAGScheduler
submits missing tasks for a ResultStage
].
ResultTask
is created with a <
[[internal-registries]] .ResultTask's Internal Registries and Counters [cols="1,2",options="header",width="100%"] |=== | Name | Description
| [[preferredLocs]] preferredLocs
| Collection of scheduler:TaskLocation.md[TaskLocations].
Corresponds directly to unique entries in <locs
is not defined, it is empty, and no task location preferences are defined.
Initialized when <
Used exclusively when ResultTask
is requested for <
|===
Creating Instance¶
ResultTask
takes the following when created:
stageId
-- the stage the task is executed forstageAttemptId
-- the stage attempt id- [[taskBinary]] Broadcast.md[] with the serialized task (as
Array[Byte]
). The broadcast contains of a serialized pair ofRDD
and the function to execute. - [[partition]] Partition to compute
- [[locs]] Collection of scheduler:TaskLocation.md[TaskLocations], i.e. preferred locations (executors) to execute the task on
- [[outputId]]
outputId
- [[localProperties]] local
Properties
- [[serializedTaskMetrics]] The stage's serialized executor:TaskMetrics.md[] (as
Array[Byte]
) - [[jobId]] (optional) spark-scheduler-ActiveJob.md[Job] id
- [[appId]] (optional) Application id
- [[appAttemptId]] (optional) Application attempt id
ResultTask
initializes the <
=== [[preferredLocations]] preferredLocations
Method
[source, scala]¶
preferredLocations: Seq[TaskLocation]¶
NOTE: preferredLocations
is part of scheduler:Task.md#contract[Task contract].
preferredLocations
simply returns <
=== [[runTask]] Deserialize RDD and Function (From Broadcast) and Execute Function (on RDD Partition) -- runTask
Method
[source, scala]¶
runTask(context: TaskContext): U¶
NOTE: U
is the type of a result as defined when <
runTask
deserializes a RDD and a function from the <
NOTE: runTask
is part of scheduler:Task.md#contract[Task contract] to run a task.
Internally, runTask
starts by tracking the time required to deserialize a RDD and a function to execute.
runTask
serializer:Serializer.md#newInstance[creates a new closure Serializer
].
NOTE: runTask
uses core:SparkEnv.md#closureSerializer[SparkEnv
to access the current closure Serializer
].
runTask
serializer:Serializer.md#deserialize[requests the closure Serializer
to deserialize an RDD
and the function to execute] (from <
NOTE: <
runTask
records scheduler:Task.md#_executorDeserializeTime[_executorDeserializeTime] and scheduler:Task.md#_executorDeserializeCpuTime[_executorDeserializeCpuTime] properties.
In the end, runTask
executes the function (passing in the input context
and the rdd:index.md#iterator[records from partition
of the RDD]).
NOTE: partition
to use to access the records in a deserialized RDD is defined when <