org.apache.hadoop.mapred
Class JobTracker

java.lang.Object
  extended by org.apache.hadoop.mapred.JobTracker
All Implemented Interfaces:
VersionedProtocol, JobSubmissionProtocol

public class JobTracker
extends Object
implements JobSubmissionProtocol

JobTracker is the central location for submitting and tracking MR jobs in a network environment.

Author:
Mike Cafarella

Field Summary
static int FILE_NOT_FOUND
           
static long HEARTBEAT_INTERVAL
           
static org.apache.commons.logging.Log LOG
           
static String MAP_OUTPUT_LENGTH
          The custom http header used for the map output length.
static float MAX_INMEM_FILESIZE_FRACTION
          Constant denoting the max size (in terms of the fraction of the total size of the filesys) of a map output file that we will try to keep in mem.
static float MAX_INMEM_FILESYS_USE
          Constant denoting when a merge of in memory files will be triggered
static int SUCCESS
           
static int TRACKERS_OK
           
static int UNKNOWN_TASKTRACKER
           
static long versionID
          version 3 introduced to replace emitHearbeat/pollForNewTask/pollForTaskWithClosedJob with heartbeat(TaskTrackerStatus, boolean, boolean, short) version 4 changed TaskReport for HADOOP-549.
 
Fields inherited from interface org.apache.hadoop.mapred.JobSubmissionProtocol
versionID
 
Method Summary
 Vector<org.apache.hadoop.mapred.JobInProgress> completedJobs()
           
 Vector<org.apache.hadoop.mapred.JobInProgress> failedJobs()
           
static InetSocketAddress getAddress(Configuration conf)
           
 String getAssignedTracker(String taskId)
          Get tracker name for a given task id.
 ClusterStatus getClusterStatus()
          Get the current status of the cluster
 String getFilesystemName()
          Grab the local fs name
 int getInfoPort()
           
 org.apache.hadoop.mapred.JobInProgress getJob(String jobid)
           
 Counters getJobCounters(String jobid)
          Grab the current job counters
 JobProfile getJobProfile(String jobid)
          Grab a handle to a job that is already known to the JobTracker
 JobStatus getJobStatus(String jobid)
          Grab a handle to a job that is already known to the JobTracker
 String getJobTrackerMachine()
           
 TaskReport[] getMapTaskReports(String jobid)
          Grab a bunch of info on the map tasks that make up the job
 long getProtocolVersion(String protocol, long clientVersion)
          Return protocol version corresponding to protocol interface.
 TaskReport[] getReduceTaskReports(String jobid)
          Grab a bunch of info on the reduce tasks that make up the job
 List<org.apache.hadoop.mapred.JobInProgress> getRunningJobs()
          Version that is called from a timer thread, and therefore needs to be careful to synchronize.
 long getStartTime()
           
 TaskCompletionEvent[] getTaskCompletionEvents(String jobid, int fromEventId, int maxEvents)
          Get task completion events for the jobid, starting from fromEventId.
 List<String> getTaskDiagnostics(String jobId, String tipId, String taskId)
          Get the diagnostics for a given task
 org.apache.hadoop.mapred.TaskTrackerStatus getTaskTracker(String trackerID)
           
 int getTotalSubmissions()
           
static JobTracker getTracker()
           
 int getTrackerPort()
           
 org.apache.hadoop.mapred.HeartbeatResponse heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus status, boolean initialContact, boolean acceptNewTasks, short responseId)
          The periodic heartbeat mechanism between the TaskTracker and the JobTracker.
 JobStatus[] jobsToComplete()
          Get the jobs that are not completed and not failed
 void killJob(String jobid)
          Kill the indicated job
static void main(String[] argv)
          Start the JobTracker process.
 void offerService()
          Run forever
 void reportTaskTrackerError(String taskTracker, String errorClass, String errorMessage)
          Report a problem to the job tracker.
 Vector<org.apache.hadoop.mapred.JobInProgress> runningJobs()
           
static void startTracker(Configuration conf)
          Start the JobTracker with given configuration.
static void stopTracker()
           
 JobStatus submitJob(String jobFile)
          JobTracker.submitJob() kicks off a new job.
 Collection taskTrackers()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

HEARTBEAT_INTERVAL

public static final long HEARTBEAT_INTERVAL
See Also:
Constant Field Values

MAX_INMEM_FILESYS_USE

public static final float MAX_INMEM_FILESYS_USE
Constant denoting when a merge of in memory files will be triggered

See Also:
Constant Field Values

MAX_INMEM_FILESIZE_FRACTION

public static final float MAX_INMEM_FILESIZE_FRACTION
Constant denoting the max size (in terms of the fraction of the total size of the filesys) of a map output file that we will try to keep in mem. Ideally, this should be a factor of MAX_INMEM_FILESYS_USE

See Also:
Constant Field Values

SUCCESS

public static final int SUCCESS
See Also:
Constant Field Values

FILE_NOT_FOUND

public static final int FILE_NOT_FOUND
See Also:
Constant Field Values

MAP_OUTPUT_LENGTH

public static final String MAP_OUTPUT_LENGTH
The custom http header used for the map output length.

See Also:
Constant Field Values

versionID

public static final long versionID
version 3 introduced to replace emitHearbeat/pollForNewTask/pollForTaskWithClosedJob with heartbeat(TaskTrackerStatus, boolean, boolean, short) version 4 changed TaskReport for HADOOP-549. version 5 introduced that removes locateMapOutputs and instead uses getTaskCompletionEvents to figure finished maps and fetch the outputs

See Also:
Constant Field Values

TRACKERS_OK

public static final int TRACKERS_OK
See Also:
Constant Field Values

UNKNOWN_TASKTRACKER

public static final int UNKNOWN_TASKTRACKER
See Also:
Constant Field Values
Method Detail

startTracker

public static void startTracker(Configuration conf)
                         throws IOException
Start the JobTracker with given configuration. The conf will be modified to reflect the actual ports on which the JobTracker is up and running if the user passes the port as zero.

Parameters:
conf - configuration for the JobTracker.
Throws:
IOException

getTracker

public static JobTracker getTracker()

stopTracker

public static void stopTracker()
                        throws IOException
Throws:
IOException

getProtocolVersion

public long getProtocolVersion(String protocol,
                               long clientVersion)
                        throws IOException
Description copied from interface: VersionedProtocol
Return protocol version corresponding to protocol interface.

Specified by:
getProtocolVersion in interface VersionedProtocol
Parameters:
protocol - The classname of the protocol interface
clientVersion - The version of the protocol that the client speaks
Returns:
the version that the server will speak
Throws:
IOException

getAddress

public static InetSocketAddress getAddress(Configuration conf)

offerService

public void offerService()
Run forever


getTotalSubmissions

public int getTotalSubmissions()

getJobTrackerMachine

public String getJobTrackerMachine()

getTrackerPort

public int getTrackerPort()

getInfoPort

public int getInfoPort()

getStartTime

public long getStartTime()

runningJobs

public Vector<org.apache.hadoop.mapred.JobInProgress> runningJobs()

getRunningJobs

public List<org.apache.hadoop.mapred.JobInProgress> getRunningJobs()
Version that is called from a timer thread, and therefore needs to be careful to synchronize.


failedJobs

public Vector<org.apache.hadoop.mapred.JobInProgress> failedJobs()

completedJobs

public Vector<org.apache.hadoop.mapred.JobInProgress> completedJobs()

taskTrackers

public Collection taskTrackers()

getTaskTracker

public org.apache.hadoop.mapred.TaskTrackerStatus getTaskTracker(String trackerID)

heartbeat

public org.apache.hadoop.mapred.HeartbeatResponse heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus status,
                                                            boolean initialContact,
                                                            boolean acceptNewTasks,
                                                            short responseId)
                                                     throws IOException
The periodic heartbeat mechanism between the TaskTracker and the JobTracker. The JobTracker processes the status information sent by the TaskTracker and responds with instructions to start/stop tasks or jobs, and also 'reset' instructions during contingencies.

Parameters:
status - the status update
initialContact - true if this is first interaction since 'refresh', false otherwise.
acceptNewTasks - true if the TaskTracker is ready to accept new tasks to run.
responseId - the last responseId successfully acted upon by the TaskTracker.
Returns:
a HeartbeatResponse with fresh instructions.
Throws:
IOException

getFilesystemName

public String getFilesystemName()
                         throws IOException
Grab the local fs name

Specified by:
getFilesystemName in interface JobSubmissionProtocol
Throws:
IOException

reportTaskTrackerError

public void reportTaskTrackerError(String taskTracker,
                                   String errorClass,
                                   String errorMessage)
                            throws IOException
Report a problem to the job tracker.

Parameters:
taskTracker - the name of the task tracker
errorClass - the kind of error (eg. the class that was thrown)
errorMessage - the human readable error message
Throws:
IOException - if there was a problem in communication or on the remote side

submitJob

public JobStatus submitJob(String jobFile)
                    throws IOException
JobTracker.submitJob() kicks off a new job. Create a 'JobInProgress' object, which contains both JobProfile and JobStatus. Those two sub-objects are sometimes shipped outside of the JobTracker. But JobInProgress adds info that's useful for the JobTracker alone. We add the JIP to the jobInitQueue, which is processed asynchronously to handle split-computation and build up the right TaskTracker/Block mapping.

Specified by:
submitJob in interface JobSubmissionProtocol
Throws:
IOException

getClusterStatus

public ClusterStatus getClusterStatus()
Description copied from interface: JobSubmissionProtocol
Get the current status of the cluster

Specified by:
getClusterStatus in interface JobSubmissionProtocol
Returns:
summary of the state of the cluster

killJob

public void killJob(String jobid)
Description copied from interface: JobSubmissionProtocol
Kill the indicated job

Specified by:
killJob in interface JobSubmissionProtocol

getJobProfile

public JobProfile getJobProfile(String jobid)
Description copied from interface: JobSubmissionProtocol
Grab a handle to a job that is already known to the JobTracker

Specified by:
getJobProfile in interface JobSubmissionProtocol

getJobStatus

public JobStatus getJobStatus(String jobid)
Description copied from interface: JobSubmissionProtocol
Grab a handle to a job that is already known to the JobTracker

Specified by:
getJobStatus in interface JobSubmissionProtocol

getJobCounters

public Counters getJobCounters(String jobid)
Description copied from interface: JobSubmissionProtocol
Grab the current job counters

Specified by:
getJobCounters in interface JobSubmissionProtocol

getMapTaskReports

public TaskReport[] getMapTaskReports(String jobid)
Description copied from interface: JobSubmissionProtocol
Grab a bunch of info on the map tasks that make up the job

Specified by:
getMapTaskReports in interface JobSubmissionProtocol

getReduceTaskReports

public TaskReport[] getReduceTaskReports(String jobid)
Description copied from interface: JobSubmissionProtocol
Grab a bunch of info on the reduce tasks that make up the job

Specified by:
getReduceTaskReports in interface JobSubmissionProtocol

getTaskCompletionEvents

public TaskCompletionEvent[] getTaskCompletionEvents(String jobid,
                                                     int fromEventId,
                                                     int maxEvents)
                                              throws IOException
Get task completion events for the jobid, starting from fromEventId. Returns empty aray if no events are available.

Specified by:
getTaskCompletionEvents in interface JobSubmissionProtocol
Parameters:
jobid - job id
fromEventId - event id to start from.
maxEvents - the max number of events we want to look at
Returns:
array of task completion events.
Throws:
IOException

getTaskDiagnostics

public List<String> getTaskDiagnostics(String jobId,
                                       String tipId,
                                       String taskId)
Get the diagnostics for a given task

Parameters:
jobId - the id of the job
tipId - the id of the tip
taskId - the id of the task
Returns:
a list of the diagnostic messages

getAssignedTracker

public String getAssignedTracker(String taskId)
Get tracker name for a given task id.

Parameters:
taskId - the name of the task
Returns:
The name of the task tracker

jobsToComplete

public JobStatus[] jobsToComplete()
Description copied from interface: JobSubmissionProtocol
Get the jobs that are not completed and not failed

Specified by:
jobsToComplete in interface JobSubmissionProtocol
Returns:
array of JobStatus for the running/to-be-run jobs.

getJob

public org.apache.hadoop.mapred.JobInProgress getJob(String jobid)

main

public static void main(String[] argv)
                 throws IOException,
                        InterruptedException
Start the JobTracker process. This is used only for debugging. As a rule, JobTracker should be run as part of the DFS Namenode process.

Throws:
IOException
InterruptedException


Copyright © 2006 The Apache Software Foundation