NAME
queue_conf - Grid Engine queue configuration file format
DESCRIPTION
Queue_conf reflects the format of the template file for the
queue configuration. Via the -aq and -mq options of the
qconf(1) command, you can add queues and modify the
configuration of any queue in the cluster.
The queue_conf parameters take as values strings, decimal
integers, booleans, time and memory specifiers, or comma
separated lists. A time specifier either consists of a
positive decimal, hexadecimal or octal integer constant, in
which case the value is interpreted as seconds, or is built
from three decimal integers separated by colons, where the
first number counts the hours, the second the minutes and
the third the seconds. A number that is zero can be left
out, but the separating colons must remain (e.g. 1:0:1 =
1::1 means 1 hour and 1 second).
Memory specifiers are positive decimal, hexadecimal or octal
integer constants which may be followed by a multiplier
letter. Valid multiplier letters are k, K, m and M, where k
multiplies the value by 1000, K by 1024, m by 1000*1000 and
M by 1024*1024. If no multiplier is present, the value is
counted in bytes.
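The two specifier formats can be illustrated with a short
Python sketch. It is not Grid Engine's own parser: the
function names are hypothetical, and it assumes that
hexadecimal and octal constants use the C notation (a 0x
prefix and a leading 0, respectively).

```python
def parse_int_constant(text):
    """Parse a decimal, hexadecimal (0x...) or octal (leading 0) constant.

    The C-style prefixes are an assumption of this sketch.
    """
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def parse_time(spec):
    """Return the number of seconds denoted by a time specifier."""
    if ":" not in spec:
        # A single integer constant is taken as seconds.
        return parse_int_constant(spec)
    fields = spec.split(":")
    if len(fields) != 3:
        raise ValueError("expected hours:minutes:seconds, got %r" % spec)
    # An omitted field counts as zero, so "1::1" equals "1:0:1".
    hours, minutes, seconds = (int(f, 10) if f else 0 for f in fields)
    return hours * 3600 + minutes * 60 + seconds


# Multiplier letters as listed in the text above.
MULTIPLIERS = {"k": 1000, "K": 1024, "m": 1000 * 1000, "M": 1024 * 1024}


def parse_memory(spec):
    """Return the number of bytes denoted by a memory specifier."""
    if spec and spec[-1] in MULTIPLIERS:
        return parse_int_constant(spec[:-1]) * MULTIPLIERS[spec[-1]]
    return parse_int_constant(spec)
```

With these definitions, parse_time("1::1") and
parse_time("1:0:1") both yield 3601 seconds, and
parse_memory("2K") yields 2048 bytes.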
FORMAT
The following list of queue_conf parameters specifies the
queue_conf content:
qname
The name of the queue on the node (type string; template
default: template).
hostname
The fully-qualified host name of the node (type string;
template default: host.dom.dom.dom).
seq_no
With sort_seq_no (see sched_conf(5)) set to TRUE, this
parameter specifies this queue's position in the scheduling
order within the suitable queues for a job to be dispatched.
It thus replaces the order by load policy that would rule
otherwise.
Regardless of the sort_seq_no setting, qstat(1) reports
queue information in the order defined by the value of the
seq_no. Set this parameter to a monotonically increasing
sequence. The type is number and the default is 0.
load_thresholds
load_thresholds is a list of load thresholds. As soon as one
of the thresholds is exceeded, no further jobs are scheduled
to the queues on this node and qmon(1) signals an overload
condition for this node. Arbitrary load values defined in
the "host" and "global" complexes (see complex(5) for
details) can be used.
The syntax is that of a comma separated list with each list
element consisting of the name of a load value, an equal
sign and the threshold value intended to trigger the
overload situation (e.g. load.avg=175,users_logged_in=5).
Note: Load values as well as consumable resources may be
scaled differently for different hosts if specified in the
corresponding execution host definitions (refer to
host_conf(5) for more information). Load thresholds are
compared against the scaled load and consumable values.
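As an illustration of the list syntax, the following Python
sketch parses a threshold list and reports which thresholds
a set of (already scaled) load values exceeds. The function
names are hypothetical, and treating "exceeded" as strictly
greater than the threshold is an assumption of the sketch,
not a statement about the scheduler.

```python
def parse_thresholds(spec):
    """Turn 'name=value,name=value' into a dict of float thresholds."""
    thresholds = {}
    for element in spec.split(","):
        name, _, value = element.strip().partition("=")
        thresholds[name] = float(value)
    return thresholds


def exceeded(thresholds, scaled_load):
    """Return the sorted names of all thresholds the load values exceed.

    Load values that are not reported are ignored. "Exceeded" is taken
    to mean strictly greater than the threshold (an assumption).
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in scaled_load and scaled_load[name] > limit
    )


# The example list from the text above.
limits = parse_thresholds("load.avg=175,users_logged_in=5")
overloaded = exceeded(limits, {"load.avg": 180, "users_logged_in": 2})
```

Here overloaded is a non-empty list, so under the rule
described above no further jobs would be scheduled to the
queues on the node.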
suspend_thresholds
A list of load thresholds with the same semantics as that of
the load_thresholds parameter (see above) except that
exceeding one of the denoted thresholds initiates suspension
of one or more jobs in the queue. See the nsuspend
parameter below for details on the number of jobs which are
suspended.
nsuspend
The number of jobs which are suspended per time interval if
at least one of the load thresholds in the
suspend_thresholds list is exceeded, or enabled per time
interval once no suspend_threshold is violated anymore.
Nsuspend jobs are suspended in each time interval until no
suspend_thresholds are exceeded anymore or all jobs in the
queue are suspended. Jobs are enabled in the corresponding
way if the suspend_thresholds are no longer exceeded. The
time interval in which the suspensions of the jobs occur is
defined in suspend_interval below.
suspend_interval
The time interval in which further nsuspend jobs are
suspended if one of the suspend_thresholds (see above for
both) is exceeded by the current load on the host on which
the queue is located. The time interval is also used when
enabling the jobs.
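The interplay of nsuspend and suspend_interval can be
pictured as one step per interval. The Python sketch below
is a simplified model under that reading; the function name
and the use of plain job counts are assumptions made for
illustration.

```python
def step(running, suspended, nsuspend, threshold_exceeded):
    """Advance one suspend_interval; return new (running, suspended) counts.

    While a suspend threshold is exceeded, up to nsuspend running jobs
    are suspended per interval. Once no threshold is exceeded, up to
    nsuspend suspended jobs are enabled again per interval.
    """
    if threshold_exceeded:
        moved = min(nsuspend, running)
        return running - moved, suspended + moved
    moved = min(nsuspend, suspended)
    return running + moved, suspended - moved
```

Starting with 5 running jobs and nsuspend set to 2, three
intervals of sustained overload leave all 5 jobs suspended
in this model; three intervals without overload enable them
again.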
migr_load_thresholds
A list of load thresholds with the same semantics as that of
the load_thresholds parameter (see above) except that
exceeding one of the denoted thresholds initiates migration
of the jobs from the queue. This parameter has no effect in
this release.
priority
The priority parameter specifies the nice(2) value at which
jobs in this queue will be run. The type is number and the
default is zero (which means no nice value is set
explicitly).
max_migr_time
The time reserved for checkpointing jobs to be migrated and
aborted. Checkpointing jobs due to be aborted are first
sent a SIGTSTP. Any process in the affected process group
may catch this signal and react appropriately. After
max_migr_time seconds, a SIGKILL is sent and the processes
are aborted. Note: If you set max_migr_time too high, a user
requesting full interactive usage may have to put up with a
still running job for up to max_migr_time seconds.
max_migr_time is of type time and the default is 0 seconds.
migr_load_thresholds
A list of load thresholds with the same semantics as that of
the load_thresholds parameter (see above) except that
exceeding one of the denoted thresholds initiates migration
of checkpointing jobs from the queue. It is recommended to
set the migration load values high enough above the
load_thresholds to prevent the jobs from forcing migrations
by their own activity.
max_no_migr
The time a checkpointing job is allowed to spend in
non-interruptible sections of the batch script.
Non-interruptible sections are everything outside
qrestart(1) commands. If a job exceeds this time limit it is
killed and the job owner is notified. The default for
max_no_migr is 2 minutes. It is of type time.
min_cpu_interval
The time between two automatic checkpoints in case of
transparently checkpointing jobs. The maximum of the time
requested by the user via qsub(1) and the time defined by
the queue configuration is used as checkpoint interval.
Since checkpoint files may be considerably large and thus
writing them to the file system may become expensive, users
and administrators are advised to choose sufficiently large
time intervals. min_cpu_interval is of type time and the
default is 5 minutes (which usually is suitable for test
purposes only).
processors
On a multiprocessor execution host, a set of processors can
be defined to which the jobs executing in this queue are
bound. The value type of this parameter is a range
description like that of the -pe option of qsub(1) (e.g.
1-4,8,10) denoting the processor numbers for the processor
group to be used. The interpretation of these values relies
on operating system specifics and is thus performed inside
sge_execd(8) running on the queue host. Therefore, the
parameter is parsed by the execution daemon and is only
passed through sge_qmaster(8) as a string.
Currently, support is only provided for SGI multiprocessor
machines running IRIX 6.2 and Digital UNIX multiprocessor
machines. In the case of Digital UNIX only one job per
processor set is allowed to execute at the same time, i.e.
slots (see below) should be set to 1 for this queue.
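The range description can be expanded into individual
processor numbers as in the following Python sketch. It only
illustrates the notation; the actual interpretation is
operating system specific and done by sge_execd(8), and the
function name is hypothetical.

```python
def parse_processors(spec):
    """Expand a range description such as '1-4,8,10' into a list of ints."""
    processors = []
    for part in spec.split(","):
        first, dash, last = part.strip().partition("-")
        if dash:
            # "a-b" denotes every processor number from a to b inclusive.
            processors.extend(range(int(first), int(last) + 1))
        else:
            processors.append(int(first))
    return processors
```

For the example value from the text, parse_processors
("1-4,8,10") returns the processor numbers 1, 2, 3, 4, 8
and 10.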
qtype
The type of queue. Currently one of batch, interactive,
parallel or checkpointing or any combination in a comma
separated list. Alternatively, if the Grid Engine Queuing
System Interface (QSI) option is licensed, the type transfer
can be specified to indicate a queue which passes jobs on to
a foreign queuing system. Deprecated in the current version.
(type string; default: batch).
rerun
Defines a default behavior for jobs which are aborted by
system crashes or manual "violent" (via kill(1)) shutdown of
the complete Grid Engine system (including the
sge_shepherd(8) of the jobs and their process hierarchy) on
the queue host. As soon as sge_execd(8) is restarted and
detects that a job has been aborted for such reasons it can
be restarted if the jobs are restartable. A job may not be
restartable, for example, if it updates databases (first
reads then writes to the same record of a database/file)
because the abortion of the job may have left the database
in an inconsistent state. If the owner of a job wants to
overrule the default behavior for the jobs in the queue the
-r option of qsub(1) can be used.
The type of this parameter is boolean, thus either TRUE or
FALSE can be specified. The default is FALSE, i.e. do not
restart jobs automatically.
slots
The maximum number of concurrently executing jobs allowed in
the queue. Type is number.
tmpdir
The tmpdir parameter specifies the absolute path to the base
of the temporary directory filesystem. When sge_execd(8)
launches a job, it creates a uniquely-named directory in
this filesystem for the purpose of holding scratch files
during job execution. At job completion, this directory and
its contents are removed automatically. The environment
variables TMPDIR and TMP are set to the path of each job's
scratch directory (type string; default: /tmp).
shell
If either posix_compliant or script_from_stdin is specified
as the shell_start_mode parameter in sge_conf(5) the shell
parameter specifies the executable path of the command
interpreter (e.g. sh(1) or csh(1)) to be used to process
the job scripts executed in the queue. The definition of
shell can be overruled by the job owner via the qsub(1) -S
option.
The type of the parameter is string. The default is
/bin/csh.
shell_start_mode
This parameter defines the mechanisms which are used to
actually invoke the job scripts on the execution hosts. The
following values are recognized:
unix_behavior
If a user starts a job shell script under UNIX
interactively by invoking it just with the script name, the
operating system's executable loader uses the information
provided in a comment such as `#!/bin/csh' in the first
line of the script to detect which command interpreter to
start to interpret the script. This mechanism is used by
Grid Engine when starting jobs if unix_behavior is defined
as shell_start_mode.
posix_compliant
POSIX does not consider first script line comments such
as `#!/bin/csh' as being significant. The POSIX standard
for batch queuing systems (P1003.2d) therefore requires
a compliant queuing system to ignore such lines and to
use user specified or configured default command
interpreters instead. Thus, if shell_start_mode is set to
posix_compliant, Grid Engine will use either the command
interpreter indicated by the -S option of the qsub(1)
command or the shell parameter of the queue to be used
(see above).
script_from_stdin
Setting the shell_start_mode parameter either to
posix_compliant or unix_behavior requires you to set
the umask in use for sge_execd(8) such that every user
has read access to the active_jobs directory in the
spool directory of the corresponding execution daemon.
In case you have prolog and epilog scripts configured,
they also need to be readable by any user who may
execute jobs.
If this violates your site's security policies you may
want to set shell_start_mode to script_from_stdin. This
will force Grid Engine to open the job script as well
as the epilogue and prologue scripts for reading into
STDIN as root (if sge_execd(8) was started as root)
before changing to the job owner's user account. The
script is then fed into the STDIN stream of the command
interpreter indicated by the -S option of the qsub(1)
command or the shell parameter of the queue to be used
(see above).
Thus setting shell_start_mode to script_from_stdin also
implies posix_compliant behavior. Note, however, that
feeding scripts into the STDIN stream of a command
interpreter may cause trouble if commands like rsh(1)
are invoked inside a job script as they also process
the STDIN stream of the command interpreter. These
problems can usually be resolved by redirecting the
STDIN channel of those commands to come from /dev/null
(e.g. rsh host date < /dev/null). Note also, that any
command-line options associated with the job are passed
to the executing shell. The shell will only forward
them to the job if they are not recognized as valid
shell options.
The default for shell_start_mode is posix_compliant.
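The following Python sketch models which command interpreter
ends up processing a job script under each of the three
modes. It is an illustration only: under unix_behavior the
selection is really made by the operating system's
executable loader, and the function name, its arguments and
the None result for a script without a `#!' line are
assumptions of the sketch.

```python
def choose_interpreter(shell_start_mode, script_text, queue_shell,
                       qsub_shell=None):
    """Return the interpreter path a job script would be run with.

    queue_shell stands for the queue's shell parameter and qsub_shell
    for the value of the qsub -S option (None if it was not given).
    """
    if shell_start_mode == "unix_behavior":
        lines = script_text.splitlines()
        first_line = lines[0] if lines else ""
        if first_line.startswith("#!"):
            words = first_line[2:].split()
            return words[0] if words else None
        # No interpreter comment: the outcome is left to the operating
        # system, which this sketch does not model.
        return None
    if shell_start_mode in ("posix_compliant", "script_from_stdin"):
        # The first line comment is ignored; -S takes precedence over
        # the queue's shell parameter.
        return qsub_shell or queue_shell
    raise ValueError("unknown shell_start_mode: %r" % shell_start_mode)
```

For a script starting with `#!/bin/csh' and a queue shell of
/bin/sh, the sketch yields /bin/csh under unix_behavior and
/bin/sh under posix_compliant.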
klog
The executable path of the klog utility on the queue host.
It is used for AFS reauthentication. The type of the
parameter is string; the default is /usr/local/bin/klog.
Not supported in this release.
prolog
The executable path of a shell script that is started before
execution of Grid Engine jobs with the same environment
setting as that for the Grid Engine jobs to be started
afterwards. An optional prefix "user@" specifies the user
under which this procedure is to be started. This procedure
is intended as a means for the Grid Engine administrator to
automate the execution of general site specific tasks like
the preparation of temporary file systems with the need for
the same context information as the job. This queue
configuration entry overwrites cluster global or execution
host specific prolog definitions (see sge_conf(5)).
Note: prolog is executed exactly as the job script.
Therefore, all implications described under the parameters
shell_start_mode and login_shells apply.
The default for prolog is the special value NONE, which
prevents execution of a prologue script. The special
variables for constituting a command line are the same as
in prolog definitions of the cluster configuration (see
sge_conf(5)).
epilog
The executable path of a shell script that is started after
execution of Grid Engine jobs with the same environment
setting as that for the Grid Engine jobs that have just
completed. An optional prefix "user@" specifies the user
under which this procedure is to be started. This procedure
is intended as a means for the Grid Engine administrator to
automate the execution of general site specific tasks like
the cleaning up of temporary file systems with the need for
the same context information as the job. This queue
configuration entry overwrites cluster global or execution
host specific epilog definitions (see sge_conf(5)).
Note: epilog is executed exactly as the job script.
Therefore, all implications described under the parameters
shell_start_mode and login_shells apply.
The default for epilog is the special value NONE, which
prevents execution of an epilogue script. The special
variables for constituting a command line are the same as
in prolog definitions of the cluster configuration (see
sge_conf(5)).
starter_method
The executable path given here is intended to be used as a
starter facility which is responsible for starting the job
itself.
Not supported in this release.
suspend_method
resume_method
terminate_method
These parameters can be used for overwriting the default
method used by Grid Engine for suspension, release of a
suspension and for termination of a job. By default, the
signals SIGSTOP, SIGCONT and SIGKILL are delivered to the
job to perform these actions. However, for some applications
this is not appropriate.
If no executable path is given, Grid Engine takes the
specified parameter entries as the signal to be delivered
instead of the default signal. A signal must be either a
positive number or a signal name with "SIG" as prefix and
the signal name as printed by kill -l (e.g. SIGTERM).
If an executable path is given (it must be an absolute path
starting with a "/") then this command together with its
arguments is started by Grid Engine to perform the
appropriate action. The following special variables are
expanded at runtime and can be used (besides any other
strings which have to be interpreted by the procedures) to
constitute a command line:
$host
The name of the host on which the procedure is started.
$job_owner
The user name of the job owner.
$job_id
Grid Engine's unique job identification number.
$job_name
The name of the job.
$queue
The name of the queue.
$job_pid
The pid of the job.
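How a method entry might be classified and expanded is
sketched below in Python. The sketch is not Grid Engine
code: resolve_method, the job dictionary and the script path
/usr/local/bin/stop_job are hypothetical, and the pattern
used to recognize signal names is a simplification.

```python
import re
from string import Template


def resolve_method(value, job):
    """Classify a *_method entry as a signal or an expanded command line.

    job maps the special variable names (host, job_owner, job_id,
    job_name, queue, job_pid) to their runtime values.
    """
    if value.startswith("/"):
        # An absolute path: expand the $variables; unknown names are kept.
        return ("command", Template(value).safe_substitute(job))
    if re.fullmatch(r"[1-9][0-9]*|SIG[A-Z0-9]+", value):
        # A positive number or a SIG-prefixed signal name.
        return ("signal", value)
    raise ValueError("neither a signal nor an absolute path: %r" % value)


# Hypothetical runtime values for one job.
job = {
    "host": "node1",
    "job_owner": "alice",
    "job_id": 42,
    "job_name": "sim",
    "queue": "batch.q",
    "job_pid": 4711,
}
kind, command = resolve_method("/usr/local/bin/stop_job $job_pid $queue", job)
```

In this example kind is "command" and command is the string
"/usr/local/bin/stop_job 4711 batch.q", whereas an entry
such as SIGTERM is reported as a signal.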
reauth_time
The time gap between consecutive AFS reauthentications.
Reauth_time should be less than the ticket expiration time
that is configured for the local AFS installation. The type
of the parameter is time and the default value is 1 hour and
40 minutes, i.e. 100 minutes.
Not supported in this release.
notify
The time waited between delivery of SIGUSR1/SIGUSR2
notification signals and suspend/kill signals if the job
was submitted with the qsub(1) -notify option.
owner_list
The owner_list names the login names (in a comma separated
list) of those users who are authorized to suspend this
queue (Grid Engine operators and managers can suspend queues
by default). It is customary to set this field for queues on
interactive workstations where the computing resources are
shared between interactive sessions and Grid Engine jobs,
allowing the workstation owner to have priority access (type
string; default: NONE).
user_lists
The user_lists parameter contains a comma separated list of
user access lists as described in access_list(5). Each user
contained in at least one of the listed access lists has
access to the queue. If the user_lists parameter is set to
NONE (the default), any user has access who is not
explicitly excluded via the xuser_lists parameter described
below. If a user is contained both in an access list named
in xuser_lists and in one named in user_lists, the user is
denied access to the queue.
xuser_lists
The xuser_lists parameter contains a comma separated list of
user access lists as described in access_list(5). Each user
contained in at least one of the listed access lists is not
allowed to access the queue. If the xuser_lists parameter is
set to NONE (the default), any user has access. If a user is
contained both in an access list named in xuser_lists and in
one named in user_lists, the user is denied access to the
queue.
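The combined effect of user_lists and xuser_lists can be
summarized in a few lines of Python. This is a sketch of the
rules as stated above, with a hypothetical function name; a
value of NONE is represented by an empty list, and
access_lists maps each access list name to its members.

```python
def has_queue_access(user, user_lists, xuser_lists, access_lists):
    """Decide queue access from user_lists and xuser_lists.

    user_lists and xuser_lists are lists of access list names; an
    empty list stands for the value NONE.
    """
    def listed(names):
        return any(user in access_lists.get(name, ()) for name in names)

    if listed(xuser_lists):
        # Exclusion wins, even if the user is also in user_lists.
        return False
    if not user_lists:
        # NONE: everybody who is not excluded has access.
        return True
    return listed(user_lists)


# Hypothetical access lists.
acls = {"staff": {"alice", "bob"}, "guests": {"bob", "carol"}}
```

With these lists, has_queue_access("bob", ["staff"],
["guests"], acls) is False because the exclusion takes
precedence, while alice is granted access.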
projects
The projects parameter contains a comma separated list of
projects that have access to the queue. Any projects not in
this list are denied access to the queue. If set to NONE
(the default), any project has access that is not
specifically excluded via the xprojects parameter described
below. If a project is in both the projects and xprojects
parameters, the project is denied access to the queue. This
parameter is only available in a Grid Engine Enterprise
Edition system.
xprojects
The xprojects parameter contains a comma separated list of
projects that are denied access to the queue. If set to NONE
(the default), no projects are denied access other than
those denied access based on the projects parameter
described above. If a project is in both the projects and
xprojects parameters, the project is denied access to the
queue. This parameter is only available in a Grid Engine
Enterprise Edition system.
subordinate_list
A list of Grid Engine queues, residing on the same host as
the configured queue, to suspend when a specified count of
jobs is running in this queue. The list specification is
the same as that of the load_thresholds parameter above,
e.g. low_pri_q=5,small_q. The numbers denote the job slots
of the queue that have to be filled to trigger the
suspension of the subordinated queue. If no value is
assigned, a suspension is triggered if all slots of the
queue are filled.
On nodes which host more than one queue, you might wish to
accord better service to certain classes of jobs (e.g.,
queues that are dedicated to parallel processing might need
priority over low priority production queues). The default
is NONE.
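The trigger rule for subordinated queues can be sketched in
Python as follows. The function names are hypothetical, and
reading "slots that have to be filled" as "used slots
greater than or equal to the given number" is an assumption
of the sketch.

```python
def parse_subordinate_list(spec):
    """Parse 'queue=count,queue' into {queue: count or None}."""
    result = {}
    for element in spec.split(","):
        name, sep, count = element.strip().partition("=")
        # None marks an entry without a value.
        result[name] = int(count) if sep else None
    return result


def queues_to_suspend(spec, used_slots, total_slots):
    """Return the sorted subordinated queues that would be suspended."""
    return sorted(
        name
        for name, count in parse_subordinate_list(spec).items()
        # Without a value, all slots of the queue must be filled.
        if used_slots >= (total_slots if count is None else count)
    )
```

For the example list low_pri_q=5,small_q and a queue with 8
slots, 5 filled slots suspend only low_pri_q, while 8 filled
slots suspend both queues.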
complex_list
The comma separated list of administrator defined complexes
(see complex(5) for details) to be associated with the
queue. Only complex attributes contained in the enlisted
complexes and those from the "global", "host" and "queue"
complex, which are implicitly attached to each queue, can be
used in the complex_values list below.
The default value for this parameter is NONE, i.e. no
administrator defined complexes are associated with the
queue.
complex_values
complex_values defines quotas for resource attributes
managed via this queue. The allowed complex attributes to
appear in complex_values are defined by complex_list (see
above). The syntax is the same as for load_thresholds (see
above). The quotas are related to the resource consumption
of all jobs in a queue in the case of consumable resources
(see complex(5) for details on consumable resources) or they
are interpreted on a per queue slot (see slots above) basis
in the case of non-consumable resources. Consumable resource
attributes are commonly used to manage free memory, free
disk space or available floating software licenses while
non-consumable attributes usually define distinctive
characteristics like type of hardware installed.
For consumable resource attributes an available resource
amount is determined by subtracting the current resource
consumption of all running jobs in the queue from the quota
in the complex_values list. Jobs can only be dispatched to a
queue if no resource requests exceed any corresponding
resource availability obtained by this scheme. The quota
definition in the complex_values list is automatically
replaced by the current load value reported for this
attribute, if load is monitored for this resource and if the
reported load value is more stringent than the quota. This
effectively avoids oversubscription of resources.
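The availability scheme for consumable resources can be
written down as a small Python sketch. It follows the
description above literally and is not the scheduler's
implementation: the function names are hypothetical, and it
assumes that a "more stringent" load value is a numerically
smaller one, as would be the case for free memory.

```python
def available(quota, consumption, load_value=None):
    """Return the available amount of one consumable resource.

    quota is the complex_values entry, consumption lists the amounts
    used by the running jobs, and load_value is the reported load for
    the attribute (None if load is not monitored for it).
    """
    effective_quota = quota
    if load_value is not None and load_value < quota:
        # Assumption: "more stringent" means numerically smaller.
        effective_quota = load_value
    return effective_quota - sum(consumption)


def can_dispatch(request, quota, consumption, load_value=None):
    """A request fits if it does not exceed the available amount."""
    return request <= available(quota, consumption, load_value)
```

With a quota of 1024 and two running jobs consuming 256
each, 512 remain available, so a request for 512 can be
dispatched and a request for 513 cannot.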
Note: Load values replacing the quota specifications may
have become more stringent because they have been scaled
(see host_conf(5)) and/or load adjusted (see sched_conf(5)).
The -F option of qstat(1) and the load display in the
qmon(1) queue control dialog (activated by clicking on a
queue icon while the "Shift" key is pressed) provide
detailed information on the actual availability of
consumable resources and on the origin of the values taken
into account currently.
Note also: The resource consumption of running jobs (used
for the availability calculation) as well as the resource
requests of the jobs waiting to be dispatched either may be
derived from explicit user requests during job submission
(see the -l option to qsub(1)) or from a "default" value
configured for an attribute by the administrator (see
complex(5)). The -r option to qstat(1) can be used for
retrieving full detail on the actual resource requests of
all jobs in the system.
For non-consumable resources Grid Engine simply compares the
job's attribute requests with the corresponding
specification in complex_values taking the relation operator
of the complex attribute definition into account (see
complex(5)). If the result of the comparison is "true", the
queue is suitable for the job with respect to the particular
attribute. For parallel jobs each queue slot to be occupied
by a parallel task is meant to provide the same resource
attribute value.
Note: Only numeric complex attributes can be defined as
consumable resources and hence non-numeric attributes are
always handled on a per queue slot basis.
The default value for this parameter is NONE, i.e. no
administrator defined resource attribute quotas are
associated with the queue.
calendar
Specifies the calendar to be valid for this queue or
contains NONE (the default). A calendar defines the
availability of a queue depending on time of day, week and
year. Please refer to calendar_conf(5) for details on the
Grid Engine calendar facility.
Note: Jobs can request queues with a certain calendar model
via a "-l c=<cal_name>" option to qsub(1).
initial_state
Defines an initial state for the queue either when adding
the queue to the system for the first time or on start-up of
the sge_execd(8) on the host on which the queue resides.
Possible values are:
default The queue is enabled when adding the queue or is
reset to the previous status when sge_execd(8)
comes up (this corresponds to the behavior in
earlier Grid Engine releases not supporting
initial_state).
enabled The queue is enabled in either case. This is
equivalent to a manual and explicit 'qmod -e'
command (see qmod(1)).
disabled The queue is disabled in either case. This is
equivalent to a manual and explicit 'qmod -d'
command (see qmod(1)).
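The three values can be read as a small decision table,
sketched here in Python. The function name and the event
labels "add" and "execd_start" are hypothetical names chosen
for the two situations described above.

```python
def queue_state(initial_state, event, previous_state=None):
    """Return the queue state resulting from initial_state.

    event is "add" when the queue is added for the first time and
    "execd_start" when sge_execd comes up; previous_state is the state
    the queue had before the execution daemon went down.
    """
    if event not in ("add", "execd_start"):
        raise ValueError("unknown event: %r" % event)
    if initial_state in ("enabled", "disabled"):
        # The configured state applies in either case.
        return initial_state
    if initial_state == "default":
        return "enabled" if event == "add" else previous_state
    raise ValueError("unknown initial_state: %r" % initial_state)
```

For instance, a queue with initial_state default that was
disabled before sge_execd(8) went down is disabled again
after start-up, whereas initial_state enabled would enable
it.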
fshare
This parameter is only available in a Grid Engine Enterprise
Edition system. Grid Engine does not support this parameter.
The functional shares of the queue (i.e. job class). Jobs
executing in this queue may get functional tickets derived
from the relative importance of the queue if the functional
policy is activated.
oticket
This parameter is only available in a Grid Engine Enterprise
Edition system. Grid Engine does not support this parameter.
The override tickets of the queue (i.e. job class). Grid
Engine Enterprise Edition distributes the configured amount
of override tickets among all jobs executing in this queue.
RESOURCE LIMITS
The first two resource limit parameters, s_rt and h_rt, are
implemented by Grid Engine. They limit the "real time" (also
called "elapsed" or "wall clock" time) that has passed
since the start of the job. If h_rt is exceeded by a job
running in the queue, it is aborted via the SIGKILL signal
(see kill(1)). If s_rt is exceeded, the job is first
"warned" via the SIGUSR1 signal (which can be caught by the
job) and finally aborted after the notification time defined
in the queue configuration parameter notify (see above) has
passed.
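The resulting sequence of events for one job can be laid out
as a timeline, as in the Python sketch below. The function
name is hypothetical, times are seconds since job start, and
the handling of a soft limit that is not lower than the hard
limit (no warning, abort at h_rt) is an assumption of the
sketch.

```python
INFINITY = float("inf")


def real_time_events(s_rt=INFINITY, h_rt=INFINITY, notify=0):
    """Return (time, event) pairs for the real time limits of a job."""
    events = []
    abort_at = h_rt
    if s_rt < h_rt:
        # Soft limit reached first: warn, then abort after notify
        # seconds, but never later than the hard limit.
        events.append((s_rt, "SIGUSR1"))
        abort_at = min(h_rt, s_rt + notify)
    if abort_at != INFINITY:
        events.append((abort_at, "abort"))
    return events
```

With s_rt of one hour, h_rt of two hours and notify of 60
seconds, the sketch yields a SIGUSR1 at 3600 seconds and the
abort at 3660 seconds.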
The resource limit parameters s_cpu and h_cpu are
implemented by Grid Engine as a job limit. They impose a limit on
the amount of combined CPU time consumed by all the
processes in the job. If h_cpu is exceeded by a job running
in the queue, it is aborted via a SIGKILL signal (see
kill(1)). If s_cpu is exceeded, the job is sent a SIGXCPU
signal which can be caught by the job. If you wish to allow
a job to be "warned" so it can exit gracefully before it is
killed then you should set the s_cpu limit to a lower value
than h_cpu. For parallel processes, the limit is applied
per slot which means that the limit is multiplied by the
number of slots being used by the job before being applied.
The resource limit parameters s_vmem and h_vmem are
implemented by Grid Engine as a job limit. They impose a limit on
the amount of combined virtual memory consumed by all the
processes in the job. If h_vmem is exceeded by a job running
in the queue, it is aborted via a SIGKILL signal (see
kill(1)). If s_vmem is exceeded, the job is sent a SIGXCPU
signal which can be caught by the job. If you wish to allow
a job to be "warned" so it can exit gracefully before it is
killed then you should set the s_vmem limit to a lower value
than h_vmem. For parallel processes, the limit is applied
per slot which means that the limit is multiplied by the
number of slots being used by the job before being applied.
The remaining parameters in the queue configuration template
specify per job soft and hard resource limits as implemented
by the setrlimit(2) system call. See this manual page on
your system for more information. By default, each limit
field is set to infinity (which means RLIM_INFINITY as
described in the setrlimit(2) manual page). The value type
for the CPU-time limits s_cpu and h_cpu is time. The value
type for the other limits is memory. Note: Not all systems
support setrlimit(2).
Note also: s_vmem and h_vmem (virtual memory) are only
available on systems supporting RLIMIT_VMEM (see
setrlimit(2) on your operating system).
The UNICOS operating system supplied by SGI/Cray does not
support the setrlimit(2) system call, using their own
resource limit-setting system call instead. For UNICOS
systems only, the following meanings apply:
s_cpu The per-process CPU time limit in seconds.
s_core The per-process maximum core file size in bytes.
s_data The per-process maximum memory limit in bytes.
s_vmem The same as s_data (if both are set the minimum is
used).
h_cpu The per-job CPU time limit in seconds.
h_data The per-job maximum memory limit in bytes.
h_vmem The same as h_data (if both are set the minimum is
used).
h_fsize The total number of disk blocks that this job can
create.
SEE ALSO
sge_intro(1), csh(1), qconf(1), qmon(1), qrestart(1),
qstat(1), qsub(1), sh(1), nice(2), setrlimit(2),
access_list(5), calendar_conf(5), sge_conf(5), complex(5),
host_conf(5), sched_conf(5), sge_execd(8), sge_qmaster(8),
sge_shepherd(8).
COPYRIGHT
See sge_intro(1) for a full statement of rights and
permissions.