BitMap queue CPU Scheduler
--------------------------

CONTENT
========

Background
Design
 Overview
 Task policy
 Priority management
 BitMap Queue
 CPU Assignment and Migration


Background
==========

BitMap Queue CPU scheduler, referred to as BMQ from here on, is an evolution
of the previous Priority and Deadline based Skiplist multiple queue scheduler
(PDS), and is inspired by the Zircon scheduler. Its goal is to keep the
scheduler code simple, while staying efficient and scalable for interactive
tasks such as desktop use, movie playback and gaming.

Design
======

Overview
--------

BMQ uses a per-CPU run queue design: each (logical) CPU has its own run
queue, and each CPU is responsible for scheduling the tasks placed into its
run queue.

The run queue is a set of priority queues. In terms of data structure, these
queues are FIFO queues for non-rt tasks and priority queues for rt tasks; see
BitMap Queue below for details. BMQ is optimized for non-rt tasks because most
applications are non-rt tasks. Whether a queue is FIFO or priority, each queue
is an ordered list of runnable tasks awaiting execution, and the underlying
data structures are the same. When it is time for a new task to run, the
scheduler simply looks for the lowest numbered queue that contains a task and
runs the first task from the head of that queue. The per-CPU idle task is also
kept in the run queue, so the scheduler can always find a task to run from its
run queue.

Each task is assigned the same timeslice (default 4ms) when it is picked to
start running. A task is reinserted at the end of the appropriate priority
queue when it uses up its whole timeslice. When the scheduler selects a new
task from the priority queue, it sets the CPU's preemption timer for the
remainder of the previous timeslice. When that timer fires, the scheduler
stops execution of that task, selects another task and starts over again.

If a task blocks waiting for a shared resource, it is taken out of its
priority queue and placed in a wait queue for the shared resource. When it is
unblocked, it is reinserted into the appropriate priority queue of an eligible
CPU.

Task policy
-----------

BMQ supports the DEADLINE, FIFO, RR, NORMAL, BATCH and IDLE task policies,
like the mainline CFS scheduler. However, BMQ is heavily optimized for non-rt
tasks, that is, tasks with the NORMAL/BATCH/IDLE policies. Below are the
implementation details of each policy.

DEADLINE
It is squashed as a priority 0 FIFO task.

FIFO/RR
All RT tasks share one single priority queue in the BMQ run queue design. The
complexity of the insert operation is O(n). BMQ is not designed for systems
that run mostly rt policy tasks.

NORMAL/BATCH/IDLE
BATCH and IDLE tasks are treated as the same policy. They compete for CPU
with NORMAL policy tasks, but they just don't boost. To control the priority
of NORMAL/BATCH/IDLE tasks, simply use nice levels.

ISO
The ISO policy is not supported in BMQ. Please use a nice level -20 NORMAL
policy task instead.

Priority management
-------------------

RT tasks have priorities from 0 to 99. For non-rt tasks, there are three
different factors used to determine the effective priority of a task. The
effective priority is what determines which queue a task will be in.

The first factor is simply the task's static priority, which is assigned from
the task's nice level: within [-20, 19] from userland's point of view and
[0, 39] internally.

The second factor is the priority boost. This is a value bounded within
[-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ] and used to offset the base priority.
It is modified in the following cases:

*When a thread has used up its entire timeslice, always deboost it by
increasing its boost by one.
*When a thread gives up cpu control (voluntarily or involuntarily) to
reschedule, and its switch-in time (the time it has run since it was last
switched in) is below the threshold based on its priority boost, boost it by
decreasing its boost by one, but it is capped at 0 (it won't go negative).

The intent of this system is to ensure that interactive threads are serviced
quickly. These are usually the threads that interact directly with the user
and cause user-perceivable latency. These threads usually do little work and
spend most of their time blocked, awaiting another user event. So they get
the priority boost from unblocking, while background threads that do most of
the processing receive the priority penalty for using their entire timeslice.