                         BitMap queue CPU Scheduler
                         --------------------------

CONTENT
=======

 Background
 Design
  Overview
  Task policy
  Priority management
  BitMap Queue
  CPU Assignment and Migration


Background
==========

BitMap Queue CPU scheduler, referred to as BMQ from here on, is an evolution
of the previous Priority and Deadline based Skiplist multiple queue scheduler
(PDS), and is inspired by the Zircon scheduler. Its goal is to keep the
scheduler code simple, while staying efficient and scalable for interactive
tasks such as desktop use, movie playback and gaming.

Design
======

Overview
--------

BMQ uses a per-CPU run queue design: each (logical) CPU has its own run queue
and is responsible for scheduling the tasks that are placed into it.

The run queue is a set of priority queues. In terms of data structure, these
queues are FIFO queues for non-rt tasks and a priority queue for rt tasks; see
BitMap Queue below for details. BMQ is optimized for non-rt tasks, given that
most applications are non-rt tasks. Whether a queue is FIFO or priority
ordered, each queue is an ordered list of runnable tasks awaiting execution,
and the underlying data structures are the same. When it is time for a new
task to run, the scheduler simply looks up the lowest numbered queue that
contains a task and runs the first task from the head of that queue. The
per-CPU idle task is also kept in the run queue, so the scheduler can always
find a task to run from its own run queue.

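To make this lookup concrete, below is a minimal userspace C sketch of such a
bitmap-indexed run queue: one FIFO list per priority level plus a bitmap
whose set bits mark the non-empty levels. NR_LEVELS, the structure layout and
the helper names are illustrative assumptions, not the actual BMQ code.

/* Minimal sketch of a bitmap-indexed run queue (illustrative only). */
#include <stdint.h>
#include <stdio.h>

#define NR_LEVELS 64                    /* assumed number of priority levels */

struct task {
        int id;
        struct task *next;              /* FIFO chaining within a level */
};

struct runqueue {
        uint64_t bitmap;                /* bit n set => level n is non-empty */
        struct task *head[NR_LEVELS];
        struct task *tail[NR_LEVELS];
};

static void enqueue(struct runqueue *rq, struct task *t, int level)
{
        t->next = NULL;
        if (rq->head[level])
                rq->tail[level]->next = t;
        else
                rq->head[level] = t;
        rq->tail[level] = t;
        rq->bitmap |= 1ULL << level;    /* mark the level as non-empty */
}

/* Pick the head task of the lowest numbered non-empty level. */
static struct task *pick_next(struct runqueue *rq)
{
        int level;
        struct task *t;

        if (!rq->bitmap)
                return NULL;            /* never happens once idle is queued */
        level = __builtin_ctzll(rq->bitmap);
        t = rq->head[level];
        rq->head[level] = t->next;
        if (!rq->head[level])
                rq->bitmap &= ~(1ULL << level); /* level drained */
        return t;
}

int main(void)
{
        struct runqueue rq = { 0 };
        struct task a = { .id = 1 }, b = { .id = 2 }, idle = { .id = 0 };

        enqueue(&rq, &idle, NR_LEVELS - 1);     /* idle in the last level */
        enqueue(&rq, &a, 10);
        enqueue(&rq, &b, 10);
        printf("%d\n", pick_next(&rq)->id);     /* 1: FIFO order in level 10 */
        printf("%d\n", pick_next(&rq)->id);     /* 2 */
        printf("%d\n", pick_next(&rq)->id);     /* 0: only the idle task left */
        return 0;
}

Because finding the lowest set bit is a single find-first-set operation,
picking the next task stays O(1) no matter how many tasks are queued.
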
Each task is assigned the same timeslice (default 4ms) when it is picked to
start running. A task is reinserted at the end of the appropriate priority
queue when it uses up its whole timeslice. When the scheduler selects a new
task from the priority queue, it sets the CPU's preemption timer for the
remainder of the previous timeslice. When that timer fires, the scheduler
stops execution of that task, selects another task and starts over again.

If a task blocks waiting for a shared resource, it is taken out of its
priority queue and placed in a wait queue for the shared resource. When it is
unblocked, it will be reinserted into the appropriate priority queue of an
eligible CPU.

Task policy
-----------

BMQ supports the DEADLINE, FIFO, RR, NORMAL, BATCH and IDLE task policies,
like the mainline CFS scheduler. However, BMQ is heavily optimized for non-rt
tasks, that is, NORMAL/BATCH/IDLE policy tasks. Below are the implementation
details of each policy.

DEADLINE
        It is squashed into a priority 0 FIFO task.

FIFO/RR
        All rt tasks share one single priority queue in the BMQ run queue
design. The complexity of the insert operation is O(n); see the sketch at the
end of this section. BMQ is not designed for systems that mainly run rt
policy tasks.

NORMAL/BATCH/IDLE
        BATCH and IDLE tasks are treated as the same policy. They compete for
CPU with NORMAL policy tasks, but they just don't boost. To control the
priority of NORMAL/BATCH/IDLE tasks, simply use nice levels.

ISO
        ISO policy is not supported in BMQ. Please use a nice level -20
NORMAL policy task instead.

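For illustration only, the sketch below shows one way to make that
substitution from userspace with the standard Linux calls
sched_setscheduler() and setpriority(); it is not BMQ-specific, and raising
a task to nice -20 requires CAP_SYS_NICE or root privileges.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        struct sched_param sp = { .sched_priority = 0 }; /* 0 for SCHED_OTHER */

        /* SCHED_OTHER is the NORMAL policy; pid 0 means the calling process. */
        if (sched_setscheduler(0, SCHED_OTHER, &sp) != 0)
                perror("sched_setscheduler");

        /* Strongest NORMAL priority: nice -20 (needs CAP_SYS_NICE). */
        if (setpriority(PRIO_PROCESS, 0, -20) != 0)
                perror("setpriority");

        printf("running as NORMAL, nice %d\n", getpriority(PRIO_PROCESS, 0));
        return 0;
}
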
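As noted under FIFO/RR above, the O(n) insert cost comes from keeping every
rt task in one shared, ordered queue. Below is a hedged sketch of such an
ordered insert into a single linked list; the structure, names and the
assumption that a lower number runs first are illustrative only, not the BMQ
data structure.

#include <stdio.h>
#include <stddef.h>

struct rt_task {
        int rt_prio;                    /* 0..99; assume lower runs first */
        struct rt_task *next;
};

/* Walk past every equal-or-better entry: O(n) in the queue length. */
static void rt_queue_insert(struct rt_task **head, struct rt_task *t)
{
        while (*head && (*head)->rt_prio <= t->rt_prio)
                head = &(*head)->next;
        t->next = *head;
        *head = t;
}

int main(void)
{
        struct rt_task a = { .rt_prio = 10 };
        struct rt_task b = { .rt_prio = 5 };
        struct rt_task c = { .rt_prio = 20 };
        struct rt_task *q = NULL;

        rt_queue_insert(&q, &a);
        rt_queue_insert(&q, &b);
        rt_queue_insert(&q, &c);
        for (struct rt_task *t = q; t; t = t->next)
                printf("prio %d\n", t->rt_prio); /* prints 5, 10, 20 */
        return 0;
}
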
Priority management
-------------------

RT tasks have priority from 0-99. For non-rt tasks, there are three different
factors used to determine the effective priority of a task; the effective
priority is what determines which queue the task will be placed in.

The first factor is simply the task's static priority, which is assigned from
the task's nice level: [-20, 19] from userland's point of view and [0, 39]
internally.

The second factor is the priority boost. This is a value bounded within
[-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ] that is used to offset the base
priority; it is modified in the following cases:

*When a thread has used up its entire timeslice, always deboost its boost by
increasing it by one.
*When a thread gives up CPU control (voluntarily or involuntarily) to
reschedule, and its switch-in time (the time since it was last switched in
and started running) is below the threshold based on its priority boost, its
boost is increased by decreasing the value by one, but it is capped at 0 (it
won't go negative).

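To illustrate how these two factors could interact, here is a hedged C
sketch; the value of MAX_PRIORITY_ADJ, the field names and the way the two
factors are summed are assumptions for illustration, not the BMQ
implementation.

/* Illustrative only: field names, MAX_PRIORITY_ADJ and the combination
 * of the two factors are assumptions, not the kernel implementation. */
#include <stdio.h>

#define MAX_PRIORITY_ADJ 4              /* assumed bound on the boost offset */

struct task {
        int static_prio;                /* nice mapped to [0, 39] internally */
        int boost_prio;                 /* bounded priority boost offset */
};

/* nice level [-20, 19] from userland maps to [0, 39] internally. */
static int nice_to_static_prio(int nice)
{
        return nice + 20;
}

/* Lower effective priority number = lower numbered (earlier served) queue. */
static int effective_prio(const struct task *t)
{
        return t->static_prio + t->boost_prio;
}

/* Used up the whole timeslice: deboost by increasing the boost value. */
static void on_timeslice_expired(struct task *t)
{
        if (t->boost_prio < MAX_PRIORITY_ADJ)
                t->boost_prio++;
}

/* Rescheduled quickly after being switched in: boost by decreasing the
 * value by one, capped at 0 so it does not go negative. */
static void on_quick_reschedule(struct task *t)
{
        if (t->boost_prio > 0)
                t->boost_prio--;
}

int main(void)
{
        struct task t = { .static_prio = nice_to_static_prio(0) };

        on_timeslice_expired(&t);               /* behaves like a CPU hog */
        printf("after expiry: %d\n", effective_prio(&t));         /* 21 */
        on_quick_reschedule(&t);                /* behaves interactively */
        printf("after quick resched: %d\n", effective_prio(&t));  /* 20 */
        return 0;
}
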
The intent of this system is to ensure that interactive threads are serviced
quickly. These are usually the threads that interact directly with the user
and cause user-perceivable latency. These threads usually do little work and
spend most of their time blocked awaiting another user event. So they get the
priority boost from unblocking, while background threads that do most of the
processing receive the priority penalty for using their entire timeslice.