                         BitMap queue CPU Scheduler
                         --------------------------

CONTENT
========

 Background
 Design
   Overview
   Task policy
   Priority management
   BitMap Queue
   CPU Assignment and Migration

Background
==========

BitMap Queue CPU scheduler, referred to as BMQ from here on, is an evolution
of the previous Priority and Deadline based Skiplist multiple queue scheduler
(PDS), and is inspired by the Zircon scheduler. Its goal is to keep the
scheduler code simple, while being efficient and scalable for interactive
tasks such as desktop use, movie playback, gaming, etc.

Design
======

Overview
--------

BMQ uses a per-CPU run queue design: each (logical) CPU has its own run
queue, and each CPU is responsible for scheduling the tasks that are put into
its run queue.

The run queue is a set of priority queues. Note that, as data structures,
these queues are FIFO queues for non-rt tasks and priority queues for rt
tasks. See BitMap Queue below for details. BMQ is optimized for non-rt tasks,
reflecting the fact that most applications are non-rt tasks. Whether a queue
is FIFO or priority, each queue holds an ordered list of runnable tasks
awaiting execution, and the data structures are the same. When it is time for
a new task to run, the scheduler simply looks for the lowest numbered queue
that contains a task and runs the first task from the head of that queue. The
per-CPU idle task is also kept in the run queue, so the scheduler can always
find a task to run from its run queue.

Each task is assigned the same timeslice (default 4ms) when it is picked to
start running. A task is reinserted at the end of the appropriate priority
queue when it uses up its whole timeslice. When the scheduler selects a new
task from the priority queue, it sets the CPU's preemption timer for the
remainder of the previous timeslice. When that timer fires, the scheduler
stops execution of that task, selects another task and starts over again.

If a task blocks waiting for a shared resource, it is taken out of its
priority queue and placed in a wait queue for the shared resource. When it is
unblocked, it is reinserted into the appropriate priority queue of an
eligible CPU.
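As a rough illustration of the behaviour described above, the following
user-space sketch models a run queue as an array of FIFO lists plus a bitmap
of non-empty levels. It is not the kernel's actual data structure (see the
BitMap Queue section for that); the names bmq_rq, NR_LEVELS, bmq_enqueue and
bmq_pick_next, and the choice of 64 levels, are made up for this sketch.

#include <stdint.h>
#include <stddef.h>

#define NR_LEVELS 64	/* assumed number of priority levels */

struct task {
	struct task *next;	/* next task in the same FIFO level */
	int level;		/* index of the queue this task sits in */
};

struct bmq_rq {
	uint64_t bitmap;		/* bit i set => queue i is non-empty */
	struct task *head[NR_LEVELS];
	struct task *tail[NR_LEVELS];
};

/* Put a task at the tail of its level: used for wakeups and for
 * reinserting a task that has used up its whole timeslice. */
static void bmq_enqueue(struct bmq_rq *rq, struct task *p)
{
	p->next = NULL;
	if (rq->head[p->level])
		rq->tail[p->level]->next = p;
	else
		rq->head[p->level] = p;
	rq->tail[p->level] = p;
	rq->bitmap |= 1ULL << p->level;
}

/* Pick the first task of the lowest numbered non-empty queue.  The
 * per-CPU idle task always sits in the last level, so the bitmap is
 * never empty and __builtin_ctzll() is always well defined here. */
static struct task *bmq_pick_next(struct bmq_rq *rq)
{
	int level = __builtin_ctzll(rq->bitmap);
	struct task *p = rq->head[level];

	rq->head[level] = p->next;
	if (!rq->head[level]) {
		rq->tail[level] = NULL;
		rq->bitmap &= ~(1ULL << level);
	}
	return p;
}

With the bitmap, "the lowest numbered queue that contains a task" becomes a
single find-first-set operation, and keeping the per-CPU idle task in the
last level guarantees that the lookup always finds something to run.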
Task policy
-----------

BMQ supports DEADLINE, FIFO, RR, NORMAL, BATCH and IDLE task policies, like
the mainline CFS scheduler. However, BMQ is heavily optimized for non-rt
tasks, that is, NORMAL/BATCH/IDLE policy tasks. Below are the implementation
details of each policy.

DEADLINE
    It is squashed as a priority 0 FIFO task.

FIFO/RR
    All RT tasks share one single priority queue in the BMQ run queue design.
    The complexity of the insert operation is O(n). BMQ is not designed for
    systems that run mostly rt policy tasks.

NORMAL/BATCH/IDLE
    BATCH and IDLE tasks are treated as the same policy. They compete for CPU
    with NORMAL policy tasks, but they just don't boost. To control the
    priority of NORMAL/BATCH/IDLE tasks, simply use nice levels.

ISO
    ISO policy is not supported in BMQ. Please use a nice level -20 NORMAL
    policy task instead.

Priority management
-------------------

RT tasks have priorities from 0 to 99. For non-rt tasks, there are three
different factors used to determine the effective priority of a task; the
effective priority is what is used to determine which queue the task will be
in.

The first factor is simply the task's static priority, which is assigned from
the task's nice level: within [-20, 19] from userland's point of view and
[0, 39] internally.

The second factor is the priority boost. This is a value bounded between
[-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ] used to offset the base priority. It is
modified in the following cases:

*When a thread has used up its entire timeslice, always deboost it by
 increasing its boost value by one.

*When a thread gives up cpu control (voluntarily or involuntarily) to
 reschedule, and the time it has run since it was last switched in is below a
 threshold based on its priority boost, boost it by decreasing its boost
 value by one, but never below the -MAX_PRIORITY_ADJ bound given above.

Since the scheduler always runs the first task of the lowest numbered
non-empty queue, a lower boost value means a higher effective priority.

The intent of this system is to ensure that interactive threads are serviced
quickly. These are usually the threads that interact directly with the user
and cause user-perceivable latency. These threads usually do little work and
spend most of their time blocked awaiting another user event. So they get the
priority boost from unblocking, while background threads that do most of the
processing receive the priority penalty for using their entire timeslice.
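The boost bookkeeping above can be sketched as follows. This is an
illustration only, not the kernel's implementation: the value chosen for
MAX_PRIORITY_ADJ, the structure and function names, and the reading of "they
just don't boost" (from Task policy) as a zero lower bound for BATCH/IDLE
tasks are assumptions, and the run-time threshold test that decides whether a
rescheduled thread qualifies for a boost is left to the caller.

#define MAX_PRIORITY_ADJ 4	/* example value for the boost bound */

enum bmq_policy { POLICY_NORMAL, POLICY_BATCH, POLICY_IDLE };	/* non-rt */

struct bmq_task {
	enum bmq_policy policy;
	int static_prio;	/* 0..39 internally, from the nice level */
	int boost_prio;		/* in [-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ] */
};

/* Thread used up its entire timeslice: deboost by increasing the offset. */
static void deboost(struct bmq_task *p)
{
	if (p->boost_prio < MAX_PRIORITY_ADJ)
		p->boost_prio++;
}

/* Thread rescheduled after running for less than the boost-dependent
 * threshold (checked by the caller): boost by decreasing the offset.
 * BATCH/IDLE tasks are assumed to stop at 0, i.e. "they just don't boost". */
static void boost(struct bmq_task *p)
{
	int lower = (p->policy == POLICY_NORMAL) ? -MAX_PRIORITY_ADJ : 0;

	if (p->boost_prio > lower)
		p->boost_prio--;
}

/* The effective priority (static priority offset by the boost) selects the
 * queue the task is put into; lower numbered queues run first.  Mapping
 * this value onto actual queue indices is omitted here. */
static int effective_prio(const struct bmq_task *p)
{
	return p->static_prio + p->boost_prio;
}

Under this scheme a thread that repeatedly blocks after short bursts of work
drifts toward the boosted (negative) end of the range, while a CPU-bound
thread that keeps exhausting its timeslice drifts toward the deboosted end,
which matches the interactive-versus-background behaviour described in the
previous paragraph.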