Linux Kernel CPUFreq framework

Most processors have a mechanism to save power by scaling their frequency and voltage. The question is, why these two?

Existing CMOS technology consumes power in three major ways:
1. Leakage current - This comes from the underlying circuitry itself and flows even when nothing is switching.
                                P(lc) = I(load) * Vc = Vc^2 / RL
2. Recharging current - This is mainly the current induced by charging and discharging the parasitic capacitance.
                                P(rc) = Vc^2 / Rp = Vc^2 * Cp * F
3. Shoot-through current - When one transistor in the circuit is opening and its opposite has only just started to close, some current briefly flows through both.
                                P(sc) = Vc^2 * F / Rs

So the total power is P = P(lc) + P(rc) + P(sc).

Therefore, the power is proportional to the square of the voltage in every term, and is either independent of the frequency (the leakage term) or a linear function of it (the other two terms).
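
To make the proportionality concrete, here is a small sketch that plugs numbers into the three formulas above. The component values (rl, cp, rs) are made-up placeholders chosen only for illustration, not measurements of any real processor, and the function names are hypothetical.

```python
def cmos_power(vc, freq, rl, cp, rs):
    """Return the three power terms (leakage, recharging, shoot-through).

    vc is the voltage and freq the frequency; rl, cp and rs are the
    constants RL, Cp and Rs from the formulas in the text.
    """
    p_lc = vc ** 2 / rl          # P(lc) = Vc^2 / RL
    p_rc = vc ** 2 * cp * freq   # P(rc) = Vc^2 * Cp * F
    p_sc = vc ** 2 * freq / rs   # P(sc) = Vc^2 * F / Rs
    return p_lc, p_rc, p_sc


def total_power(vc, freq, rl=100.0, cp=1e-9, rs=1e10):
    """Sum of the three terms. The default constants are illustrative assumptions."""
    return sum(cmos_power(vc, freq, rl, cp, rs))


if __name__ == "__main__":
    full = total_power(1.0, 1.6e9)
    half_voltage = total_power(0.5, 1.6e9)
    half_frequency = total_power(1.0, 0.8e9)
    print("full speed:     ", full)
    print("half voltage:   ", half_voltage)    # about a quarter of 'full'
    print("half frequency: ", half_frequency)  # leakage term is unchanged
```

Halving the voltage cuts every term to a quarter, while halving the frequency only halves the two switching terms. That is why voltage is the more rewarding knob, when the hardware allows lowering it.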

So that answers our basic question of why only frequency and voltage are scaled.

How does the Linux kernel support this? The answer lies in the CPUFreq framework provided by the kernel.
It is the Linux subsystem for setting the CPU frequency. The details can be read in "Documentation/cpu-freq/".
Let's see how to use this framework from user space.

The cpufreq framework is broadly divided into three parts -
1. cpufreq module 
2. cpu-specific drivers
3. in-kernel governors

The Linux kernel provides three governors that can be used with any kind of CPU. They are
1. performance governor
2. powersave governor
3. userspace governor

Apart from these, there are two newer governors, namely
1. ondemand governor
2. conservative governor
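
As a rough mental model of how these five governors differ, the sketch below picks a next frequency from a CPU load figure. It is a deliberately simplified toy, not the kernel's actual algorithm: the function name, the thresholds (80 and 20) and the one-step-at-a-time behaviour of the conservative case are assumptions made for illustration.

```python
def pick_frequency(governor, freqs, current, load, user_freq=None,
                   up_threshold=80, down_threshold=20):
    """Toy model of a governor decision.

    freqs   -- available frequencies in kHz (any order)
    current -- frequency the CPU runs at now (must be in freqs)
    load    -- CPU load in percent, 0..100
    Returns the frequency (kHz) this toy governor would choose next.
    """
    table = sorted(freqs)
    if governor == "performance":
        return table[-1]                       # always the highest
    if governor == "powersave":
        return table[0]                        # always the lowest
    if governor == "userspace":
        # Whatever user space asked for, if it is a valid frequency.
        return user_freq if user_freq in table else current
    if governor == "ondemand":
        if load > up_threshold:
            return table[-1]                   # jump straight to the top
        target = table[-1] * load / 100        # scale with the load
        return next(f for f in table if f >= target)
    if governor == "conservative":
        idx = table.index(current)
        if load > up_threshold:
            idx = min(idx + 1, len(table) - 1)  # one step up
        elif load < down_threshold:
            idx = max(idx - 1, 0)               # one step down
        return table[idx]
    raise ValueError(f"unknown governor: {governor}")


if __name__ == "__main__":
    freqs = [1600000, 1280000, 800000]
    for gov in ("performance", "powersave", "ondemand", "conservative"):
        print(gov, pick_frequency(gov, freqs, current=800000, load=90))
```

The point to take away is the contrast: under high load the toy ondemand jumps to the maximum immediately, while the toy conservative climbs one step per decision.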

At the lowest level of this architecture lies the ACPI processor driver. It is used by CPU-specific drivers such as powernow-k8, which provide various /sys/ and /proc/ interfaces. Above them sit the in-kernel governors. User-level governors use the userspace governor to modify CPU frequencies.
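
The last point can be sketched from user space as follows. This is a minimal example, assuming the userspace governor is already selected and that the driver exposes scaling_available_frequencies (not every driver does). The function name set_userspace_speed is hypothetical, and on a real system the write needs root.

```python
from pathlib import Path


def set_userspace_speed(cpufreq_dir, khz):
    """Request a fixed frequency (in kHz) through the userspace governor.

    cpufreq_dir is a directory such as /sys/devices/system/cpu/cpu0/cpufreq.
    """
    base = Path(cpufreq_dir)

    # scaling_setspeed is only meaningful under the userspace governor.
    governor = (base / "scaling_governor").read_text().strip()
    if governor != "userspace":
        raise RuntimeError(f"governor is '{governor}', expected 'userspace'")

    # Refuse frequencies the driver does not advertise.
    available = [int(f) for f in
                 (base / "scaling_available_frequencies").read_text().split()]
    if khz not in available:
        raise ValueError(f"{khz} kHz not in {available}")

    (base / "scaling_setspeed").write_text(f"{khz}\n")
```

For example, set_userspace_speed("/sys/devices/system/cpu/cpu0/cpufreq", 800000) would ask for 800 MHz on cpu0.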

To understand how these sysfs attributes are used, go to /sys/devices/system/cpu/. It lists the available CPUs and the cpufreq governor details.

arya@maya:/$ ls /sys/devices/system/cpu/

cpu0  cpu1  cpufreq  cpuidle  kernel_max  microcode  modalias  offline  online  possible  power  present  probe  release  uevent
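
Files such as online, offline, possible and present in this directory hold CPU lists written as comma-separated numbers and ranges, for example "0-1". Here is a small sketch for turning such a list into CPU numbers; the helper name parse_cpu_list is hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a CPU list such as '0-1' or '0,2-3' into [0, 1] or [0, 2, 3]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue                      # an empty file means "no CPUs"
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


if __name__ == "__main__":
    online = Path("/sys/devices/system/cpu/online")
    if online.exists():
        print("online CPUs:", parse_cpu_list(online.read_text()))
```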

To check the governor,

arya@maya:/sys/devices/system/cpu$ ls cpufreq/
boost  ondemand

OK. Now let's go inside one of the CPUs.

root@maya:/sys/devices/system/cpu/cpu0# ll cpufreq/
total 0
drwxr-xr-x 3 root root    0 Apr  1 22:53 ./
drwxr-xr-x 8 root root    0 Apr  2  2013 ../
-r--r--r-- 1 root root 4096 Apr  1 23:02 affected_cpus
-r--r--r-- 1 root root 4096 Apr  1 23:02 bios_limit
-rw-r--r-- 1 root root 4096 Apr  1 23:02 cpb
-r-------- 1 root root 4096 Apr  1 23:02 cpuinfo_cur_freq
-r--r--r-- 1 root root 4096 Apr  1 23:02 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Apr  1 23:02 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 Apr  1 23:02 cpuinfo_transition_latency
-r--r--r-- 1 root root 4096 Apr  1 23:02 related_cpus
-r--r--r-- 1 root root 4096 Apr  1 22:53 scaling_available_frequencies
-r--r--r-- 1 root root 4096 Apr  1 22:53 scaling_available_governors
-r--r--r-- 1 root root 4096 Apr  1 22:53 scaling_cur_freq
-r--r--r-- 1 root root 4096 Apr  1 23:02 scaling_driver
-rw-r--r-- 1 root root 4096 Apr  1 22:53 scaling_governor
-rw-r--r-- 1 root root 4096 Apr  1 23:02 scaling_max_freq
-rw-r--r-- 1 root root 4096 Apr  1 23:02 scaling_min_freq
-rw-r--r-- 1 root root 4096 Apr  1 22:53 scaling_setspeed
drwxr-xr-x 2 root root    0 Apr  1 23:02 stats/

1. current CPU frequency as read from the hardware (all frequencies are in kHz)

root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/cpuinfo_cur_freq 
2. max supported cpu frequency
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/cpuinfo_max_freq
3. min supported cpu frequency
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/cpuinfo_min_freq
4. list all available CPU frequencies
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/scaling_available_frequencies 
1600000 1280000 800000 
5. list all available governors
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/scaling_available_governors 
conservative ondemand userspace powersave performance 
6. current CPU frequency as cached by the cpufreq driver
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/scaling_cur_freq 
7. name of the CPU-specific driver in use
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/scaling_driver 
8. user-controlled lower frequency limit
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/scaling_min_freq 

9. user-controlled upper frequency limit
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/scaling_max_freq 
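
The same attributes can be collected in one go from a script. Below is a minimal sketch, assuming the attribute names from the directory listing above; the helper names are hypothetical. Files that are missing or not readable by the current user (cpuinfo_cur_freq is root-only in the listing) come back as None.

```python
from pathlib import Path

# Attribute names taken from the cpufreq/ directory listing.
ATTRIBUTES = (
    "cpuinfo_cur_freq", "cpuinfo_min_freq", "cpuinfo_max_freq",
    "scaling_cur_freq", "scaling_min_freq", "scaling_max_freq",
    "scaling_driver", "scaling_governor",
    "scaling_available_frequencies", "scaling_available_governors",
)


def read_cpufreq(cpufreq_dir="/sys/devices/system/cpu/cpu0/cpufreq"):
    """Return a dict mapping attribute name to its text value (or None)."""
    base = Path(cpufreq_dir)
    values = {}
    for name in ATTRIBUTES:
        try:
            values[name] = (base / name).read_text().strip()
        except OSError:
            values[name] = None   # missing file or insufficient permission
    return values


def khz_to_mhz(value):
    """The frequency attributes are in kHz; convert one to MHz."""
    return int(value) / 1000


if __name__ == "__main__":
    for name, value in read_cpufreq().items():
        print(f"{name:32} {value}")
```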

The files under the stats/ directory provide statistics about frequency changes on a particular CPU.

1. details of cpu frequency transitions

root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/stats/trans_table 
   From  :    To
         :   1600000   1280000    800000
  1600000:         0      1582      2938
  1280000:       384         0      1198
   800000:      4135         0         0
2. total time spent at each frequency (in units of 10 ms)
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/stats/time_in_state 
1600000  46780
1280000  4558
800000   421491
3. total number of frequency transitions on this CPU
root@maya:/sys/devices/system/cpu/cpu0# cat cpufreq/stats/total_trans 
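
The raw counters in time_in_state are easier to read as percentages. Here is a short sketch that does this, fed with the sample output shown above; the function name residency is hypothetical.

```python
# Sample copied from the time_in_state output above: "<frequency> <time>".
SAMPLE = """\
1600000 46780
1280000 4558
800000 421491
"""


def residency(text):
    """Return {frequency_khz: percent of total time} from time_in_state text."""
    times = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        freq, ticks = line.split()
        times[int(freq)] = int(ticks)
    total = sum(times.values())
    if total == 0:
        return {}
    return {freq: 100.0 * ticks / total for freq, ticks in times.items()}


if __name__ == "__main__":
    for freq, percent in residency(SAMPLE).items():
        print(f"{freq:>8} kHz  {percent:5.1f} %")
```

For this sample, the CPU spent the large majority of its time (roughly 89%) at the lowest frequency, 800000 kHz.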

These interfaces are the basis of all the governors that exist today. One of the most commonly used governors in Linux is the ondemand governor.
But this does not stop anyone from writing their own. In fact, there are as many as 28 governors that I am aware of in Android. This is explained in one of my previous blog posts.

More to come on the governors. Leave a comment to improve the knowledge base.

Happy blogging!

1 comment:

  1. Why scale up the voltage when scaling up the frequency?
    Because when the CPU frequency increases, the signal strength may not be sufficient to reach the logic-high level. That is, when the signal has to oscillate between logic low and logic high very quickly and its strength is not enough, it may only reach 80% or 90% of the full high level, which makes the computation unstable. This is why the voltage is increased as well when clocking higher.