LINUX Jannat: March 2013

__init call in linux kernel..How are they called?

I always had this doubt about how the Linux drivers and internal core gets loaded in kernel. How do we decide which drive to load first as there are many dependencies for each of them. So, I raised this question and here is the answer StackOverflow.

This really helped me understanding the booting of Linux Kernel so I thought of sharing with ya all.

Signing off for a better Sunday :)

Android CPU Governors explained

(Credit - xda forum)

It always fascinated me how the cpufreq-ondemand driver works in Linux. While browsing, I got a very interesting discussion going on regarding Android CPU governors. Here is the content of that discussion.

Different CPU governors for Power Management used are -

1: OnDemand Governor:
This governor has a hair trigger for boosting clockspeed to the maximum speed set by the user. If the CPU load placed by the user abates, the OnDemand governor will slowly step back down through the kernel's frequency steppings until it settles at the lowest possible frequency, or the user executes another task to demand a ramp.

OnDemand has excellent interface fluidity because of its high-frequency bias, but it can also have a relatively negative effect on battery life versus other governors. OnDemand is commonly chosen by smartphone manufacturers because it is well-tested, reliable, and virtually guarantees the smoothest possible performance for the phone. This is so because users are vastly more likely to bitch about performance than they are the few hours of extra battery life another governor could have granted them.

This final fact is important to know before you read about the Interactive governor: OnDemand scales its clockspeed in a work queue context. In other words, once the task that triggered the clockspeed ramp is finished, OnDemand will attempt to move the clockspeed back to minimum. If the user executes another task that triggers OnDemand's ramp, the clockspeed will bounce from minimum to maximum. This can happen especially frequently if the user is multi-tasking. This, too, has negative implications for battery life.

2: OndemandX:
Basically an ondemand with suspend/wake profiles. This governor is supposed to be a battery friendly ondemand. When screen is off, max frequency is capped at 500 mhz. Even though ondemand is the default governor in many kernel and is considered safe/stable, the support for ondemand/ondemandX depends on CPU capability to do fast frequency switching which are very low latency frequency transitions. I have read somewhere that the performance of ondemand/ondemandx were significantly varying for different i/o schedulers. This is not true for most of the other governors. I personally feel ondemand/ondemandx goes best with SIO I/O scheduler.

3: Performance Governor:
This locks the phone's CPU at maximum frequency. While this may sound like an ugly idea, there is growing evidence to suggest that running a phone at its maximum frequency at all times will allow a faster race-to-idle. Race-to-idle is the process by which a phone completes a given task, such as syncing email, and returns the CPU to the extremely efficient low-power state. This still requires extensive testing, and a kernel that properly implements a given CPU's C-states (low power states).

4: Powersave Governor:
The opposite of the Performance governor, the Powersave governor locks the CPU frequency at the lowest frequency set by the user.

5:Conservative Governor:
This biases the phone to prefer the lowest possible clockspeed as often as possible. In other words, a larger and more persistent load must be placed on the CPU before the conservative governor will be prompted to raise the CPU clockspeed. Depending on how the developer has implemented this governor, and the minimum clockspeed chosen by the user, the conservative governor can introduce choppy performance. On the other hand, it can be good for battery life.

The Conservative Governor is also frequently described as a "slow OnDemand," if that helps to give you a more complete picture of its functionality.

6: Userspace Governor:
This governor, exceptionally rare for the world of mobile devices, allows any program executed by the user to set the CPU's operating frequency. This governor is more common amongst servers or desktop PCs where an application (like a power profile app) needs privileges to set the CPU clockspeed.

7: Min Max
well this governor makes use of only min & maximum frequency based on workload... no intermediate frequencies are used.

8: Interactive Governor:
Much like the OnDemand governor, the Interactive governor dynamically scales CPU clockspeed in response to the workload placed on the CPU by the user. This is where the similarities end. Interactive is significantly more responsive than OnDemand, because it's faster at scaling to maximum frequency.

Unlike OnDemand, which you'll recall scales clockspeed in the context of a work queue, Interactive scales the clockspeed over the course of a timer set arbitrarily by the kernel developer. In other words, if an application demands a ramp to maximum clockspeed (by placing 100% load on the CPU), a user can execute another task before the governor starts reducing CPU frequency. This can eliminate the frequency bouncing discussed in the OnDemand section. Because of this timer, Interactive is also better prepared to utilize intermediate clockspeeds that fall between the minimum and maximum CPU frequencies. This is another pro-battery life benefit of Interactive.

However, because Interactive is permitted to spend more time at maximum frequency than OnDemand (for device performance reasons), the battery-saving benefits discussed above are effectively negated. Long story short, Interactive offers better performance than OnDemand (some say the best performance of any governor) and negligibly different battery life.

Interactive also makes the assumption that a user turning the screen on will shortly be followed by the user interacting with some application on their device. Because of this, screen on triggers a ramp to maximum clockspeed, followed by the timer behavior described above.

9: InteractiveX Governor:
Created by kernel developer "Imoseyon," the InteractiveX governor is based heavily on the Interactive governor, enhanced with tuned timer parameters to better balance battery vs. performance. The InteractiveX governor's defining feature, however, is that it locks the CPU frequency to the user's lowest defined speed when the screen is off.

10: Smartass
Is based on the concept of the interactive governor.
I have always agreed that in theory the way interactive works – by taking over the idle loop – is very attractive. I have never managed to tweak it so it would behave decently in real life. Smartass is a complete rewrite of the code plus more. I think its a success. Performance is on par with the “old” minmax and I think smartass is a bit more responsive. Battery life is hard to quantify precisely but it does spend much more time at the lower frequencies.
Smartass will also cap the max frequency when sleeping to 352Mhz (or if your min frequency is higher than 352 – why?! – it will cap it to your min frequency). Lets take for example the 528/176 kernel, it will sleep at 352/176. No need for sleep profiles any more!"

11: SmartassV2:
Version 2 of the original smartass governor from Erasmux. Another favorite for many a people. The governor aim for an "ideal frequency", and ramp up more aggressively towards this freq and less aggressive after. It uses different ideal frequencies for screen on and screen off, namely awake_ideal_freq and sleep_ideal_freq. This governor scales down CPU very fast (to hit sleep_ideal_freq soon) while screen is off and scales up rapidly to awake_ideal_freq (500 mhz for GS2 by default) when screen is on. There's no upper limit for frequency while screen is off (unlike Smartass). So the entire frequency range is available for the governor to use during screen-on and screen-off state. The motto of this governor is a balance between performance and battery.

12: Scary
A new governor wrote based on conservative with some smartass features, it scales accordingly to conservatives laws. So it will start from the bottom, take a load sample, if it's above the upthreshold, ramp up only one speed at a time, and ramp down one at a time. It will automatically cap the off screen speeds to 245Mhz, and if your min freq is higher than 245mhz, it will reset the min to 120mhz while screen is off and restore it upon screen awakening, and still scale accordingly to conservatives laws. So it spends most of its time at lower frequencies. The goal of this is to get the best battery life with decent performance. It will give the same performance as conservative right now, it will get tweaked over time.

13: Lagfree:
Lagfree is similar to ondemand. Main difference is it's optimization to become more battery friendly. Frequency is gracefully decreased and increased, unlike ondemand which jumps to 100% too often. Lagfree does not skip any frequency step while scaling up or down. Remember that if there's a requirement for sudden burst of power, lagfree can not satisfy that since it has to raise cpu through each higher frequency step from current. Some users report that video playback using lagfree stutters a little.

14: Smoothass:
The same as the Smartass “governor” But MUCH more aggressive & across the board this one has a better battery life that is about a third better than stock KERNEL

15: Brazilianwax:
Similar to smartassV2. More aggressive ramping, so more performance, less battery

16: SavagedZen:
Another smartassV2 based governor. Achieves good balance between performance & battery as compared to brazilianwax.

17: Lazy:
This governor from Ezekeel is basically an ondemand with an additional parameter min_time_state to specify the minimum time CPU stays on a frequency before scaling up/down. The Idea here is to eliminate any instabilities caused by fast frequency switching by ondemand. Lazy governor polls more often than ondemand, but changes frequency only after completing min_time_state on a step overriding sampling interval. Lazy also has a screenoff_maxfreq parameter which when enabled will cause the governor to always select the maximum frequency while the screen is off.

18: Lionheart:
Lionheart is a conservative-based governor which is based on samsung's update3 source.
The tunables (such as the thresholds and sampling rate) were changed so the governor behaves more like the performance one, at the cost of battery as the scaling is very aggressive.

19: LionheartX
LionheartX is based on Lionheart but has a few changes on the tunables and features a suspend profile based on Smartass governor.

20: Intellidemand:
Intellidemand aka Intelligent Ondemand from Faux is yet another governor that's based on ondemand. Unlike what some users believe, this governor is not the replacement for OC Daemon (Having different governors for sleep and awake). The original intellidemand behaves differently according to GPU usage. When GPU is really busy (gaming, maps, benchmarking, etc) intellidemand behaves like ondemand. When GPU is 'idling' (or moderately busy), intellidemand limits max frequency to a step depending on frequencies available in your device/kernel for saving battery. This is called browsing mode. We can see some 'traces' of interactive governor here. Frequency scale-up decision is made based on idling time of CPU. Lower idling time (<20%) causes CPU to scale-up from current frequency. Frequency scale-down happens at steps=5% of max frequency. (This parameter is tunable only in conservative, among the popular governors)
To sum up, this is an intelligent ondemand that enters browsing mode to limit max frequency when GPU is idling, and (exits browsing mode) behaves like ondemand when GPU is busy; to deliver performance for gaming and such. Intellidemand does not jump to highest frequency when screen is off.

21: Hotplug Governor:
The Hotplug governor performs very similarly to the OnDemand governor, with the added benefit of being more precise about how it steps down through the kernel's frequency table as the governor measures the user's CPU load. However, the Hotplug governor's defining feature is its ability to turn unused CPU cores off during periods of low CPU utilization. This is known as "hotplugging."

22: BadAss Governor:
Badass removes all of this "fast peaking" to the max frequency. On a typical system the cpu won't go above 918Mhz and therefore stay cool and will use less power. To trigger a frequency increase, the system must run a bit @ 918Mhz with high load, then the frequency is bumped to 1188Mhz. If that is still not enough the governor gives you full throttle. (this transition should not take longer than 1-2 seconds, depending on the load your system is experiencing)
Badass will also take the gpu load into consideration. If the gpu is moderately busy it will bypass the above check and clock the cpu with 1188Mhz. If the gpu is crushed under load, badass will lift the restrictions to the cpu.

23: Wheatley:
Building on the classic 'ondemand' governor is implemented Wheatley governor. The governor has two additional parameters:

target_residency - The minimum average residency in µs which is considered acceptable for a proper efficient usage of the C4 state. Default is 10000 = 10ms.

allowed_misses - The number sampling intervals in a row the average residency is allowed to be lower than target_residency before the governor reduces the frequency. This ensures that the governor is not too aggressive in scaling down the frequency and reduces it just because some background process was temporarily causing a larger number of wakeups. The default is 5.
Wheatley works as planned and does not hinder the proper C4 usage for task where the C4 can be used properly .
For internet browsing the time spend in C4 has increased by 10% points and the average residency has increased by about 1ms. I guess these differences are mostly due to the different browsing behaviour (I spend the last time more multi-tabbing). But at least we can say that Wheatley does not interfere with the proper use of the C4 state during 'light' tasks. For music playback with screen off the time spend in C4 is practically unchanged, however the average residency is reduced from around 30ms to around 18ms, but this is still more than acceptable.

So the results show that Wheatley works as intended and ensures that the C4 state is used whenever the task allows a proper efficient usage of the C4 state. For more demanding tasks which cause a large number of wakeups and prevent the efficient usage of the C4 state, the governor resorts to the next best power saving mechanism and scales down the frequency. So with the new highly-flexible Wheatley governor one can have the best of both worlds.

Obviously, this governor is only available on multi-core devices.

24: Lulzactive:
Lulzactive:
This new find from Tegrak is based on Interactive & Smartass governors and is one of the favorites.
Old Version: When workload is greater than or equal to 60%, the governor scales up CPU to next higher step. When workload is less than 60%, governor scales down CPU to next lower step. When screen is off, frequency is locked to global scaling minimum frequency.
New Version: Three more user configurable parameters: inc_cpu_load, pump_up_step, pump_down_step. Unlike older version, this one gives more control for the user. We can set the threshold at which governor decides to scale up/down. We can also set number of frequency steps to be skipped while polling up and down.
When workload greater than or equal to inc_cpu_load, governor scales CPU pump_up_step steps up. When workload is less than inc_cpu_load, governor scales CPU down pump_down_step steps down.
Example:
Consider
inc_cpu_load=70
pump_up_step=2
pump_down_step=1
If current frequency=200, Every up_sampling_time Us if cpu load >= 70%, cpu is scaled up 2 steps - to 800.
If current frequency =1200, Every down_sampling_time Us if cpu load < 70%, cpu is scaled down 1 step - to 1000.

25: Pegasusq/Pegasusd

The Pegasus-q / d is a multi-core based on the Ondemand governor and governor with integrated hot-plugging.
Ongoing processes in the queue, we know that multiple processes can run simultaneously on. These processes are active in an array, which is a field called "Run Queue" queue that is ongoing, with their priority values arranged (priority will be used by the task scheduler, which then decides which process to run next).

To ensure that each process has its fair share of resources, each running for a certain period and will eventually stop and then again placed in the queue until it is your turn again. If a program is terminated, so that others can run the program with the highest priority in the current queue is executed.

26: hotplugx

It 'a Hotplug modified and optimized for the suspension in off-screen

27: AbissPlug

It 'a Governor derived hotplug, it works the same way, but with the changes in savings for a better battery.

28: MSM DCVS

a very efficient and wide range of Dynamic Clock and
Voltage Scaling (DCVS) which addresses usage models from
active standby to mid and high level processing requirements.
A Krait CPU can smoothly scale from low power, low
leakage mode to blazingly fast performance.

Believe it's a governor that is mfg'd by qualcomm to utilize new on chip features.

MSM is the prefix for the SOC (MSM8960) and DCVS is Dynamic Clock and Voltage Scaling. Makes sense, MSM-DCVS

Linux support for ARM big.LITTLE

By Nicolas Pitre

ARM Ltd recently announced the big.LITTLE architecture consisting of a twist on the SMP systems that we've all gotten accustomed to. Instead of having a bunch of identical CPU cores put together in a system, the big.LITTLE architecture is effectively pushing the concept further by pulling two different SMP systems together: one being a set of "big" and fast processors, the other one consisting of "little" and power-efficient processors.

In practice this means having a cluster of Cortex-A15 cores, a cluster of Cortex-A7 cores, and ensuring cache coherency between them. The advantage of such an arrangement is that it allows for significant power saving when processes that don't require the full performance of the Cortex-A15 are executed on the Cortex-A7 instead. This way, non-interactive background operation, or streaming multimedia decoding, can be run on the A7 cluster for power efficiency, while sudden screen refreshes and similar bursty operations can be run on the A15 cluster to improve responsiveness and interactivity.

Then, how to support this in Linux? This is not as trivial as it may seem initially. Let's suppose we have a system comprising a cluster of four A15 cores and a cluster of four A7 cores. The naive approach would suggest making the eight cores visible to the kernel and letting the scheduler do its job just like with any other SMP system. But here's the catch: SMP means Symmetric Multi-Processing, and in the big.LITTLE case the cores aren't symmetric between clusters.

The Linux scheduler expects all available CPUs to have the same performance characteristics. For example, there are provisions in the scheduler to deal with things like hyperthreading, but this is still an attribute which is normally available on all CPUs in a given system. Here we're purposely putting together a couple of CPUs with significant performance/power characteristic discrepancies in the same system, and we expect the kernel to make the optimal usage of them at all times, considering that we want to get the best user experience together with the lowest possible battery consumption.

So, what should be done? Many questions come to mind:

Is it OK to reserve the A15 cluster just for interactive tasks and the A7 cluster for background tasks?
What if the interactive tasks are sufficiently light to be processed by the small cores at all times?
What about those background tasks that the user interface is actually waiting after?
How to determine if a task using 100% CPU on a small core should be migrated to a fast core instead, or left on the small core because it is not critical enough to justify the increased power usage?
Should the scheduler auto-tune its behavior, or should user-space policies influence it?
If the latter, what would the interface look like to be useful and sufficiently future-proof?

Linaro started an initiative during the most recent Linaro Connect to investigate this problem. It will require a high degree of collaboration with the upstream scheduler maintainers and a good amount of discussion. And given past history, we know that scheduler changes cannot happen overnight... unless your name is Ingo that is. Therefore, it is safe to assume that this will take a significant amount of time.

Silicon vendors and portable device makers are not going to wait though. Chips implementing the big.LITTLE architecture will appear on the market in one form or another, way before a full heterogeneous multi-processor aware scheduler is available. An interim solution is therefore needed soon. So let's put aside the scheduler for the time being.

ARM Ltd has produced a prototype software solution consisting of a small hypervisor using the virtualization extensions of the Cortex-A15 and Cortex-A7 to make both clusters appear to the underlying operating system as if there was only one Cortex-A15 cluster. Because the cores within a given cluster are still symmetric, all the assumptions built into the current scheduler still hold. With a single call, the hypervisor can atomically suspend execution of the whole system, migrate the CPU states from one cluster to the other, and resume system execution on the other cluster without the underlying operating system being aware of the change; just as if nothing has happened.

Taking the example above, Linux would see only four Cortex-A15 CPUs at all times. When a switch is initiated, the registers for each of the 4 CPUs in cluster A are transferred to corresponding CPUs in cluster B, interrupts are rerouted to the CPUs in cluster B, then CPUs in cluster B are resumed exactly where cluster A was interrupted, and, finally, the CPUs in cluster A are powered off. And vice versa for switching back to the original cluster. Therefore, if there are eight CPU cores in the system, only four of them are visible to the operating system at all times. The only visible difference is the observable execution speed, and of course the corresponding change in power consumption when a cluster switch occurs. Some latency is implied by the actual switch of course, but that should be very small and imperceptible by the user.

This solution has advantages such as providing a mechanism which should work for any operating system targeting a Cortex-A15 without modifications to that operating system. It is therefore OS-independent and easy to integrate. However, it brings a certain level of complexity such as the need to virtualize all the differences between the A15 and the A7. While those CPU cores are functionally equivalent, they may differ in implementation details such as cache topology. That would force every cache maintenance operation to be trapped by the hypervisor and translated into equivalent operations on the actual CPU core when the running core is not the one that the operating system thinks is running.

Another disadvantage is the overhead of saving and restoring the full CPU state because, by virtue of being OS-independent, the hypervisor code may not know what part of the CPU is actually being actively used by the OS. The hypervisor could trap everything to be able to know what is being touched allowing partial context transfers, but that would be yet more complexity for a dubious gain. After all, the kernel already knows what is being used in the CPU, and it can deal with differing cache topologies natively, etc. So why not implement this switcher support directly in the kernel given that we can modify Linux and do better?

In fact that's exactly what we are doing i.e. take the ARM Ltd BSD licensed switcher code and use it as a reference to actually put the switcher functionality directly in the kernel. This way, we can get away with much less support from the hypervisor code and improve switching performances by not having to trap any cache maintenance instructions, by limiting the CPU context transfer only to the minimum set of active registers, and by sharing the same address space with the kernel.

We can implement this switcher by modeling its functionality as a CPU speed change, and therefore expose it via a cpufreq driver. This way, contrary to the reference code from ARM Ltd which is limited to a whole cluster switch, we can easily pair each of the A15 cores with one of the A7 cores, and have each of those CPU pairs appear as a single pseudo CPU with the ability to change its performance level via cpufreq. And because the cpufreq governors are already available and understood by existing distributions, including Android, we therefore have a straightforward solution with a fast time-to-market for the big.LITTLE architecture that shouldn't cause any controversy.

Obviously the "switcher" as we call it is not replacing the ultimate goal of exposing all the cores to the kernel and letting the scheduler make the right decisions. But it is nevertheless a nice self-contained interim solution that will allow pretty good usage of the big.LITTLE architecture while removing the pressure to come up with scheduler changes quickly.

(for original article & discussion :- http://lwn.net/Articles/481055/)

Project cpuboard upgrade

Hi guys

Just created my first GIT repo on github. Soon it will have Liunx-3.8.2 to start with a new upgrade project for ARM9 based system. Details of project will be soon coming.
Mean while here is my GIT link

https://github.com/shekharsingh/cpuboard.git

Since it is my first git repo, the scope of improvement & suggestion is vast. Looking forward for more input from you all.

signing off till next post...

GN