Disable CPU Turbo Boost temporarily (ideal for not heating the CPU while compiling linux kernel for an hour)

I just want to share a little tip about disabling turbo, or throttling your CPU, to avoid overheating. Ideal while compiling the Linux kernel or playing a game.

CPU Turbo is the automatic overclock that most modern Intel CPUs do. My desktop CPU is an i7 4770K that can go from 3.5GHz to 3.9GHz (on turbo). My laptop is an i5 that can go from 1.7GHz to 2.6GHz. The exact model is i5 3317U (an ultra-low-voltage Ivy Bridge model).

As the CPU clock increases, power consumption rises much faster than linearly, because higher clocks also need higher voltage.

The CPU normally idles at a much lower clock, such as 0.8GHz (800MHz), and the scaling governor puts each core at its maximum clock when utilization goes up. When compiling the kernel (which takes about an hour on my laptop’s i5), all four logical cores are put at the maximum frequency, which is listed as 1.701GHz. The scaling governor doesn’t actually decide when to turbo; the CPU does that by itself when it is asked to run at the maximum frequency.

But what if I want to stay at the max nominal clock and not jump to turbo, so as to save power and lower the temperature?

There is a little trick to disable turbo. My CPU lists the following available maximum clocks (in kHz):

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies 
1701000 1700000 1600000 1500000 1400000 1300000 1200000 1100000 1000000 900000 800000 799000

Funny, it lists both 1.701GHz and 1.7GHz on my laptop. And turbo clocks are not listed (that is because turbo is out of the control of the OS, and is an automatic hardware feature).

The CPU will apply turbo if the OS told it to run at 1.701GHz, but won’t turbo if the OS tells it to run at less than that, like 1.6GHz.

It’s the same for my desktop i7:

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies 
3501000 3500000 3300000 3100000 2900000 2700000 2500000 2300000 2100000 2000000 1800000 1600000 1400000 1200000 1000000 800000

The CPU will turbo when told to run at 3.501GHz but not when told to run at 3.3GHz.

Disabling Turbo Boost (until next reboot or undo)

First, get the third-highest frequency that your CPU supports by looking at ‘scaling_available_frequencies’.

For my laptop, that is 1600000.
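If you would rather not pick the value by eye, a small helper can pull it out of the list. This is only a minimal sketch: the function name `nth_highest_freq` and its optional file argument are illustrative assumptions, not something the kernel provides.

```shell
#!/bin/sh
# nth_highest_freq N [FILE]
# Prints the N-th highest frequency (in kHz) listed in FILE.
# FILE defaults to cpu0's scaling_available_frequencies.
# The function name and the FILE parameter are assumptions of this sketch.
nth_highest_freq() {
    n=$1
    file=${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies}
    # One value per line, keep only numbers, sort descending,
    # drop duplicates, then print line N.
    tr ' ' '\n' < "$file" | grep -E '^[0-9]+$' | sort -rn | uniq | sed -n "${n}p"
}
```

Calling `nth_highest_freq 3` on the laptop list shown earlier would print 1600000.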

Then, ‘echo’ this number into the scaling_max_freq file for each CPU, by doing this in a console:

$ sudo su
# for a in /sys/devices/system/cpu/cpu?; do echo 1600000 > $a/cpufreq/scaling_max_freq ; done

Done, the max CPU clock is just a little less than nominal clock, and turbo will not kick in.
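Note that the `cpu?` glob in the one-liner above only matches single-digit CPU numbers (cpu0 to cpu9). On machines with more logical CPUs, something like the following sketch may be more convenient. The function name `set_max_freq` and the optional root-directory argument are assumptions made for illustration (the second argument makes dry runs against a scratch directory possible).

```shell
#!/bin/sh
# set_max_freq KHZ [CPU_ROOT]
# Writes KHZ into scaling_max_freq of every cpuN directory under CPU_ROOT.
# CPU_ROOT defaults to /sys/devices/system/cpu (writing there needs root).
# Both the function name and CPU_ROOT are assumptions of this sketch.
set_max_freq() {
    khz=$1
    root=${2:-/sys/devices/system/cpu}
    # cpu[0-9]* matches cpu0, cpu1, ..., cpu10, ... but not cpufreq or cpuidle.
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        [ -e "$f" ] || continue   # nothing matched, or no cpufreq support
        echo "$khz" > "$f" || return 1
    done
}
```

As root, `set_max_freq 1600000` would then do the same job as the loop above.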

To undo and re-enable turbo, ‘echo’ the very max frequency that your CPU supports (1701000 in my case):

# for a in /sys/devices/system/cpu/cpu?; do echo 1701000 > $a/cpufreq/scaling_max_freq ; done
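To avoid hardcoding the top value, the undo step could instead copy each CPU's `cpuinfo_max_freq` back into `scaling_max_freq`. This is a hedged sketch: it assumes `cpuinfo_max_freq` reports the same top entry as ‘scaling_available_frequencies’ on your machine, which is worth checking first. The function name `restore_max_freq` and its optional root argument are illustrative.

```shell
#!/bin/sh
# restore_max_freq [CPU_ROOT]
# For every cpuN, copies cpuinfo_max_freq into scaling_max_freq.
# ASSUMPTION: cpuinfo_max_freq equals the highest listed frequency
# (e.g. 1701000 on the laptop above); verify with cat before relying on it.
restore_max_freq() {
    root=${1:-/sys/devices/system/cpu}
    for d in "$root"/cpu[0-9]*/cpufreq; do
        [ -r "$d/cpuinfo_max_freq" ] || continue   # skip CPUs without cpufreq
        cat "$d/cpuinfo_max_freq" > "$d/scaling_max_freq" || return 1
    done
}
```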

If you want to actually go lower, you can. For instance, you can put the max to 1.4GHz to save power but still have some speed:

# for a in /sys/devices/system/cpu/cpu?; do echo 1400000 > $a/cpufreq/scaling_max_freq ; done

Don’t worry about making mistakes. You can’t. It won’t accept ‘echo’ of speeds that are outside of ‘scaling_available_frequencies’.
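If you prefer to check a value yourself before writing it, a sketch like this compares it against the listed frequencies. The name `is_available_freq` and the optional file argument are assumptions for illustration.

```shell
#!/bin/sh
# is_available_freq KHZ [FILE]
# Succeeds (exit status 0) only if KHZ appears in the frequency list in FILE.
# FILE defaults to cpu0's scaling_available_frequencies.
# The function name and FILE parameter are assumptions of this sketch.
is_available_freq() {
    khz=$1
    file=${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies}
    # -x: the whole line must match, so 160000 does not match 1600000.
    tr ' ' '\n' < "$file" | grep -qx "$khz"
}
```

For example, `is_available_freq 1400000 && echo ok` prints "ok" only when 1400000 is one of the listed values.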

How to tell if it worked

The ‘turbostat’ tool comes in the linux-tools package. It is the only tool I found that actually tells you when Turbo is being applied, and it also shows you more data, such as the temperature of each CPU core.

Compiling the linux kernel with turbo enabled on my laptop:

# turbostat
cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
         97.55 2.39 1.70   0   2.45   0.00   0.00   0.00   82   82   0.00   0.00   0.00   0.00  12.13   9.88  0.00
  0   0  97.44 2.39 1.70   0   2.56   0.00   0.00   0.00   81   82   0.00   0.00   0.00   0.00  12.13   9.88  0.00
  0   1  97.24 2.39 1.70   0   2.76
  1   2  98.60 2.39 1.70   0   1.40   0.00   0.00   0.00   82
  1   3  96.94 2.39 1.70   0   3.06

You can see that all four logical cores are running at 2.39GHz. That is the maximum turbo speed when all four hyperthreads are active; if only one core were active, turbo would reach 2.6GHz.

You can also see that the CTMP (core temperature) is above 80 degrees Celsius. PTMP is the package temperature, and Pkg_W is how many watts the whole CPU package is consuming. In this case, 12.1 watts.

Now, let’s see if the tip to disable turbo actually works:

# for a in /sys/devices/system/cpu/cpu?; do echo 1600000 > $a/cpufreq/scaling_max_freq ; done 

# turbostat

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
         98.14 1.60 1.70   0   1.86   0.00   0.00   0.00   72   71   0.00   0.00   0.00   0.00   7.21   5.17  0.01
  0   0  99.05 1.60 1.70   0   0.95   0.00   0.00   0.00   72   71   0.00   0.00   0.00   0.00   7.21   5.17  0.01
  0   1  98.19 1.60 1.70   0   1.81
  1   2  98.07 1.60 1.70   0   1.93   0.00   0.00   0.00   68
  1   3  97.25 1.60 1.70   0   2.75

Temperature dropped 10 degrees, and power consumption dropped from 12 watts to 7.2 watts.
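To put those numbers side by side, the relative drops can be worked out with a line of awk. This is just arithmetic on the turbostat figures above; the helper name `pct_drop` is made up for the sketch.

```shell
#!/bin/sh
# pct_drop BEFORE AFTER
# Prints the relative drop from BEFORE to AFTER as a percentage, one decimal.
# (Helper name is an assumption of this sketch.)
pct_drop() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

pct_drop 12.13 7.21   # package power in watts: prints 40.6
pct_drop 2.39 1.60    # clock in GHz: prints 33.1
```

So a clock that is about 33% lower bought about a 41% cut in package power, which fits the point that power grows faster than clock speed.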

That’s more like it. The kernel will take a little bit longer to compile, but you won’t burn your lap while it does.

Also, some games already have plenty of CPU to spare. Turbo kicks in whenever the CPUs are active, even when it is not needed to run the game at full speed.

I hope all of you enjoyed these tips.

Cheers!

— Juan Manuel Cabo

 

Bug Hunting with Linux (The story of the Samsung Ultrabooks lid close / AC status bug)

If you have a Samsung laptop series 5 or series 9 from 2011 and 2012, you’ll find the following interesting.

If you have one of the following symptoms:

  • Lost the ability to get the laptop to sleep by closing the lid (both in windows and linux).
  • Linux doesn’t realize that you plugged or unplugged the power supply.
  • Windows takes many seconds to realize that you plugged or unplugged the power supply, instead of having immediate feedback.
  • Keyboard backlight doesn’t turn on at night (ambient light sensor events issue).

then it means that you have a “stuck” embedded controller.

For the anxious, I’ll let you know two more important facts, before getting into the story, the embedded controller talk, and the workaround fix:

  1. These issues are caused by an “unlucky” suspend/resume. If you never suspend your laptop, you’ll never get a stuck embedded controller.
  2. You can fix the symptoms temporarily by pushing the reset button through the small hole in the back of the laptop (on the opposite side from the touchpad). The computer must be turned off and unplugged. After hitting the button, you’ll need to plug it into the wall to start it for the first time. This is temporary and the issues will come back with a bad sleep / resume.

There is a permanent fix, which will not require you to hit the reset button. More on that later.

My end of the story:

I bought my Samsung ultrabook Series 5 in November 2013. I started noticing these issues during the first week. On that Saturday I decided I had had enough and spent the day trying to find a way to recover the lost functionality, and be able to suspend by closing the lid again. I figured that since the screen went off when closing the lid, the lid sensor was working fine, and that it must have been a BIOS or motherboard problem, since neither linux nor windows was capable of detecting the lid, whereas they both previously did.
I toggled all BIOS screen options and all Samsung Easy Settings options. None of this brought functionality back.
So I set out to reflash the BIOS, hoping that some hidden internal option causing this would get reset to default. But I quickly found out that Samsung’s BIOS update tool doesn’t let you flash the BIOS if it’s already the latest version. Long story short, that same Saturday in November I found a way to do that and shared it here:
http://forum.notebookreview.com/samsung/683727-where-lid-closure-sensor-series-5-ultra-np530u3c.html#post9455595
After flashing the same BIOS version over, the lid detection and power supply unit (PSU) detection started working fine again.

A few weeks later the syndrome came back. I reflashed the BIOS again and it got fixed once again. This time I realized that it came back right after resuming the laptop from sleep, so I refrained from suspending it in the future. Using the laptop became no fun. It lost its cool.

Two months later, now February 2014, I became careless and started suspending the laptop again.

The issue came back.

Closing the lid was ignored, and the charging icon persisted even after unplugging the laptop from the wall.

I searched online once again for this issue, and this time I found a bug report in ubuntu’s launchpad website. In that thread I found a better way to fix the issue that didn’t involve reflashing a bios, but just involved hitting the reset button in the back, which I never noticed before:

     https://bugs.launchpad.net/ubuntu/+source/linux/+bug/971061

Reading that forum, it quickly became obvious that they talked about the same problem, even though they mentioned different models. It confirmed to me that the issue was suspend related, and some posters suggested that the issue came back for sure when leaving the laptop suspended for a long time, though there was no precision as to why some suspensions didn’t trigger the bug and others did.

At first I just shared my own temporary fix (reflashing the BIOS), since I read that some folks thought that Samsung had fixed the issue with a new BIOS. It is a temporary fix, and it works not because of the new BIOS but because of the act of flashing itself (the embedded controller gets reset after a BIOS flash). While sharing my own fix in post #97, I theorized that the “problem state” might be triggered by a poor implementation of Intel Rapid Start. Later I found I was wrong about Intel Rapid Start. It made some sense to me at the time, given that the posters said that a long suspend triggered the issue and that Intel Rapid Start also gets triggered during suspend (by time or by low battery). I began to leave the laptop suspended overnight.

Vacation time came from work. After a few days of recovering from lost hours of sleep, and having gone around the house fixing things (faucets, etc.) and getting up to date with housekeeping, I gave a look at my ultrabook and thought to myself: “you’re next”. hahahaha

First I tried to put my theory of Intel Rapid Start interfering from sleep to the test. I had rapid start disabled since the day I bought it. So I thought that that could be the cause, I thought that the BIOS expected rapid start to be enabled and that when it didn’t find support for it the bug was triggered (but I was wrong). I enabled intel rapid start (this required me to resize the SSD partition to make room for a special hibernation partition). To enable intel rapid start, I had to do it from Easy Settings in windows, (otherwise, the INT3392 acpi device wouldn’t show up).  Having enabled it, I could now use the intel_irst module in linux, which had a /sys interface (wakeup_time and wakeup_events) to enable/disable rapid start triggers by time or battery, and to configure the time.
I started a text file to keep a log of my tests.
I started leaving the laptop suspended for whole nights, trying different values for wakeup_time and wakeup_events.
I realized that they made absolutely no difference. Intel rapid start was ruled out as a cause.

I made further tests (see post #100 of the launchpad bug thread). Too many to mention here.

At the same time I studied my laptop’s DSDT. (Attached here). In case you don’t know: DSDT code is an almost platform-independent code (it’s not assembler or machine code) provided by the BIOS to be run by the OS. This code is interpreted by the OS. It’s like “a driver provided by the BIOS” to be run and interpreted by linux or windows, to serve as an interface or glue. It’s different for each laptop, and when you update your BIOS, you are updating the DSDT too. The DSDT code is read once by the OS, on each boot, from the BIOS.

The DSDT showed that event handlers for LID and Battery status change were of course in place (_Q51, _Q52 (AC), _Q53, _Q54, _Q66 (Battery), _Q5E, _Q5F (LID)).

Searching for everything I needed to know regarding the production of those events, I found these two useful links, which gave me a break:

      http://kernel.ubuntu.com/~cking/presentations/gpes-and-embedded-controller/EmbeddedControllerAndACPI.odp

      https://wiki.edubuntu.org/Kernel/Reference/ACPITricksAndTips

It goes like this: when the battery changes by one percent (up or down), or when you close the lid on your laptop, the Embedded Controller notices this, and produces an interrupt through the general purpose event 0x17 (other laptops might have a different GPE for the EC). Code in the OS handles the interrupt, and asks the EC for the exact event produced:

  • 0x51 and 0x52 for AC plugged or unplugged.
  • 0x53 and 0x54, and 0x66 for battery changed
  • 0x5E and 0x5F for LID closed or open

If the event has a status with 0x20 bit mask set, the OS runs the corresponding _Q## method of the DSDT to handle the event. In the case of the LID, the _Q5E() method could be called, and if the lid status was different than before, an ACPI event gets generated through a call to Notify(LID0, 0x80). Pretty much the same for AC (ADP1 device) and Battery (BAT1 device) status.
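The mapping from query codes to DSDT handlers described above can be summarized in a few lines. This sketch only restates the codes listed for this laptop; the function names `ec_event_name` and `ec_query_method` are made up for illustration, and other machines may use different codes.

```shell
#!/bin/sh
# ec_event_name CODE
# Describes an EC query code, using the values listed above for this laptop.
# (Function names here are assumptions of this sketch.)
ec_event_name() {
    case "$1" in
        0x51|0x52)           echo "AC plugged or unplugged" ;;
        0x53|0x54|0x66)      echo "battery changed" ;;
        0x5E|0x5F|0x5e|0x5f) echo "lid closed or opened" ;;
        *)                   echo "unknown" ;;
    esac
}

# ec_query_method CODE
# Prints the name of the DSDT method that handles CODE: _Q plus two hex digits.
ec_query_method() {
    printf '_Q%02X\n' "$1"
}
```

For example, `ec_query_method 0x5e` gives `_Q5E`, the lid handler mentioned above.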

Now, because of my tests (see post #100), I realized that there was a rhythm to all this. Events were missing in the problem state, and there was an almost exact number of events, which I pinpointed as 8 plugs/unplugs from AC while suspended (or some battery drop or increase during suspension), that put the laptop into the problem state again. The Embedded Controller was at the center of all this. I also found the quickest way to reproduce the issue and get into the “problem state”, to test potential fixes:

  1. Sleep the computer in linux (by closing the lid or any other means).
  2. Unplug from the wall, plug, unplug, plug, unplug, plug, unplug, plug (8 actions or more).
  3. Resume from sleep.

There is another way to get the laptop into the problem state that I discovered:

  1. echo disable > /sys/firmware/acpi/interrupts/gpe17
  2. plug, unplug, plug, unplug, plug, unplug, plug, unplug (8 actions)
  3. echo enable > /sys/firmware/acpi/interrupts/gpe17
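One way to watch for the problem state is to check whether the GPE counter still advances when you plug or unplug the charger. The sketch below assumes the first field of the gpe17 file is the interrupt count, and that 0x17 is the EC's GPE as on this laptop; the function name `gpe_count` is illustrative.

```shell
#!/bin/sh
# gpe_count [FILE]
# Prints the interrupt counter, i.e. the first field of the GPE sysfs file.
# FILE defaults to the gpe17 entry used on this laptop; other machines may
# use a different GPE number. (Function name is an assumption of this sketch.)
gpe_count() {
    file=${1:-/sys/firmware/acpi/interrupts/gpe17}
    awk '{ print $1; exit }' "$file"
}
```

Run `gpe_count`, plug or unplug the power supply, and run it again: if the number did not change, events are probably not reaching the OS.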

Reading the presentation in this link provided me with the inspiration to try to query the embedded controller directly. Little did I know that that was all that was required to make it unstuck, and see events again being produced with “acpi_listen” in the console.

The Fix

I made this program to poll the events left in the embedded controller directly. I realized that it fixes the issue immediately (in the same way that hitting the reset button through the small hole in the back). You can run it at any time, but ideally run right after resume from suspend.

What we can deduce from this is that the Embedded Controller keeps trying to report events while the computer is suspended. Since there is no OS to reply to them, it accumulates them (a battery percentage drop or increase, AC plugged or unplugged, and lid opened or closed each count as one event). After a certain number of events, it stops producing GPE 0x17, and the OS doesn’t query them anymore. It is a classic chicken-and-egg situation, because the OS only queries the events when it receives a GPE interrupt.
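The deduction above can be pictured with a toy model. To be clear, this is purely illustrative: the real EC firmware logic is not known, the limit of 8 simply mirrors the number of plug/unplug actions observed earlier, and all names are invented for the sketch.

```shell
#!/bin/sh
# TOY MODEL ONLY: not the real EC firmware behaviour.
QUEUE_LIMIT=8   # mirrors the "8 actions" observed in the tests above
pending=0       # events the EC holds that the OS has not queried yet
gpe_armed=1     # 1 while the EC still raises its GPE for new events

ec_event() {    # something happens: AC plug, lid, battery percent change
    pending=$((pending + 1))
    if [ "$pending" -ge "$QUEUE_LIMIT" ]; then gpe_armed=0; fi
}

os_on_gpe() {   # the OS only drains the queue when a GPE actually arrives
    if [ "$gpe_armed" -eq 1 ]; then pending=0; fi
}

flush_ec() {    # the workaround: query the EC without waiting for a GPE
    pending=0
    gpe_armed=1
}

# While suspended, 8 events pile up and nobody queries them.
i=0
while [ "$i" -lt 8 ]; do ec_event; i=$((i + 1)); done
# After resume, a new event raises no GPE, so the OS never drains the queue.
ec_event; os_on_gpe
echo "stuck: pending=$pending gpe_armed=$gpe_armed"
flush_ec
echo "after flush: pending=$pending gpe_armed=$gpe_armed"
```

In this model the only way out of the stuck state is `flush_ec`, which is the role the direct query plays.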

You can find the program in post #102 of the launchpad issue. I’ll put it here too.

There is a better solution based on my research: Lan Tianyu made a kernel patch, which is waiting for further tests in kernel bugzilla issue 44161 (see comments #133 and #135 there). I confirmed that it does the same job as my workaround fix, perfectly. If that gets merged into the kernel someday, you won’t need my workaround anymore to enjoy your laptop. The kernel patch uses proper ec.c functions to query the Embedded Controller after resume from sleep, instead of direct port poking.

UPDATE (1) 2014-02-23: Kieran Clancy has made a better kernel patch and posted it on the usual kernel bugzilla, comment #149. I tested it and it does the job as perfectly as my userspace program workaround does, but from inside the kernel. It also unsticks the EC not only on resume but also at startup.

UPDATE (2) 2014-02-23: I made a better workaround that doesn’t hardcode the EC port numbers as constants but instead reads them from /proc/ioports. It is a much safer workaround, and should work on any model. It also now sleeps for a millisecond between queries, which I found keeps it from querying the same event more than once. It is now also GPL. It follows below (you can still find the original in the launchpad issue post #102).
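To see which ports the program will pick up on your machine, the same lookup can be sketched in shell. It assumes /proc/ioports contains lines such as `0062-0062 : EC data` and `0066-0066 : EC cmd`; the function name `ec_port` is illustrative. Running it as root is advisable, since unprivileged reads may show zeroed addresses on some kernels.

```shell
#!/bin/sh
# ec_port NAME [FILE]
# Prints the start address (as 0x....) of the first ioports line containing
# NAME, where NAME is "EC cmd" or "EC data". FILE defaults to /proc/ioports.
# (Function name and FILE parameter are assumptions of this sketch.)
ec_port() {
    name=$1
    file=${2:-/proc/ioports}
    awk -v n="$name" '
        index($0, n) {
            sub(/^[ \t]+/, "")    # strip the indentation
            split($0, r, "-")     # "0066-0066 : EC cmd" -> r[1] = "0066"
            print "0x" r[1]
            exit
        }' "$file"
}
```

For example, `ec_port "EC cmd"` and `ec_port "EC data"` should print the two addresses that the C program below reports in its "Using EC ports" line.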

UPDATE (3) 2014-03-01: The latest version of the fix in the form of a kernel patch by Kieran Clancy has been posted to linux-acpi and linux-kernel mailing lists: http://marc.info/?l=linux-acpi&m=139359680828880&w=2 You can still use my non-kernel workaround if you don’t want to recompile a new kernel. But this is great news, it means that the fix is moving towards being included by default some day in the kernel.

My workaround fix (Updated with version2, doesn’t hardcode EC ports anymore):

---- samsung_fix_ec_events.c ------------------------------------------
/*
* samsung_fix_ec_events
*
* Copyright © 2014 Juan Manuel Cabo <juanmanuel.cabo@gmail.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <unistd.h>
#include <sys/io.h>
#include <string.h>
#include <stdlib.h>

/* INSTRUCTIONS:
 *
 * Compile with:
 *    gcc -o samsung_fix_ec_events samsung_fix_ec_events.c
 * Run as:
 *    sudo ./samsung_fix_ec_events
 * You may copy it to /usr/local/bin/samsung_fix_ec_events and then call it 
 * automatically after resume from sleep with a /usr/lib/pm-utils/sleep.d script:
 *    sudo cp ./samsung_fix_ec_events /usr/local/bin/
 */

/* Constants: */
enum ec_command {
    /* Standard command for querying LID, Battery and AC events: */
    ACPI_EC_COMMAND_QUERY = 0x84,
    LINE_BUFFER_SIZE = 1024
};

int get_ec_ports(int* ec_cmd_port, int* ec_data_port);

int main (int argc, char** argv) {
    /* Unsigned, so values >= 0x80 are not sign-extended when printed with %x */
    unsigned char status = 0;
    unsigned char data = 0;
    int count = 0;

    /* Get EC data and command ports numbers from /proc/ioports */
    int ec_cmd_port = 0;
    int ec_data_port = 0;
    if (!get_ec_ports(&ec_cmd_port, &ec_data_port) || !ec_cmd_port || !ec_data_port) {
        fprintf(stderr, "Error: couldn't determine EC ports by looking in '/proc/ioports'.\n");
        return 1;
    }
    printf("Using EC ports 0x%x and 0x%x\n", ec_cmd_port, ec_data_port);

    /* Ask for permission to use the EC ports for the duration of this execution only: */
    if (iopl(3)) {
        fprintf(stderr, "Error: Permission to read/write to EmbeddedController port not granted.\n");
        return 1;
    }

    /* Query AC, Battery, LID, etc. events until there are no more. 
     * This clears them for the EC so that it can send them again in the future, thus unblocking the EC. */
    do {
        outb(ACPI_EC_COMMAND_QUERY, ec_cmd_port);
        status = inb(ec_cmd_port);
        data = inb(ec_data_port);
        printf("CommandQuery 0x84, status=0x%x, data=0x%x\n", status, data);
        usleep(1000); /* Give the EC 1ms time between queries */
    } while (data != 0 && ++count < 1000);

    printf("EmbeddedController GPE events flushed. New events can be produced now.\n");

    return 0;
}

/* Reads EC port addresses from /proc/ioports. Returns 1 if successful */
int get_ec_ports(int* ec_cmd_port, int* ec_data_port) {
    if (!ec_cmd_port || !ec_data_port) {
        return 0;
    }
    *ec_cmd_port = 0;
    *ec_data_port = 0;

    /* Allocate line buffer on the heap for parsing */
    char* line_buffer = (char*) malloc(LINE_BUFFER_SIZE);
    if (!line_buffer) {
        return 0;
    }

    /* Open ioports file */
    FILE* ioports_file = fopen("/proc/ioports", "r");
    if (!ioports_file) {
        free(line_buffer);
        return 0;
    }

    /* Find ports 'EC data' and 'EC cmd' */
    char* line = NULL;
    int parse_ok = 1;
    while ((line = fgets(line_buffer, LINE_BUFFER_SIZE, ioports_file)) != NULL) {
        if (strstr(line, "EC cmd")) {
            if (sscanf(line, "%x", (unsigned int*) ec_cmd_port) != 1) {
                parse_ok = 0;
                break;
            }
        } else if (strstr(line, "EC data")) {
            if (sscanf(line, "%x", (unsigned int*) ec_data_port) != 1) {
                parse_ok = 0;
                break;
            }
        }
    }
    /* Release resources on every path, including parse errors */
    fclose(ioports_file);
    free(line_buffer);
    /* Check the parsed values (not the pointers themselves) */
    if (!parse_ok || *ec_cmd_port == 0 || *ec_data_port == 0) {
        return 0;
    }
    /* Found both ports successfully */
    return 1;
}
----END samsung_fix_ec_events.c ------------------------------------------

----- 99samsung_fix_ec_events ----------------------------------------------
#!/bin/sh
# NOTE: Put this file in: /usr/lib/pm-utils/sleep.d/99samsung_fix_ec_events
# and do: chmod +x 99samsung_fix_ec_events
#
# On some samsung laptops (series 5 2012, series 9 2011, etc) , if many EC
# GPE events are produced during sleep (AC plugged/unplugged, battery %
# change, LID open/close, etc), and they are not queried, the
# EmbeddedController stops sending them and this creates a chicken and egg
# situation, that can only be resolved either by hitting the reset button in
# the back while powered off, or by simply forcing a query for the events here after resume.

case "$1" in
    hibernate|suspend)
        ;;
    thaw|resume)
        #NOTE: edit this path if necessary to point to the program with the fix:
        /usr/local/bin/samsung_fix_ec_events
        ;;
    *) exit $NA
        ;;
esac

exit 0

----- END 99samsung_fix_ec_events -------------------------------------------
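On distributions that suspend through systemd rather than pm-utils, the script above is not called. A roughly equivalent hook could look like the sketch below. It is untested on the affected laptops; it relies on systemd running executables in /usr/lib/systemd/system-sleep/ with "pre" or "post" as the first argument and the sleep type (suspend, hibernate, ...) as the second. The file name and the function name `ec_fix_hook` are assumptions.

```shell
#!/bin/sh
# Hypothetical file: /usr/lib/systemd/system-sleep/99samsung_fix_ec_events
# (must be executable). systemd passes:
#   $1 = pre | post
#   $2 = suspend | hibernate | hybrid-sleep | ...
#
# ec_fix_hook PHASE SLEEP_TYPE [FIX_PROGRAM]
# Runs the fix program after wake-up and does nothing before sleep.
ec_fix_hook() {
    phase=$1
    # NOTE: edit this default if the program with the fix lives elsewhere.
    fix=${3:-/usr/local/bin/samsung_fix_ec_events}
    case "$phase" in
        post) "$fix" ;;   # after resume or thaw: flush the EC events
        *)    : ;;        # before sleep (or no arguments): nothing to do
    esac
}

ec_fix_hook "${1:-}" "${2:-}"
```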

Affected Laptops

I want to thank the community so much for all the good feedback that I received. This bug has been causing frustration to many people for at least two years. I’m just happy that I found a diagnosis of it, and a solution, and that I could share it with all of you guys. In particular I want to thank Kieran Clancy for bringing my workaround fix to the attention of the linux kernel people, where a kernel patch has now been made and I guess is now waiting for further tests.

From the feedback I gathered so far, my fix works in the following models:

  • Samsung Series 5 (models NP530U3C, NP535U3C, NP530U3B, NP550P5C)
  • Samsung Series 9 (models NP900X3F, NP900X4B, NP900X4C, NP900X4D, NP900X3C)

There may be more models out there affected by this issue. Mine is the NP530U3C (2012 Ivy Bridge i5 CPU, bought in November 2013). And there might be more problems solved by this than just the LID close detection and AC status detection. For instance, Dennis Jansen has numbers showing that it fixed performance issues with his laptop: https://bugzilla.kernel.org/show_bug.cgi?id=57271#c29 It’s no surprise really; the embedded controller handles many tasks and provides a lot of different events.

Feedback

Please, leave comments as desired here.

Cheers!
— Juan Manuel Cabo
