Bug Hunting with Linux (The story of the Samsung Ultrabooks lid close / AC status bug)

If you have a Samsung laptop series 5 or series 9 from 2011 and 2012, you’ll find the following interesting.

ultrabook

Share

If you have one of the following symptoms:

  • Lost the ability to get the laptop to sleep by closing the lid (both in windows and linux).
  • Linux doesn’t realize that you plugged or unplugged the power supply.
  • Windows takes many seconds to realize that you plugged or unplugged the power supply, instead of having immediate feedback.
  • Keyboard backlight doesn’t turn on at night (ambient light sensor events issue).

then it means that you have a “stuck” embedded controller.

For the anxious, I’ll let you know two more important facts, before getting into the story, the embedded controller talk, and the workaround fix:

  1. These issues are caused by an “unlucky” suspend/resume. If you never suspend your laptop, you’ll never get a stuck embedded controller.
  2. You can fix the symptoms temporarily by pushing the reset button through the small hole in the back of the laptop (the other side of the touchpad). The computer must be turned off and unplugged. After hitting the button, you’ll need to plug it to the wall to start for the first time. This is temporary and the issues will come back with a bad sleep / resume.

There is a permanent fix, which will not require you to hit the reset button. More on that later.

My end of the story:

I bought my Samsung ultrabook Series 5 in november 2013. I started noticing these issues during the first week. On that saturday I decided I had enough and spent the day trying to find a way to recover the lost functionality, and be able to go to suspend by closing the lid again. I figured that since the screen went off when closing the lid, it meant that the lid sensor was working fine, and that it must have been a BIOS or motherboard problem, since neither linux nor windows were capable of detecting the lid, wheras they both previously did.
I toggled all BIOS screen options and all Samsung Easy Settings options. None of this brought functionality back.
So I set out to reflash the BIOS, hoping that some hidden internal option causing this would get reset to default. But I quickly found out that Samsung bios update tool doesn’t let you flash the BIOS if it’s already the latest version. Long story short, that same saturday of november I found a way to do that and shared it here:
http://forum.notebookreview.com/samsung/683727-where-lid-closure-sensor-series-5-ultra-np530u3c.html#post9455595
After flashing the same BIOS version over, the lid detection and power supply unit (PSU) detection started working fine again.

A few weeks later the syndrome came back. I reflashed the BIOS again and it got fixed once again. This time I realized that it came back right after resuming the laptop from sleep, so I refrained from suspending it in the future. Using the laptop bacame no-fun. It lost its cool.

Two months later, now February 2014, I became careless and started suspending the laptop again.

The issue came back.

Closing the lid was ignored, and the charging icon persisted even after unplugging the laptop from the wall.

I searched online once again for this issue, and this time I found a bug report in ubuntu’s launchpad website. In that thread I found a better way to fix the issue that didn’t involve reflashing a bios, but just involved hitting the reset button in the back, which I never noticed before:

     https://bugs.launchpad.net/ubuntu/+source/linux/+bug/971061

Reading that forum, it quickly became obvious that they talked about the same problem, even though they mentioned different models. It confirmed to me that the issue was suspend related, and some posters suggested that the issue came back for sure when leaving the laptop suspended for a long time, though there was no precision as to why some suspensions didn’t trigger the bug and others did.

At first I just shared my own temporary fix (reflashing the bios), since I read that some folks thought that samsung fixed the issue with a new bios… though it is a temporary fix, not due to the new BIOS, but due to the act of flashing itself (which is actually because the embedded controller gets reset after a bios flash). And while sharing my own fix in post #97, I theorized that the “problem state” might be triggered by a poor implementation of Intel Rapid Start. Later I found I was wrong with Intel Rapid Start. It made some sense at the time to me, given that the posters said that a long time suspended triggered the issue and that Intel Rapid Start also gets triggered during suspend (by time or by battery low). I began to leave the laptop suspended overnight

Vacation time came from work. After a few days of recovering from lost hours of sleep, and having gone around the house fixing things (fawcets, etc.), getting up to date with housekeeping,  I gave a look at my ultrabook and thought to myself: “you’re next”. hahahaha

First I tried to put my theory of Intel Rapid Start interfering from sleep to the test. I had rapid start disabled since the day I bought it. So I thought that that could be the cause, I thought that the BIOS expected rapid start to be enabled and that when it didn’t find support for it the bug was triggered (but I was wrong). I enabled intel rapid start (this required me to resize the SSD partition to make room for a special hibernation partition). To enable intel rapid start, I had to do it from Easy Settings in windows, (otherwise, the INT3392 acpi device wouldn’t show up).  Having enabled it, I could now use the intel_irst module in linux, which had a /sys interface (wakeup_time and wakeup_events) to enable/disable rapid start triggers by time or battery, and to configure the time.
I started a text file keep a log of my tests.
I started leaving the laptop suspended for whole nights, trying different values for wakeup_time and wakeup_events.
I realized that they made absolutely no difference. Intel rapid start was ruled out as a cause.

I made further tests (see post #100 of the launchpad bug thread). Too many to mention here.

At the same time I studied my laptop’s DSDT. (Attached here). In case you don’t know: DSDT code is an almost platform independant code (it’s not assembler or machine code) provided by the BIOS to be run by the OS. This code is interpreted by the OS. It’s like “a driver provided by the BIOS” to be run and interpreted by linux or windows, to serve as an interface or glue. It’s different for each laptop, and when you update your BIOS, you are updating the DSDT too. The DSDT code is read once by the OS, on each boot, from the BIOS.

The DSDT showed that event handlers for LID and Battery status change were of course inplace, (_Q51, _Q52 (AC), _Q53, _Q54 _Q66 (Battery), _Q5E, _Q5F(LID)).

Searching for everything I needed to know regarding the production of those events, I found these two useful links, which gave me a break:

      http://kernel.ubuntu.com/~cking/presentations/gpes-and-embedded-controller/EmbeddedControllerAndACPI.odp

      https://wiki.edubuntu.org/Kernel/Reference/ACPITricksAndTips

It goes like this: when battery changes one percent (up or down), or when you close the lid on your laptop, the Embedded Controller notices this, and produces an interrupt through the general purpose event 0x17 (other laptops might have different GPE for the EC). Code in the OS handles the interruption, and asks the EC for the exact event produced:

  • 0x51 and 0x52 for AC plugged or unplugged.
  • 0x53 and 0x54, and 0x66 for battery changed
  • 0x5E and 0x5F for LID closed or open

If the event has a status with 0x20 bit mask set, the OS runs the corresponding _Q## method of the DSDT to handle the event. In the case of the LID, the _Q5E() method could be called, and if the lid status was different than before, an ACPI event gets generated through a call to Notify(LID0, 0x80). Pretty much the same for AC (ADP1 device) and Battery (BAT1 device) status.

Now, because of my tests, (see post #100) I realized that there was a rythm to all this. Events were missing in the problem state, and there was an almost exact number of events which I pinpointed as 8 plugs/unplugs from AC when suspended (or some battery drop or increase during suspension), that put the laptop into the problem state again. The Embedded Controller was at the center of all this. I also found the quickest way to reproduce the issue and get into the “problem state”, to test potential fixes:

  1. Sleep the computer in linux (by closing the lid or any other means).
  2. Unplug from the wall, plug, unplug, plug, unplug, plug, unplug, plug (8 actions or more).
  3. Resume from sleep.

There is another way to get the laptop into the problem state that I discovered:

  1. echo disable > /sys/firmware/acpi/interrupts/gpe17
  2. plug, unplug, plug, unplug, plug, unplug, plug, unplug (8 actions)
  3. echo enable > /sys/firmware/acpi/interrupts/gpe17

Reading the presentation in this link provided me with the inspiration to try to query the embedded controller directly. Little did I know that that was all that was required to make it unstuck, and see events again being produced with “acpi_listen” in the console.

The Fix

I made this program to poll the events left in the embedded controller directly. I realized that it fixes the issue immediately (in the same way that hitting the reset button through the small hole in the back). You can run it at any time, but ideally run right after resume from suspend.

What we can deduce from this is that the Embedded Controller keeps trying to report events while the computer is suspended. Since there is no OS to reply to them, it accumulates them, and after a certain number of events (battery percentage drop or increase, AC plugged or unplugged, Lid open or closed, are all each one event), after a certain number of events, it stops producing GPE 0x17 and the OS doesn’t query them anymore… CLASSIC CHICKEN AND EGG situation!!! because the OS only queries the events when it receives a GPE interruption.

You can find it in post #102 of the launchpad issue. I’ll put it here too.

There is a better solution based on my research, that Lan Tianyu made as a patch to the kernel, waiting for further tests in the kernel bugzilla issue 44161, see comments #133 and #135 there, I confirmed that it does the same job perfectly as my workaround fix.  If that gets merged to the kernel someday, you won’t need my workaround anymore to enjoy your laptop. The kernel patch uses proper ec.c functions to query the Embedded Controller after resume from sleep, instead of direct port poking.

UPDATE (1) 2014-02-23: Kieran Clancy has made a better kernel patch and posted it on the usual kernel bugzilla, comment #149. I tested it and it does the job, perfectly as my userspace program workaround does, but from inside the kernel. It also unstucks the EC not only on resume but also on start.

UPDATE (2) 2014-02-23: I made a better workaround that doesn’t hardcode the EC port numbers as constants but instead reads them from /proc/ioports. It is a much safer workaround, and should work on any model. It also now sleeps for a millisecond between queries, which I found doesn’t query the same event more than once. It is now also GPL. It follows below (you can still find the original in the launchpad issue post #102).

UPDATE (3) 2014-03-01: The latest version of the fix in the form of a kernel patch by Kieran Clancy has been posted to linux-acpi and linux-kernel mailing lists: http://marc.info/?l=linux-acpi&m=139359680828880&w=2 You can still use my non-kernel workaround if you don’t want to recompile a new kernel. But this is great news, it means that the fix is moving towards being included by default some day in the kernel.

My workaround fix (Updated with version2, doesn’t hardcode EC ports anymore):

---- samsung_fix_ec_events.c ------------------------------------------
/*
* samsung_fix_ec_events
*
* Copyright © 2014 Juan Manuel Cabo <juanmanuel.cabo@gmail.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <unistd.h>
#include <sys/io.h>
#include <string.h>
#include <stdlib.h>

/* INSTRUCTIONS:
 *
 * Compile with:
 *    gcc -o samsung_fix_ec_events samsung_fix_ec_events.c
 * Run as:
 *    sudo ./samsung_fix_ec_events
 * You may copy it to /usr/local/bin/samsung_fix_ec_events and then call it 
 * automatically after resume from sleep with a /usr/lib/pm-utils/sleep.d script:
 *    sudo cp ./samsung_fix_ec_events /usr/local/bin/
 */

/* Constants: */
enum ec_command {
    /* Standard command for querying LID, Battery and AC events: */
    ACPI_EC_COMMAND_QUERY = 0x84,
    LINE_BUFFER_SIZE = 1024
};

int get_ec_ports(int* ec_cmd_port, int* ec_data_port);

int main (int argc, char** argv) {
    char status = 0;
    char data = 0;
    int count = 0;

    /* Get EC data and command ports numbers from /proc/ioports */
    int ec_cmd_port = 0;
    int ec_data_port = 0;
    if (!get_ec_ports(&ec_cmd_port, &ec_data_port) || !ec_cmd_port || !ec_data_port) {
        fprintf(stderr, "Error: couldn't determine EC ports by looking in '/proc/ioports'.\n");
        return 1;
    }
    printf("Using EC ports 0x%x and 0x%x\n", ec_cmd_port, ec_data_port);

    /* Ask for permition to use the EC ports for the duration of this execution only: */
    if (iopl(3)) {
        fprintf(stderr, "Error: Permission to read/write to EmbeddedController port not granted.\n");
        return 1;
    }

    /* Query AC, Battery, LID, etc. events until there are no more. 
     * This clears them for the EC so that it can send them again in the future, thus unblocking the EC. */
    do {
        outb(ACPI_EC_COMMAND_QUERY, ec_cmd_port);
        status = inb(ec_cmd_port);
        data = inb(ec_data_port);
        printf("CommandQuery 0x84, status=0x%x, data=0x%x\n", status, data);
        usleep(1000); /* Give the EC 1ms time between queries */
    } while (data != 0 && ++count < 1000);

    printf("EmbeddedController GPE events flushed. New events can be produced now.\n");

    return 0;
}

/* Reads EC port addresses from /proc/ioports. Returns 1 if successful */
int get_ec_ports(int* ec_cmd_port, int* ec_data_port) {
    if (!ec_cmd_port || !ec_data_port) {
        return 0;
    }
    *ec_cmd_port = 0;
    *ec_data_port = 0;

    /* Allocate line buffer on the heap for parsing */
    char* line_buffer = (char*) malloc(LINE_BUFFER_SIZE);
    if (!line_buffer) {
        return 0;
    }

    /* Open ioports file */
    FILE* ioports_file = fopen("/proc/ioports", "r");
    if (!ioports_file) {
        return 0;
    }

    /* Find ports 'EC data' and 'EC cmd' */
    char* line = NULL;
    while (line = fgets(line_buffer, LINE_BUFFER_SIZE, ioports_file)) {
        if (strstr(line, "EC cmd")) {
            if (sscanf(line, "%x", ec_cmd_port) != 1) {
                return 0;
            }
        } else if (strstr(line, "EC data")) {
            if (sscanf(line, "%x", ec_data_port) != 1) {
                return 0;
            }
        }
    }
    fclose(ioports_file);
    free(line_buffer);
    if (ec_cmd_port == 0 || ec_data_port == 0) {
        return 0;
    }
    /* Found both ports successfully */
    return 1;
}
----END samsung_fix_ec_events.c ------------------------------------------

----- 99samsung_fix_ec_events ----------------------------------------------
#!/bin/sh
# NOTE: Put this file in: /usr/lib/pm-utils/sleep.d/99samsung_fix_ec_events
# and do: chmod +x 99samsung_fix_ec_events
#
# On some samsung laptops (series 5 2012, series 9 2011, etc) , if many EC
# GPE events are produced during sleep (AC plugged/unplugged, battery % change
# change, LID open/close, etc), and they are not queried, the
# EmbeddedController stops sending them and this creates a chicken and egg
# situation, that can only be resolved either by hitting the reset button in
# the back while powered off, or by simply forcing a query for the events here after resume.

case "$1" in
    hibernate|suspend)
        ;;
    thaw|resume)
        #NOTE: edit this path if necessary to point to the program with the fix:
        /usr/local/bin/samsung_fix_ec_events
        ;;
    *) exit $NA
        ;;
esac

exit 0

----- END 99samsung_fix_ec_events -------------------------------------------

Affected Laptops

I want to thank the community so much for all the good feedback that I received. This bug has been causing frustration to many people since at least two years. I’m just happy that I found a diagnosis of it, and a solution, and that I could share it with all of you guys. In particular I want to thank Kieran Clancy for bringing my workaround fix to the attention of the linux kernel people, where a kernel patch has now been made and I guess is now waiting for further tests.

From the feedback I gathered so far, my fix works in the following models:

  • Samsung Series 5 (models NP530U3C, NP535U3C, NP530U3B, NP550P5C)
  • Samsung Series 9 (models NP900X3F, NP900X4B, NP900X4C, NP900X4D, NP900X3C)

there maybe more models out there affected by this issue. Mine is the NP530U3C (2012 Ivy Bridge i5 cpu, bought in november 2013). And there might be more problems solved by this than just the LID close detection and AC status detection. For instance, Dennis Jansen has numbers showing that it fixed performance issues with his laptop: https://bugzilla.kernel.org/show_bug.cgi?id=57271#c29 Its no surprise really, the embedded controller handles many tasks and provides a lot of different events.

Feedback

Please, leave comments as desired here.

Cheers!
— Juan Manuel Cabo

Share
PayPal:

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>