QTQAINFRA-1754 (Qt Quality Assurance Infrastructure)

VM host that has crashed should be automatically rebooted


Details

    • Type: Bug
    • Resolution: Won't Do
    • Priority: P2: Important
    • None
    • unversioned
    • None

    Description

      Definition of done: When the host goes down, it should reboot automatically without requiring any manual steps.

      Currently, a VM host crash is detected only through the following process:

      1. A VM host crashes
      2. After 5 hours, a developer sees "Timeout" in Coin
      3. The developer restages the change and is happy
      4. Developer 2 sees "Timeout" in Coin
      5. Developer 2 restages the change and is happy
      6. Developer 3 sees "Timeout" in Coin
      7. Developer 3 asks about the timeouts on IRC
      8. Someone from the CI team restarts the host manually

      What should happen instead:

      1. A VM host crashes
      2. Coin detects that the host has crashed and restarts the work items that were running on it (QTQAINFRA-1749)
      3. Automatic monitoring detects that the host is down and restarts it (QTQAINFRA-1754)
      4. The root cause of the problem is diagnosed/categorized and reported to CI operators (QTQAINFRA-1778)
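      The desired flow can be sketched as one handler that a monitoring service might call once a host has been declared down. This is a minimal illustration under stated assumptions, not Coin's actual code (the ticket was resolved as "Won't Do" and names no implementation): `Host`, `Coordinator`, `handle_host_down` and the in-memory lists are hypothetical stand-ins for Coin's work-item queue, the host restart mechanism and the operator report.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    running_items: list[str] = field(default_factory=list)  # hypothetical work-item IDs


@dataclass
class Coordinator:
    """Hypothetical stand-in for Coin plus an automatic host monitor."""
    queue: list[str] = field(default_factory=list)      # work items waiting to run
    restarted: list[str] = field(default_factory=list)  # hosts that were restarted
    reports: list[str] = field(default_factory=list)    # notes for CI operators

    def handle_host_down(self, host: Host, cause: str = "undiagnosed") -> None:
        # Step 2: requeue the host's in-flight work items right away instead
        # of letting each one run into a multi-hour "Timeout".
        self.queue.extend(host.running_items)
        host.running_items.clear()
        # Step 3: restart the host (a real system would call its hypervisor
        # or power-management interface here; this sketch only records it).
        self.restarted.append(host.name)
        # Step 4: record the cause, or that it is still undiagnosed, for CI operators.
        self.reports.append(f"{host.name}: {cause}")


# Usage: step 1 (the crash) is the trigger for the handler.
host = Host("vm-host-1", ["item-a", "item-b"])
coordinator = Coordinator()
coordinator.handle_host_down(host)
```

      Keeping the three reactions in one handler makes their ordering explicit: work is requeued before the restart, so no developer has to wait for a timeout or restage manually.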

      Definition of done for this ticket: automatic monitoring detects that the host is down and restarts it.
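      Deciding that a host "is down" needs some care so that a single dropped probe does not trigger a reboot. A minimal sketch, assuming a periodic health probe and a restart hook exist (both hypothetical; the ticket does not name a mechanism): `HostWatchdog` restarts the host only after `threshold` consecutive failed probes, and refuses to restart it again within `cooldown` seconds, so a host that keeps crashing is left for a CI operator instead of being rebooted in a loop. The threshold and cooldown values are illustrative assumptions.

```python
import time
from typing import Callable, Optional


class HostWatchdog:
    """Hypothetical watchdog for one VM host.

    The host is declared down after `threshold` consecutive failed probes.
    A restart is then triggered, but at most once per `cooldown` seconds.
    """

    def __init__(
        self,
        probe: Callable[[], bool],      # hypothetical: True if the host responds
        restart: Callable[[], None],    # hypothetical: e.g. an out-of-band power cycle
        threshold: int = 3,
        cooldown: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.probe = probe
        self.restart = restart
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.last_restart: Optional[float] = None

    def tick(self) -> bool:
        """Run one probe; return True if this call triggered a restart."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.threshold:
            return False
        now = self.clock()
        if self.last_restart is not None and now - self.last_restart < self.cooldown:
            # Down again too soon after the last restart: leave it for an operator.
            return False
        self.restart()
        self.last_restart = now
        self.failures = 0
        return True


# Usage with a scripted probe and a fixed fake clock (no real waiting).
probe_results = iter([True, False, False, False, False, False, False])
restarts = []
dog = HostWatchdog(
    probe=lambda: next(probe_results),
    restart=lambda: restarts.append("restart"),
    clock=lambda: 0.0,
)
ticks = [dog.tick() for _ in range(7)]
```

      Injecting `clock` keeps the logic testable: in the usage above the fourth probe triggers one restart, and the second run of three failures is ignored because it falls inside the cooldown window.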

    People

      tosaraja Tony Sarajärvi
      sanurmen Sami Nurmenniemi
      Votes: 0
      Watchers: 3

    Gerrit Reviews

      There are no open Gerrit changes