.. index:: deploy to recovery

.. _deploy_to_recovery:

to: recovery
************

Deployment to ``recovery`` allows the use of device dictionary commands
and an LXC test shell to automate recovery mode operations on some
DUTs.

Successful use of recovery deployments requires support from both the
admins and the test writers.

.. note:: In recovery mode, the device may have different identifiers
   and might no longer be unique. This can result in requiring a new
   device-type template and only creating one device of this type on
   any one worker. Not all devices can support automated recovery
   mode.

   Additionally, recovery deployments are **blind**: ``udev`` support
   adds the device to the LXC but there is no serial connection, so no
   output is read from the DUT. All tools and libraries required to
   execute the recovery test shell need to be added to the LXC, for
   example by using an earlier test shell inside the LXC.

#. Download the scripts and binaries to transfer to the device.
#. Copy the downloaded artifacts into the LXC.
#. Ensure that power to the device is OFF.
#. Execute the ``recovery_mode_command`` to use relays or similar to
   put the device into recovery mode, in a dedicated :term:`namespace`.

   .. code-block:: jinja

    {% set recovery_mode_command = [
    '/home/neil/lava-lab/shared/lab-scripts/eth008_control -a 10.15.0.171 -r 1 -s off',
    '/home/neil/lava-lab/shared/lab-scripts/eth008_control -a 10.15.0.171 -r 2 -s on'] %}

#. Apply power.

   .. code-block:: yaml

    - boot:
        namespace: recovery
        timeout:
          minutes: 5
        method: recovery
        commands: recovery

The test job then defines a test action which executes the scripts
using the downloaded files and completes recovery. The script may have
to wait for the device to appear. Because the device may then have an
unpredictable device node name, an action to create a symlink with a
known name is likely to be required. The use of LXC ensures that only
one suitable device exists, as long as the device configuration and
recovery mode operations only require a single device matching the
check in the recovery script.
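
The wait-and-symlink step described above could be sketched as a small
shell function. This is a minimal sketch and not part of LAVA: the
function name ``wait_and_link``, its arguments and the default timeout
of 60 seconds are assumptions made for illustration.

.. code-block:: shell

    # Hypothetical helper, not provided by LAVA.
    # Usage: wait_and_link DIR PATTERN LINK [TIMEOUT_SECONDS]
    wait_and_link() {
        dir="$1"; pattern="$2"; link="$3"; timeout="${4:-60}"
        elapsed=0
        while [ "$elapsed" -lt "$timeout" ]; do
            # Take the first match only; the LXC is expected to contain
            # a single suitable device. Add "-type c" after the name
            # test to accept character devices only.
            node="$(find "$dir" -xdev -name "$pattern" 2>/dev/null | head -n 1)"
            if [ -n "$node" ]; then
                # Create (or replace) the symlink with the known name.
                ln -sf "$node" "$link"
                return 0
            fi
            sleep 1
            elapsed=$((elapsed + 1))
        done
        echo "timed out waiting for $pattern in $dir" >&2
        return 1
    }

For example, ``wait_and_link /dev 'ttyUSB*' /dev/recovery 120`` polls
once per second for up to two minutes and returns non-zero if no
matching node appears, so a wrapper can fail the job.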

Example: for the HiKey 6220, the `recovery mode operations
<https://github.com/96boards/documentation/wiki/HiKeyUEFI#flash-binaries-to-emmc->`_
could be executed as steps in the test shell as follows:

.. code-block:: yaml

    run:
      steps:
      - find /dev/ -xdev -name 'ttyUSB*' -type c -exec ln -s {} /dev/recovery ';' -quit
      - python /lava-lxc/hisi-idt.py --img1=/lava-lxc/l-loader.bin -d /dev/recovery
      # fastboot should wait for the device to reset here
      # udev rule copes with adding it to the LXC once it appears
      - fastboot flash ptable /lava-lxc/ptable-linux.img
      - fastboot flash fastboot /lava-lxc/fip.bin
      - fastboot flash nvme /lava-lxc/nvme.img
      # next boot action takes care of exiting from recovery mode

.. important:: Make these commands **portable** so that the same script
   can be used to deploy new firmware to the device outside of LAVA.
   When using a test shell to handle firmware deployments, make sure
   that a failure of any test shell command fails the job by using
   ``lava-test-raise``.

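   .. code-block:: shell

        # $1: test case name (no spaces), $2: command line to execute
        command(){
            echo "$2"
            if [ -n "$(which lava-test-case || true)" ]; then
                # inside LAVA: record a pass, or fail the job on error
                $2 && lava-test-case "$1" --result pass || lava-test-raise "$1"
            else
                # outside LAVA: just run the command
                $2
            fi
        }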

   Then call the function with two arguments, the test case name (with
   no spaces) and the command to execute (with substitutions for the
   parameterized variables for the files which were downloaded by the
   test job):

   .. code-block:: shell

    command 'hisi-idt-l-loader' "python ${SCRIPT} --img1=${LOADER} -d /dev/recovery"

   Take note of the quoting in this shell example. The first parameter
   can use single quotes but the second parameter **must** use double
   quotes ``"`` so that the values of ``$SCRIPT`` and ``$LOADER`` are
   substituted. Portable scripts are free to use whatever language you
   prefer.
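
   The effect of the two quoting styles can be checked in any POSIX
   shell. In this small sketch, the value assigned to ``SCRIPT`` is
   only an illustration:

   .. code-block:: shell

       SCRIPT=/lava-lxc/hisi-idt.py
       # Single quotes keep the text literal; nothing is substituted.
       single='python ${SCRIPT}'
       # Double quotes substitute the current value of SCRIPT.
       double="python ${SCRIPT}"
       echo "$single"
       echo "$double"

   The first ``echo`` shows the unexpanded ``${SCRIPT}`` text, the
   second shows the path, which is the form the wrapper function needs.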

.. seealso:: :ref:`test_definition_portability`

Examples for the HiKey 6220:

* https://git.linaro.org/lava-team/refactoring.git/plain/testdefs/hikey-6220-recovery.yaml
* https://git.linaro.org/lava-team/refactoring.git/tree/scripts/hikey-6220-recovery.sh

When the test shell exits, the device is reset using a second boot ``recovery``
operation.

.. code-block:: yaml

 - boot:
    namespace: recovery
    timeout:
      minutes: 5
    method: recovery
    commands: exit

A ``recovery_exit_command`` must be specified in the device dictionary.

.. code-block:: jinja

 {% set recovery_exit_command = [
 '/home/neil/lava-lab/shared/lab-scripts/eth008_control -a 10.15.0.171 -r 1 -s on',
 '/home/neil/lava-lab/shared/lab-scripts/eth008_control -a 10.15.0.171 -r 2 -s off'] %}

Test jobs can terminate early (either through bugs or cancellation), so
it is important to include the ``recovery_exit_command`` steps in the
``power_off_command`` as well. This leaves the device in a suitable
state for the next test job in the queue.

.. code-block:: jinja

 {% set power_off_command = ['/usr/bin/pduclient --daemon calvin --hostname pdu --command off --port 04',
 'sleep 30',
 '/home/neil/lava-lab/shared/lab-scripts/eth008_control -a 10.15.0.171 -r 1 -s on',
 '/home/neil/lava-lab/shared/lab-scripts/eth008_control -a 10.15.0.171 -r 2 -s off'] %}

The additional commands may take some time to complete, so the timeout
of the ``power_off`` action may also need extending in the device-type
template.

.. code-block:: jinja

 {% set action_timeout_power_off = 60 %}
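
The same relay steps appear in both ``recovery_exit_command`` and
``power_off_command``. One way to keep the two in step is to wrap them
in a single script on the worker and call that script from both
settings. The following is a minimal sketch under that assumption: the
function name ``recovery_exit`` and its two arguments are hypothetical,
and the ``-a``, ``-r`` and ``-s`` options are simply copied from the
``eth008_control`` examples above.

.. code-block:: shell

    # Hypothetical wrapper, not provided by LAVA.
    # Usage: recovery_exit CONTROL ADDRESS
    recovery_exit() {
        control="$1"   # relay control script, e.g. eth008_control
        address="$2"   # address of the relay board
        # Same order as recovery_exit_command: relay 1 on, then
        # relay 2 off. Stop at the first failing step.
        "$control" -a "$address" -r 1 -s on || return 1
        "$control" -a "$address" -r 2 -s off || return 1
    }

Returning non-zero on the first failure lets the calling command list
report that the device may still be in recovery mode.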