========================
SoundWire Error Handling
========================

The SoundWire PHY was designed with care, so errors on the bus should be
very unlikely, and any that do occur should be limited to single-bit
errors. Examples of this design can be found in the synchronization
mechanism (sync loss after two errors) and in the short CRCs used for the
Bulk Register Access.

The errors can be detected with multiple mechanisms:

1. Bus clash or parity errors: This mechanism relies on low-level detectors
   that are independent of the payload and usages, and they cover both
   control and audio data. The current implementation only logs such
   errors. Possible improvements include invalidating an entire programming
   sequence and restarting from a known position. When such errors occur
   outside of a control/command sequence, the SoundWire protocol provides
   no concealment or recovery for audio data. The location of the error
   also affects its audibility (errors in the most-significant bits have a
   larger impact in PCM), and the bus might be reset after a number of such
   errors are detected. Note that bus clashes due to programming errors
   (two streams using the same bit slots) cannot be distinguished from
   those due to electrical issues during the transmit/receive transition,
   although a recurring bus clash when audio is enabled is an indication of
   a bus allocation issue. The interrupt mechanism can also help identify
   Slaves which detected a Bus Clash or a Parity Error, but they may not be
   responsible for the errors, so resetting them individually is not a
   viable recovery strategy.

2. Command status: Each command is associated with a status, which only
   covers transmission of the data between devices. The ACK status
   indicates that the command was received and will be executed by the end
   of the current frame. A NAK indicates that the command was in error and
   will not be applied. In case of bad programming (command sent to a
   non-existent Slave or to a non-implemented register) or an electrical
   issue, the absence of a response signals that the command was ignored.
   Some Master implementations allow a command to be retransmitted several
   times. If the retransmission fails, backtracking and restarting the
   entire programming sequence might be a solution. Alternatively, some
   implementations might directly issue a bus reset and re-enumerate all
   devices.

3. Timeouts: In a number of cases such as ChannelPrepare or
   ClockStopPrepare, the bus driver is supposed to poll a register field
   until it transitions to a NotFinished value of zero. The MIPI SoundWire
   spec 1.1 does not define timeouts, but the MIPI SoundWire DisCo document
   adds recommendations on timeouts. If such configurations do not
   complete, the driver will return -ETIMEDOUT. Such timeouts are symptoms
   of a faulty Slave device and are likely impossible to recover from.
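
The command status and timeout handling described in items 2 and 3 can be
sketched in plain user-space C. This is only an illustration, not the bus
driver's code: the ``cmd_status`` values, the ``bus_ops`` hooks, the scripted
``fake_dev`` and the errno mapping for NAK and ignored commands are all
assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical response codes for the ACK, NAK and no-response cases. */
enum cmd_status { CMD_OK, CMD_FAILED, CMD_IGNORED };

/* Hypothetical bus hooks; a real Master driver would talk to hardware. */
struct bus_ops {
    enum cmd_status (*xfer)(void *ctx);  /* send one command */
    int (*read_reg)(void *ctx);          /* register value or -errno */
};

/* Send a command, retransmitting up to max_retries times after a failure. */
int xfer_with_retry(const struct bus_ops *ops, void *ctx,
                    unsigned int max_retries)
{
    enum cmd_status st;
    unsigned int attempt;

    assert(ops && ops->xfer);
    for (attempt = 0; ; attempt++) {
        st = ops->xfer(ctx);
        if (st == CMD_OK)
            return 0;
        if (attempt == max_retries)
            break;
    }
    /* Assumed errno mapping: ignored -> -ENODATA, NAK -> -EIO. */
    return st == CMD_IGNORED ? -ENODATA : -EIO;
}

/*
 * Poll until the bits in not_finished_mask read as zero. max_polls bounds
 * the number of reads; a real driver would sleep between reads and derive
 * the bound from a timeout value.
 */
int poll_not_finished(const struct bus_ops *ops, void *ctx,
                      unsigned int not_finished_mask, unsigned int max_polls)
{
    unsigned int i;

    assert(ops && ops->read_reg);
    for (i = 0; i < max_polls; i++) {
        int val = ops->read_reg(ctx);

        if (val < 0)
            return val;  /* the read itself failed */
        if (!((unsigned int)val & not_finished_mask))
            return 0;    /* NotFinished cleared */
    }
    return -ETIMEDOUT;
}

/* Scripted fake device, only for exercising the two helpers above. */
struct fake_dev {
    const enum cmd_status *responses;  /* one entry per xfer call */
    size_t pos;
    int polls_until_done;              /* reads before NotFinished clears */
};

enum cmd_status fake_xfer(void *ctx)
{
    struct fake_dev *d = ctx;

    return d->responses[d->pos++];
}

int fake_read_reg(void *ctx)
{
    struct fake_dev *d = ctx;

    return d->polls_until_done-- > 0 ? 0x1 : 0x0;
}
```

When ``xfer_with_retry()`` returns an error, the caller still has to choose
between restarting the programming sequence and resetting the bus; the sketch
deliberately leaves that policy out.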

Errors during global reconfiguration sequences are extremely difficult to
handle:

1. BankSwitch: An error during the last command issuing a BankSwitch is
   difficult to backtrack from. Retransmitting the BankSwitch command may
   be possible in a single-segment setup, but this can lead to
   synchronization problems when multiple bus segments are enabled (a
   command with side effects such as frame reconfiguration would be handled
   at different times). A global hard-reset might be the best solution.
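
The trade-off above can be written down as a small decision helper. This is
only a sketch of one possible policy: the enum, the function name and the
``retries_left`` budget are hypothetical and do not come from the bus code.

```c
#include <assert.h>

/* Hypothetical recovery actions after a failed BankSwitch. */
enum bank_switch_recovery {
    RECOVERY_RETRANSMIT,  /* resend the BankSwitch command */
    RECOVERY_HARD_RESET,  /* global hard-reset */
};

/*
 * Retransmit only on a single-segment bus with retries remaining; with
 * several segments a retransmission could be handled at different times,
 * so fall back to a global hard-reset.
 */
enum bank_switch_recovery
pick_bank_switch_recovery(unsigned int num_segments, unsigned int retries_left)
{
    assert(num_segments >= 1);
    if (num_segments == 1 && retries_left > 0)
        return RECOVERY_RETRANSMIT;
    return RECOVERY_HARD_RESET;
}
```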

Note that SoundWire does not provide a mechanism to detect illegal values
written in valid registers. In a number of cases the standard even mentions
that the Slave might behave in implementation-defined ways. The bus
implementation does not provide a recovery mechanism for such errors; Slave
or Master driver implementers are responsible for writing valid values in
valid registers and for implementing additional range checking if needed.
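
As an illustration of such range checking, a driver could validate a value
against a per-register table before issuing the write. The sketch assumes a
hypothetical ``reg_range`` table; the addresses, limits and errno choices are
made up for the example and do not describe any real Slave device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Legal value range for one register (hypothetical layout). */
struct reg_range {
    uint32_t addr;
    uint8_t min;
    uint8_t max;
};

/*
 * Return 0 if val is legal for addr, -EINVAL if it is out of range, or
 * -ENXIO if addr is not in the table (assumed errno choices).
 */
int check_reg_write(const struct reg_range *tbl, size_t n,
                    uint32_t addr, uint8_t val)
{
    size_t i;

    assert(tbl || n == 0);
    for (i = 0; i < n; i++) {
        if (tbl[i].addr != addr)
            continue;
        return (val >= tbl[i].min && val <= tbl[i].max) ? 0 : -EINVAL;
    }
    return -ENXIO;
}

/* Made-up table for an imaginary Slave device. */
const struct reg_range example_ranges[] = {
    { 0x2000, 0, 3 },   /* e.g. a 2-bit mode field */
    { 0x2001, 1, 16 },  /* e.g. a channel count */
};
```

Calling ``check_reg_write()`` before each write rejects an illegal value on
the host side, before the Slave can react to it in an implementation-defined
way.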