MDS - Microarchitectural Data Sampling
======================================

Microarchitectural Data Sampling is a hardware vulnerability which allows
unprivileged speculative access to data which is available in various CPU
internal buffers.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non-Intel vendors

   - Older processor models, where the CPU family is < 6

   - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus)

   - Intel processors which have the ARCH_CAP_MDS_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR.

Whether a processor is affected or not can be read out from the MDS
vulnerability file in sysfs. See :ref:`mds_sys_info`.

Not all processors are affected by all variants of MDS, but the mitigation
is identical for all of them so the kernel treats them as a single
vulnerability.
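
The exclusion rules above can be sketched as a small Python check. This is
only an illustration, not the kernel's detection logic: the function names
are hypothetical, the MSR address and bit position are taken from the
kernel's ``msr-index.h``, and the model-specific Atom exclusions are left
out. A ``False`` result means "possibly affected", so the sysfs file remains
the authoritative source.

```python
# MSR address and bit position as defined in the Linux kernel's
# arch/x86/include/asm/msr-index.h.
MSR_IA32_ARCH_CAPABILITIES = 0x10A
ARCH_CAP_MDS_NO = 1 << 5


def mds_no_set(arch_capabilities: int) -> bool:
    """Return True if the MDS_NO bit is set in a raw MSR value."""
    return bool(arch_capabilities & ARCH_CAP_MDS_NO)


def clearly_not_affected(vendor: str, family: int,
                         arch_capabilities: int = 0) -> bool:
    """Hypothetical helper covering three of the four exclusion rules.

    The Atom model list (Bonnell, Saltwell, Goldmont, GoldmontPlus) is
    deliberately not modelled here.
    """
    if vendor != "GenuineIntel":   # AMD, Centaur and other vendors
        return True
    if family < 6:                 # older processor models
        return True
    return mds_no_set(arch_capabilities)
```

The raw MSR value would have to be obtained separately, for example through
the ``msr`` driver as root; that step is not shown.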

Related CVEs
------------

The following CVE entries are related to the MDS vulnerability:

   ==============  =====  ===================================================
   CVE-2018-12126  MSBDS  Microarchitectural Store Buffer Data Sampling
   CVE-2018-12130  MFBDS  Microarchitectural Fill Buffer Data Sampling
   CVE-2018-12127  MLPDS  Microarchitectural Load Port Data Sampling
   CVE-2019-11091  MDSUM  Microarchitectural Data Sampling Uncacheable Memory
   ==============  =====  ===================================================

Problem
-------

When performing store, load, or L1 refill operations, processors write data
into temporary microarchitectural structures (buffers). The data in the
buffer can be forwarded to load operations as an optimization.

Under certain conditions, usually a fault/assist caused by a load
operation, data unrelated to the load memory address can be speculatively
forwarded from the buffers. Because the load operation causes a fault or
assist and its result will be discarded, the forwarded data will not cause
incorrect program execution or state changes. However, a malicious operation
may be able to forward this speculative data to a disclosure gadget, which
in turn allows the value to be inferred via a cache side channel attack.

Because the buffers are potentially shared between Hyper-Threads, cross
Hyper-Thread attacks are possible.

Deeper technical information is available in the MDS-specific x86
architecture section: :ref:`Documentation/x86/mds.rst <mds>`.


Attack scenarios
----------------

Attacks against the MDS vulnerabilities can be mounted from malicious
non-privileged user space applications running on hosts or guests. Malicious
guest OSes can obviously mount attacks as well.

Contrary to other speculation-based vulnerabilities, the MDS vulnerability
does not allow the attacker to control the memory target address. As a
consequence the attacks are purely sampling based, but as demonstrated with
the TLBleed attack, samples can be postprocessed successfully.

Web-Browsers
^^^^^^^^^^^^

  It's unclear whether attacks through Web-Browsers are possible at
  all. Exploitation through JavaScript is considered very unlikely,
  but other widely used web technologies like WebAssembly could possibly be
  abused.


.. _mds_sys_info:

MDS system information
----------------------

The Linux kernel provides a sysfs interface to enumerate the current MDS
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/mds

The possible values in this file are:

  .. list-table::

     * - 'Not affected'
       - The processor is not vulnerable
     * - 'Vulnerable'
       - The processor is vulnerable, but no mitigation is enabled
     * - 'Vulnerable: Clear CPU buffers attempted, no microcode'
       - The processor is vulnerable but microcode is not updated.

         The mitigation is enabled on a best effort basis. See :ref:`vmwerv`
     * - 'Mitigation: Clear CPU buffers'
       - The processor is vulnerable and the CPU buffer clearing mitigation is
         enabled.

If the processor is vulnerable then the following information is appended
to the above information:

  ========================  ============================================
  'SMT vulnerable'          SMT is enabled
  'SMT mitigated'           SMT is enabled and mitigated
  'SMT disabled'            SMT is disabled
  'SMT Host state unknown'  Kernel runs in a VM, Host SMT state unknown
  ========================  ============================================
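
A minimal Python sketch of reading this file and splitting it into the
mitigation state and the optional SMT suffix is shown here. The helper names
are hypothetical, and the sketch assumes the SMT information is appended
after a ``"; "`` separator, which the text above does not spell out.

```python
from pathlib import Path

MDS_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/mds")


def parse_mds_status(text):
    """Split the file content into (state, smt_info).

    smt_info is None when no SMT information is appended, for example
    for 'Not affected'. The '; ' separator is an assumption.
    """
    state, sep, smt = text.strip().partition("; ")
    return state, (smt if sep else None)


def read_mds_status(path=MDS_SYSFS):
    """Read and parse the MDS vulnerability file (Linux only)."""
    return parse_mds_status(path.read_text())
```

On a system with the mitigation active, ``read_mds_status()`` would be
expected to return a state starting with ``Mitigation:`` together with one of
the SMT strings from the table above.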

.. _vmwerv:

Best effort mitigation mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^

  If the processor is vulnerable, but the availability of the microcode-based
  mitigation mechanism is not advertised via CPUID, the kernel selects a best
  effort mitigation mode. This mode invokes the mitigation instructions
  without a guarantee that they clear the CPU buffers.

  This is done to address virtualization scenarios where the host has the
  microcode update applied, but the hypervisor is not yet updated to expose
  the CPUID to the guest. If the host has updated microcode the protection
  takes effect; otherwise a few CPU cycles are wasted pointlessly.

  The state in the mds sysfs file reflects this situation accordingly.


Mitigation mechanism
--------------------

The kernel detects the affected CPUs and the presence of the microcode
which is required.

If a CPU is affected and the microcode is available, then the kernel
enables the mitigation by default. The mitigation can be controlled at boot
time via a kernel command line option. See
:ref:`mds_mitigation_control_command_line`.

.. _cpu_buffer_clear:

CPU buffer clearing
^^^^^^^^^^^^^^^^^^^

  The mitigation for MDS clears the affected CPU buffers on return to user
  space and when entering a guest.

  If SMT is enabled, it also clears the buffers on idle entry when the CPU
  is only affected by MSBDS and not any other MDS variant, because the
  other variants cannot be protected against cross Hyper-Thread attacks.

  For CPUs which are only affected by MSBDS the user space, guest and idle
  transition mitigations are sufficient and SMT is not affected.

.. _virt_mechanism:

Virtualization mitigation
^^^^^^^^^^^^^^^^^^^^^^^^^

  The protection for host to guest transition depends on the L1TF
  vulnerability of the CPU:

  - CPU is affected by L1TF:

    If the L1D flush mitigation is enabled and up to date microcode is
    available, the L1D flush mitigation is automatically protecting the
    guest transition.

    If the L1D flush mitigation is disabled then the MDS mitigation is
    invoked explicitly when the host MDS mitigation is enabled.

    For details on L1TF and virtualization see:
    :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <mitigation_control_kvm>`.

  - CPU is not affected by L1TF:

    CPU buffers are flushed before entering the guest when the host MDS
    mitigation is enabled.

  The resulting MDS protection matrix for the host to guest transition:

  ============ ===== ============= ============ =================
  L1TF         MDS   VMX-L1FLUSH   Host MDS     MDS-State
  ============ ===== ============= ============ =================
  Don't care   No    Don't care    N/A          Not affected
  Yes          Yes   Disabled      Off          Vulnerable
  Yes          Yes   Disabled      Full         Mitigated
  Yes          Yes   Enabled       Don't care   Mitigated
  No           Yes   N/A           Off          Vulnerable
  No           Yes   N/A           Full         Mitigated
  ============ ===== ============= ============ =================

  This only covers the host to guest transition, i.e. prevents leakage from
  host to guest, but does not protect the guest internally. Guests need to
  have their own protections.
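
The protection matrix can also be read as a small decision function. The
Python sketch below is only a restatement of the table: the function and
parameter names are hypothetical, and "VMX-L1FLUSH enabled" is assumed to
imply that up to date microcode is available.

```python
def guest_entry_mds_state(mds_affected, l1tf_affected,
                          vmx_l1d_flush_enabled, host_mds_full):
    """Return the MDS-State column for the host to guest transition."""
    if not mds_affected:
        return "Not affected"
    # On L1TF affected CPUs an enabled L1D flush already protects the
    # guest transition, regardless of the host MDS setting.
    if l1tf_affected and vmx_l1d_flush_enabled:
        return "Mitigated"
    # Otherwise the host MDS mitigation decides.
    return "Mitigated" if host_mds_full else "Vulnerable"
```

For example, an L1TF affected CPU with the L1D flush disabled and ``mds=off``
on the host maps to ``Vulnerable``, matching the second row of the table.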

.. _xeon_phi:

XEON PHI specific considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  The XEON PHI processor family is affected by MSBDS which can be exploited
  cross Hyper-Threads when entering idle states. Some XEON PHI variants allow
  MWAIT to be used in user space (Ring 3), which opens a potential attack
  vector for malicious user space. The exposure can be disabled on the kernel
  command line with the 'ring3mwait=disable' command line option.

  XEON PHI is not affected by the other MDS variants and MSBDS is mitigated
  before the CPU enters an idle state. As XEON PHI is not affected by L1TF
  either, disabling SMT is not required for full protection.

.. _mds_smt_control:

SMT control
^^^^^^^^^^^

  All MDS variants except MSBDS can be attacked cross Hyper-Threads. That
  means on CPUs which are affected by MFBDS or MLPDS it is necessary to
  disable SMT for full protection. These are most of the affected CPUs; the
  exception is XEON PHI, see :ref:`xeon_phi`.

  Disabling SMT can have a significant performance impact, but the impact
  depends on the type of workloads.

  See the relevant chapter in the L1TF mitigation documentation for details:
  :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
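
The SMT rule can be expressed as a set test over the variant names from the
CVE table. The following Python sketch uses hypothetical helper names and
assumes the caller already knows which variants affect the CPU; the kernel
does not expose that list in this form.

```python
# Variants which, per the text above, can be attacked cross Hyper-Thread.
CROSS_THREAD_VARIANTS = {"MFBDS", "MLPDS", "MDSUM"}


def smt_disable_needed(variants):
    """True if full protection requires disabling SMT."""
    return bool(set(variants) & CROSS_THREAD_VARIANTS)


def msbds_only(variants):
    """True if the CPU is affected by MSBDS and no other variant.

    In that case buffer clearing on user space, guest and idle
    transitions is sufficient and SMT can stay enabled.
    """
    return set(variants) == {"MSBDS"}
```

XEON PHI corresponds to the ``msbds_only`` case, which is why it is the
exception named above.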


.. _mds_mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows the MDS mitigations to be controlled at boot
time with the option "mds=". The valid arguments for this option are:

  ============  =============================================================
  full          If the CPU is vulnerable, enable all available mitigations
                for the MDS vulnerability: CPU buffer clearing on exit to
                userspace and when entering a VM. Idle transitions are
                protected as well if SMT is enabled.

                It does not automatically disable SMT.

  full,nosmt    The same as mds=full, with SMT disabled on vulnerable
                CPUs. This is the complete mitigation.

  off           Disables MDS mitigations completely.
  ============  =============================================================

Not specifying this option is equivalent to "mds=full". For processors
that are affected by both TAA (TSX Asynchronous Abort) and MDS,
specifying just "mds=off" without an accompanying "tsx_async_abort=off"
will have no effect as the same mitigation is used for both
vulnerabilities.
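
How these options combine can be sketched in Python as follows. This is not
the kernel's parser: the helper names are hypothetical, the CPU is assumed to
be affected by MDS, unknown "mds=" values are assumed to fall back to the
default, and other values of "tsx_async_abort=" are not modelled.

```python
def last_value(cmdline, key, default=None):
    """Return the value of the last 'key=value' token, or the default."""
    value = default
    for token in cmdline.split():
        if token.startswith(key + "="):
            value = token[len(key) + 1:]
    return value


def mds_settings(cmdline, affected_by_taa=False):
    """Return (buffer_clearing_enabled, smt_disabled) for an MDS affected CPU."""
    mds = last_value(cmdline, "mds", "full")
    if mds not in ("full", "full,nosmt", "off"):
        mds = "full"  # assumption: unknown values behave like the default
    taa_off = last_value(cmdline, "tsx_async_abort") == "off"
    # mds=off alone has no effect on CPUs which are also affected by TAA,
    # because the same buffer clearing serves both vulnerabilities.
    clearing = mds != "off" or (affected_by_taa and not taa_off)
    return clearing, mds == "full,nosmt"
```

For instance, ``mds_settings("mds=off", affected_by_taa=True)`` still reports
buffer clearing as enabled, while adding ``tsx_async_abort=off`` turns it off.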

Mitigation selection guide
--------------------------

1. Trusted userspace
^^^^^^^^^^^^^^^^^^^^

   If all userspace applications are from a trusted source and do not
   execute untrusted code which is supplied externally, then the mitigation
   can be disabled.


2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The same considerations as above versus trusted user space apply.

3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The protection depends on the state of the L1TF mitigations.
   See :ref:`virt_mechanism`.

   If the MDS mitigation is enabled and SMT is disabled, guest to host and
   guest to guest attacks are prevented.

.. _mds_default_mitigations:

Default mitigations
-------------------

The kernel default mitigations for vulnerable processors are:

  - Enable CPU buffer clearing

The kernel does not by default enforce the disabling of SMT, which leaves
SMT systems vulnerable when running untrusted code. The same rationale as
for L1TF applies.
See :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <default_mitigations>`.