Blame - src/kernel/linux/v4.19/Documentation/virtual/kvm/api.txt - T800

The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================
				3
				4	1. General description
				5	----------------------
				6
The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine. The ioctls belong to four classes:
				9
 - System ioctls: These query and set global attributes which affect the
   whole kvm subsystem. In addition a system ioctl is used to create
   virtual machines.
				13
				14	- VM ioctls: These query and set attributes that affect an entire virtual
				15	machine, for example memory layout. In addition a VM ioctl is used to
				16	create virtual cpus (vcpus) and devices.
				17
				18	Only run VM ioctls from the same process (address space) that was used
				19	to create the VM.
				20
				21	- vcpu ioctls: These query and set attributes that control the operation
				22	of a single virtual cpu.
				23
				24	Only run vcpu ioctls from the same thread that was used to create the
				25	vcpu.
				26
				27	- device ioctls: These query and set attributes that control the operation
				28	of a single device.
				29
				30	device ioctls must be issued from the same process (address space) that
				31	was used to create the VM.
				32
				33	2. File descriptors
				34	-------------------
				35
				36	The kvm API is centered around file descriptors. An initial
				37	open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
				38	can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
				39	handle will create a VM file descriptor which can be used to issue VM
				40	ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will
				41	create a virtual cpu or device and return a file descriptor pointing to
				42	the new resource. Finally, ioctls on a vcpu or device fd can be used
				43	to control the vcpu or device. For vcpus, this includes the important
				44	task of actually running guest code.
				45
				46	In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain sockets. These
				48	kinds of tricks are explicitly not supported by kvm. While they will
				49	not cause harm to the host, their actual behavior is not guaranteed by
				50	the API. The only supported use is one virtual machine per process,
				51	and one vcpu per thread.
				52
				53
				54	3. Extensions
				55	-------------
				56
As of Linux 2.6.22, the KVM ABI has been stabilized: no backward-incompatible
changes are allowed. However, there is an extension
				59	facility that allows backward-compatible extensions to the API to be
				60	queried and used.
				61
				62	The extension mechanism is not based on the Linux version number.
				63	Instead, kvm defines extension identifiers and a facility to query
				64	whether a particular extension identifier is available. If it is, a
				65	set of ioctls is available for application use.
				66
				67
				68	4. API description
				69	------------------
				70
				71	This section describes ioctls that can be used to control kvm guests.
				72	For each ioctl, the following information is provided along with a
				73	description:
				74
				75	Capability: which KVM extension provides this ioctl. Can be 'basic',
which means that it will be provided by any kernel that supports
				77	API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
				78	means availability needs to be checked with KVM_CHECK_EXTENSION
				79	(see section 4.4), or 'none' which means that while not all kernels
				80	support this ioctl, there's no capability bit to check its
				81	availability: for kernels that don't support the ioctl,
				82	the ioctl returns -ENOTTY.
				83
				84	Architectures: which instruction set architectures provide this ioctl.
				85	x86 includes both i386 and x86_64.
				86
				87	Type: system, vm, or vcpu.
				88
				89	Parameters: what parameters are accepted by the ioctl.
				90
				91	Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
				92	are not detailed, but errors with specific meanings are.
				93
				94
				95	4.1 KVM_GET_API_VERSION
				96
				97	Capability: basic
				98	Architectures: all
				99	Type: system ioctl
				100	Parameters: none
				101	Returns: the constant KVM_API_VERSION (=12)
				102
				103	This identifies the API version as the stable kvm API. It is not
				104	expected that this number will change. However, Linux 2.6.20 and
				105	2.6.21 report earlier versions; these are not documented and not
				106	supported. Applications should refuse to run if KVM_GET_API_VERSION
				107	returns a value other than 12. If this check passes, all ioctls
				108	described as 'basic' will be available.
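
The version check described above could be sketched as follows. The helper
name kvm_open_checked is made up for illustration; only open(), ioctl(),
KVM_GET_API_VERSION and KVM_API_VERSION come from the kernel interface. On a
host without /dev/kvm the helper simply returns -1.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: open /dev/kvm and refuse any API version other
 * than the stable one.  Returns a system fd, or -1 on failure. */
int kvm_open_checked(void)
{
	int fd = open("/dev/kvm", O_RDWR);

	if (fd < 0)
		return -1;	/* no kvm support, or no permission */

	/* KVM_GET_API_VERSION takes no parameter, so 0 is passed. */
	if (ioctl(fd, KVM_GET_API_VERSION, 0) != KVM_API_VERSION) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The returned fd is the handle on which the other system ioctls in this
section are issued.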
				109
				110
				111	4.2 KVM_CREATE_VM
				112
				113	Capability: basic
				114	Architectures: all
				115	Type: system ioctl
				116	Parameters: machine type identifier (KVM_VM_*)
				117	Returns: a VM fd that can be used to control the new virtual machine.
				118
				119	The new VM has no virtual cpus and no memory.
				120	You probably want to use 0 as machine type.
				121
				122	In order to create user controlled virtual machines on S390, check
				123	KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
				124	privileged user (CAP_SYS_ADMIN).
				125
				126	To use hardware assisted virtualization on MIPS (VZ ASE) rather than
				127	the default trap & emulate implementation (which changes the virtual
				128	memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
				129	flag KVM_VM_MIPS_VZ.
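
A minimal sketch of VM creation, assuming a system fd obtained from
open("/dev/kvm"). The helper name and its required_cap parameter are
illustrative only: passing 0 for both type and required_cap creates the
default machine type, while a non-default type is attempted only after the
matching capability has been confirmed with KVM_CHECK_EXTENSION.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical helper: create a VM of the given machine type.
 * required_cap is the KVM_CAP_* constant guarding that type, or 0 if
 * the type needs no capability check.
 * Returns a VM fd, or -1 with errno set. */
int kvm_create_vm_typed(int kvm_fd, unsigned long type, int required_cap)
{
	if (required_cap) {
		int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
				(unsigned long)required_cap);
		if (ret < 0)
			return -1;	/* errno set by ioctl() */
		if (ret == 0) {
			errno = ENOTSUP;	/* capability not available */
			return -1;
		}
	}
	return ioctl(kvm_fd, KVM_CREATE_VM, type);
}
```

For example, kvm_create_vm_typed(fd, 0, 0) requests the default machine,
and a MIPS VZ guest would pass KVM_VM_MIPS_VZ together with KVM_CAP_MIPS_VZ.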
				130
				131
				132	4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST
				133
				134	Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST
				135	Architectures: x86
				136	Type: system ioctl
				137	Parameters: struct kvm_msr_list (in/out)
				138	Returns: 0 on success; -1 on error
Errors:
  EFAULT: the msr index list cannot be read from or written to
  E2BIG:  the msr index list is too big to fit in the array specified by
          the user.
				143
struct kvm_msr_list {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 indices[0];
};
				148
				149	The user fills in the size of the indices array in nmsrs, and in return
				150	kvm adjusts nmsrs to reflect the actual number of msrs and fills in the
				151	indices array with their numbers.
				152
				153	KVM_GET_MSR_INDEX_LIST returns the guest msrs that are supported. The list
				154	varies by kvm version and host processor, but does not change otherwise.
				155
Note: if kvm indicates that it supports MCE (KVM_CAP_MCE), then the MCE bank MSRs are
				157	not returned in the MSR list, as different vcpus can have a different number
				158	of banks, as set via the KVM_X86_SETUP_MCE ioctl.
				159
				160	KVM_GET_MSR_FEATURE_INDEX_LIST returns the list of MSRs that can be passed
				161	to the KVM_GET_MSRS system ioctl. This lets userspace probe host capabilities
				162	and processor features that are exposed via MSRs (e.g., VMX capabilities).
				163	This list also varies by kvm version and host processor, but does not change
				164	otherwise.
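
Because kvm writes the real count back into nmsrs, one common way to size
the array is to probe first and allocate second. The sketch below assumes
that a call with nmsrs = 0 fails with E2BIG after nmsrs has been updated;
the helper name is hypothetical, and on non-x86 builds (where the struct is
not defined) it only returns NULL.

```c
#include <errno.h>
#include <stdlib.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical helper: fetch the supported MSR indices.  Returns a
 * malloc()ed list that the caller must free(), or NULL on failure. */
struct kvm_msr_list *kvm_fetch_msr_list(int kvm_fd)
{
#if defined(__x86_64__) || defined(__i386__)
	struct kvm_msr_list probe = { .nmsrs = 0 };
	struct kvm_msr_list *list;

	/* Probe: E2BIG is the expected outcome and leaves the required
	 * count in probe.nmsrs; any other error is fatal. */
	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, &probe) < 0 &&
	    errno != E2BIG)
		return NULL;

	list = calloc(1, sizeof(*list) +
			 probe.nmsrs * sizeof(list->indices[0]));
	if (!list)
		return NULL;

	/* Second call: nmsrs now states the size of the indices array. */
	list->nmsrs = probe.nmsrs;
	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
#else
	(void)kvm_fd;
	return NULL;
#endif
}
```

The same pattern should apply to KVM_GET_MSR_FEATURE_INDEX_LIST, since it
takes the same structure.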
				165
				166
				167	4.4 KVM_CHECK_EXTENSION
				168
				169	Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl
				170	Architectures: all
				171	Type: system ioctl, vm ioctl
				172	Parameters: extension identifier (KVM_CAP_*)
				173	Returns: 0 if unsupported; 1 (or some other positive integer) if supported
				174
				175	The API allows the application to query about extensions to the core
				176	kvm API. Userspace passes an extension identifier (an integer) and
				177	receives an integer that describes the extension availability.
				178	Generally 0 means no and 1 means yes, but some extensions may report
				179	additional information in the integer return value.
				180
Based on their initialization, different VMs may have different capabilities.
It is thus encouraged to use the vm ioctl to query for capabilities (available
with KVM_CAP_CHECK_EXTENSION_VM on the vm fd).
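
A small sketch of a capability query that prefers the vm fd when the kernel
supports it. Both helper names are made up for illustration, and treating an
ioctl error as "unsupported" is a simplification chosen here, not something
the API mandates.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical helper: returns the positive value reported for cap,
 * or 0 if the extension is unsupported or the query failed. */
int kvm_check_cap(int fd, int cap)
{
	int ret = ioctl(fd, KVM_CHECK_EXTENSION, (unsigned long)cap);

	return ret > 0 ? ret : 0;
}

/* Hypothetical helper: query the vm fd when KVM_CAP_CHECK_EXTENSION_VM
 * is available, otherwise fall back to the system fd. */
int kvm_check_cap_prefer_vm(int kvm_fd, int vm_fd, int cap)
{
	if (kvm_check_cap(kvm_fd, KVM_CAP_CHECK_EXTENSION_VM))
		return kvm_check_cap(vm_fd, cap);
	return kvm_check_cap(kvm_fd, cap);
}
```

Callers that need the extra information some extensions encode in the
return value can use the integer directly instead of testing it for zero.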
				184
				185	4.5 KVM_GET_VCPU_MMAP_SIZE
				186
				187	Capability: basic
				188	Architectures: all
				189	Type: system ioctl
				190	Parameters: none
				191	Returns: size of vcpu mmap area, in bytes
				192
The KVM_RUN ioctl (see section 4.10) communicates with userspace via a shared
				194	memory region. This ioctl returns the size of that region. See the
				195	KVM_RUN documentation for details.
				196
				197
				198	4.6 KVM_SET_MEMORY_REGION
				199
				200	Capability: basic
				201	Architectures: all
				202	Type: vm ioctl
				203	Parameters: struct kvm_memory_region (in)
				204	Returns: 0 on success, -1 on error
				205
				206	This ioctl is obsolete and has been removed.
				207
				208
				209	4.7 KVM_CREATE_VCPU
				210
				211	Capability: basic
				212	Architectures: all
				213	Type: vm ioctl
				214	Parameters: vcpu id (apic id on x86)
				215	Returns: vcpu fd on success, -1 on error
				216
				217	This API adds a vcpu to a virtual machine. No more than max_vcpus may be added.
				218	The vcpu id is an integer in the range [0, max_vcpu_id).
				219
				220	The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
				221	the KVM_CHECK_EXTENSION ioctl() at run-time.
				222	The maximum possible value for max_vcpus can be retrieved using the
				223	KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
				224
If KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is at
most 4.
If KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is
the same as the value returned from KVM_CAP_NR_VCPUS.
				229
				230	The maximum possible value for max_vcpu_id can be retrieved using the
				231	KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time.
				232
				233	If the KVM_CAP_MAX_VCPU_ID does not exist, you should assume that max_vcpu_id
				234	is the same as the value returned from KVM_CAP_MAX_VCPUS.
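
The fallback rules above can be written down as a pure function. This is a
sketch: the struct and function names are invented, each argument is assumed
to be the raw KVM_CHECK_EXTENSION result for the named capability, and
chaining the fallbacks (a missing KVM_CAP_MAX_VCPUS falling back to an
already defaulted KVM_CAP_NR_VCPUS) is one reading of the text.

```c
/* Hypothetical container for the three limits discussed above. */
struct vcpu_limits {
	int recommended;	/* from KVM_CAP_NR_VCPUS */
	int max_vcpus;		/* from KVM_CAP_MAX_VCPUS */
	int max_vcpu_id;	/* from KVM_CAP_MAX_VCPU_ID */
};

/* A result of 0 (or a negative error) is treated as "capability does
 * not exist" and triggers the documented fallback. */
struct vcpu_limits resolve_vcpu_limits(int cap_nr_vcpus, int cap_max_vcpus,
				       int cap_max_vcpu_id)
{
	struct vcpu_limits l;

	l.recommended = cap_nr_vcpus > 0 ? cap_nr_vcpus : 4;
	l.max_vcpus = cap_max_vcpus > 0 ? cap_max_vcpus : l.recommended;
	l.max_vcpu_id = cap_max_vcpu_id > 0 ? cap_max_vcpu_id : l.max_vcpus;
	return l;
}
```

A vcpu id passed to KVM_CREATE_VCPU would then have to satisfy
0 <= id < max_vcpu_id.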
				235
				236	On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
				237	threads in one or more virtual CPU cores. (This is because the
				238	hardware requires all the hardware threads in a CPU core to be in the
				239	same partition.) The KVM_CAP_PPC_SMT capability indicates the number
				240	of vcpus per virtual core (vcore). The vcore id is obtained by
				241	dividing the vcpu id by the number of vcpus per vcore. The vcpus in a
				242	given vcore will always be in the same physical core as each other
				243	(though that might be a different physical core from time to time).
				244	Userspace can control the threading (SMT) mode of the guest by its
				245	allocation of vcpu ids. For example, if userspace wants
				246	single-threaded guest vcpus, it should make all vcpu ids be a multiple
				247	of the number of vcpus per vcore.
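
The vcore arithmetic above is small enough to sketch directly. The function
names are illustrative; threads_per_vcore stands for the value reported for
KVM_CAP_PPC_SMT.

```c
#include <assert.h>

/* vcore id = vcpu id divided by the number of vcpus per vcore. */
int vcore_of(int vcpu_id, int threads_per_vcore)
{
	assert(threads_per_vcore > 0);
	return vcpu_id / threads_per_vcore;
}

/* For a single-threaded guest, give the n-th vcpu an id that is a
 * multiple of threads_per_vcore so every vcpu gets its own vcore. */
int single_threaded_vcpu_id(int n, int threads_per_vcore)
{
	assert(threads_per_vcore > 0);
	return n * threads_per_vcore;
}
```

With 8 vcpus per vcore, for instance, a single-threaded guest would use the
vcpu ids 0, 8, 16 and so on.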
				248
				249	For virtual cpus that have been created with S390 user controlled virtual
				250	machines, the resulting vcpu fd can be memory mapped at page offset
				251	KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
				252	cpu's hardware control block.
				253
				254
				255	4.8 KVM_GET_DIRTY_LOG (vm ioctl)
				256
				257	Capability: basic
				258	Architectures: x86
				259	Type: vm ioctl
				260	Parameters: struct kvm_dirty_log (in/out)
				261	Returns: 0 on success, -1 on error
				262
/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
	__u32 slot;
	__u32 padding;
	union {
		void __user *dirty_bitmap; /* one bit per page */
		__u64 padding;
	};
};
				272
				273	Given a memory slot, return a bitmap containing any pages dirtied
				274	since the last call to this ioctl. Bit 0 is the first page in the
				275	memory slot. Ensure the entire structure is cleared to avoid padding
				276	issues.
				277
If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field
specify the address space for which you want to return the dirty bitmap.
They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
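
The helpers below sketch how userspace might build the slot value and read
the returned bitmap. All names are hypothetical. Two assumptions are made
that the text above does not spell out: the address space id sits in bits
16-31 of the slot field, and the bitmap uses little-endian bit order, so
that bit n lives in byte n / 8. The buffer size is rounded up to a multiple
of 64 bits to stay on the safe side.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Combine an address space id (bits 16-31) with a slot number. */
uint32_t dirty_log_slot(uint16_t address_space, uint16_t slot)
{
	return ((uint32_t)address_space << 16) | slot;
}

/* Bytes to allocate for a slot of npages pages, one bit per page,
 * rounded up to a whole number of 64-bit words. */
size_t dirty_bitmap_bytes(uint64_t npages)
{
	return (size_t)((npages + 63) / 64) * 8;
}

/* Bit 0 is the first page in the memory slot. */
bool page_is_dirty(const uint8_t *bitmap, uint64_t page)
{
	return (bitmap[page / 8] >> (page % 8)) & 1;
}
```

The buffer handed to the ioctl through dirty_bitmap would be at least
dirty_bitmap_bytes(npages) long and zero-initialized along with the rest of
the structure.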
				282
				283
				284	4.9 KVM_SET_MEMORY_ALIAS
				285
				286	Capability: basic
				287	Architectures: x86
				288	Type: vm ioctl
				289	Parameters: struct kvm_memory_alias (in)
				290	Returns: 0 (success), -1 (error)
				291
				292	This ioctl is obsolete and has been removed.
				293
				294
				295	4.10 KVM_RUN
				296
				297	Capability: basic
				298	Architectures: all
				299	Type: vcpu ioctl
				300	Parameters: none
				301	Returns: 0 on success, -1 on error
				302	Errors:
				303	EINTR: an unmasked signal is pending
				304
				305	This ioctl is used to run a guest virtual cpu. While there are no
				306	explicit parameters, there is an implicit parameter block that can be
				307	obtained by mmap()ing the vcpu fd at offset 0, with the size given by
				308	KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
				309	kvm_run' (see below).
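
Mapping the implicit parameter block might look like the sketch below. The
helper name is made up; the size comes from KVM_GET_VCPU_MMAP_SIZE on the
system fd and the mapping is made on the vcpu fd at offset 0, as described
above. MAP_SHARED is used because the region is shared with the kernel.

```c
#include <stddef.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Hypothetical helper: map the kvm_run block of a vcpu.
 * Returns NULL on failure. */
struct kvm_run *kvm_map_run(int kvm_fd, int vcpu_fd)
{
	void *p;
	int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);

	/* Also rejects the -1 error return. */
	if (size < (int)sizeof(struct kvm_run))
		return NULL;

	p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
		 vcpu_fd, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

After a successful mapping, ioctl(vcpu_fd, KVM_RUN, 0) runs the guest and the
mapped structure describes why the call returned.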
				310
				311
				312	4.11 KVM_GET_REGS
				313
				314	Capability: basic
				315	Architectures: all except ARM, arm64
				316	Type: vcpu ioctl
				317	Parameters: struct kvm_regs (out)
				318	Returns: 0 on success, -1 on error
				319
				320	Reads the general purpose registers from the vcpu.
				321
/* x86 */
struct kvm_regs {
	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
	__u64 rax, rbx, rcx, rdx;
	__u64 rsi, rdi, rsp, rbp;
	__u64 r8,  r9,  r10, r11;
	__u64 r12, r13, r14, r15;
	__u64 rip, rflags;
};

/* mips */
struct kvm_regs {
	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
	__u64 gpr[32];
	__u64 hi;
	__u64 lo;
	__u64 pc;
};
				340
				341
				342	4.12 KVM_SET_REGS
				343
				344	Capability: basic
				345	Architectures: all except ARM, arm64
				346	Type: vcpu ioctl
				347	Parameters: struct kvm_regs (in)
				348	Returns: 0 on success, -1 on error
				349
				350	Writes the general purpose registers into the vcpu.
				351
				352	See KVM_GET_REGS for the data structure.
				353
				354
				355	4.13 KVM_GET_SREGS
				356
				357	Capability: basic
				358	Architectures: x86, ppc
				359	Type: vcpu ioctl
				360	Parameters: struct kvm_sregs (out)
				361	Returns: 0 on success, -1 on error
				362
				363	Reads special registers from the vcpu.
				364
/* x86 */
struct kvm_sregs {
	struct kvm_segment cs, ds, es, fs, gs, ss;
	struct kvm_segment tr, ldt;
	struct kvm_dtable gdt, idt;
	__u64 cr0, cr2, cr3, cr4, cr8;
	__u64 efer;
	__u64 apic_base;
	__u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
};
				375
				376	/* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */
				377
				378	interrupt_bitmap is a bitmap of pending external interrupts. At most
				379	one bit may be set. This interrupt has been acknowledged by the APIC
				380	but not yet injected into the cpu core.
				381
				382
				383	4.14 KVM_SET_SREGS
				384
				385	Capability: basic
				386	Architectures: x86, ppc
				387	Type: vcpu ioctl
				388	Parameters: struct kvm_sregs (in)
				389	Returns: 0 on success, -1 on error
				390
				391	Writes special registers into the vcpu. See KVM_GET_SREGS for the
				392	data structures.
				393
				394
				395	4.15 KVM_TRANSLATE
				396
				397	Capability: basic
				398	Architectures: x86
				399	Type: vcpu ioctl
				400	Parameters: struct kvm_translation (in/out)
				401	Returns: 0 on success, -1 on error
				402
				403	Translates a virtual address according to the vcpu's current address
				404	translation mode.
				405
struct kvm_translation {
	/* in */
	__u64 linear_address;

	/* out */
	__u64 physical_address;
	__u8  valid;
	__u8  writeable;
	__u8  usermode;
	__u8  pad[5];
};
				417
				418
				419	4.16 KVM_INTERRUPT
				420
				421	Capability: basic
				422	Architectures: x86, ppc, mips
				423	Type: vcpu ioctl
				424	Parameters: struct kvm_interrupt (in)
				425	Returns: 0 on success, negative on failure.
				426
				427	Queues a hardware interrupt vector to be injected.
				428
/* for KVM_INTERRUPT */
struct kvm_interrupt {
	/* in */
	__u32 irq;
};
				434
				435	X86:
				436
Returns: 0 on success,
	 -EEXIST if an interrupt is already enqueued
	 -EINVAL if the irq number is invalid
	 -ENXIO if the PIC is in the kernel
	 -EFAULT if the pointer is invalid
				442
				443	Note 'irq' is an interrupt vector, not an interrupt pin or line. This
				444	ioctl is useful if the in-kernel PIC is not used.
				445
				446	PPC:
				447
Queues an external interrupt to be injected. This ioctl is overloaded
				449	with 3 different irq values:
				450
				451	a) KVM_INTERRUPT_SET
				452
				453	This injects an edge type external interrupt into the guest once it's ready
				454	to receive interrupts. When injected, the interrupt is done.
				455
				456	b) KVM_INTERRUPT_UNSET
				457
				458	This unsets any pending interrupt.
				459
				460	Only available with KVM_CAP_PPC_UNSET_IRQ.
				461
				462	c) KVM_INTERRUPT_SET_LEVEL
				463
				464	This injects a level type external interrupt into the guest context. The
				465	interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
				466	is triggered.
				467
				468	Only available with KVM_CAP_PPC_IRQ_LEVEL.
				469
				470	Note that any value for 'irq' other than the ones stated above is invalid
				471	and incurs unexpected behavior.
				472
				473	MIPS:
				474
				475	Queues an external interrupt to be injected into the virtual CPU. A negative
				476	interrupt number dequeues the interrupt.
				477
				478
				479	4.17 KVM_DEBUG_GUEST
				480
				481	Capability: basic
				482	Architectures: none
				483	Type: vcpu ioctl
Parameters: none
				485	Returns: -1 on error
				486
				487	Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.
				488
				489
				490	4.18 KVM_GET_MSRS
				491
				492	Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system)
				493	Architectures: x86
				494	Type: system ioctl, vcpu ioctl
				495	Parameters: struct kvm_msrs (in/out)
				496	Returns: number of msrs successfully returned;
				497	-1 on error
				498
				499	When used as a system ioctl:
				500	Reads the values of MSR-based features that are available for the VM. This
				501	is similar to KVM_GET_SUPPORTED_CPUID, but it returns MSR indices and values.
				502	The list of msr-based features can be obtained using KVM_GET_MSR_FEATURE_INDEX_LIST
				503	in a system ioctl.
				504
				505	When used as a vcpu ioctl:
				506	Reads model-specific registers from the vcpu. Supported msr indices can
				507	be obtained using KVM_GET_MSR_INDEX_LIST in a system ioctl.
				508
struct kvm_msrs {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 pad;

	struct kvm_msr_entry entries[0];
};

struct kvm_msr_entry {
	__u32 index;
	__u32 reserved;
	__u64 data;
};
				521
				522	Application code should set the 'nmsrs' member (which indicates the
				523	size of the entries array) and the 'index' member of each array entry.
				524	kvm will fill in the 'data' member.
				525
				526
				527	4.19 KVM_SET_MSRS
				528
				529	Capability: basic
				530	Architectures: x86
				531	Type: vcpu ioctl
				532	Parameters: struct kvm_msrs (in)
				533	Returns: 0 on success, -1 on error
				534
				535	Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
				536	data structures.
				537
				538	Application code should set the 'nmsrs' member (which indicates the
				539	size of the entries array), and the 'index' and 'data' members of each
				540	array entry.
				541
				542
				543	4.20 KVM_SET_CPUID
				544
				545	Capability: basic
				546	Architectures: x86
				547	Type: vcpu ioctl
				548	Parameters: struct kvm_cpuid (in)
				549	Returns: 0 on success, -1 on error
				550
				551	Defines the vcpu responses to the cpuid instruction. Applications
				552	should use the KVM_SET_CPUID2 ioctl if available.
				553
				554
struct kvm_cpuid_entry {
	__u32 function;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding;
};

/* for KVM_SET_CPUID */
struct kvm_cpuid {
	__u32 nent;
	__u32 padding;
	struct kvm_cpuid_entry entries[0];
};
				570
				571
				572	4.21 KVM_SET_SIGNAL_MASK
				573
				574	Capability: basic
				575	Architectures: all
				576	Type: vcpu ioctl
				577	Parameters: struct kvm_signal_mask (in)
				578	Returns: 0 on success, -1 on error
				579
				580	Defines which signals are blocked during execution of KVM_RUN. This
signal mask temporarily overrides the thread's signal mask. Any
				582	unblocked signal received (except SIGKILL and SIGSTOP, which retain
				583	their traditional behaviour) will cause KVM_RUN to return with -EINTR.
				584
				585	Note the signal will only be delivered if not blocked by the original
				586	signal mask.
				587
/* for KVM_SET_SIGNAL_MASK */
struct kvm_signal_mask {
	__u32 len;
	__u8  sigset[0];
};
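
Since sigset is a variable-length trailer, the structure has to be
allocated with room for the mask. The sketch below is one way to do that;
the helper name is hypothetical. It assumes that len must match the size of
the kernel's own signal set (commonly 8 bytes), which is smaller than the C
library's sigset_t, so the caller passes that size explicitly. Verify the
value for the target architecture before relying on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/kvm.h>

/* Hypothetical helper: build a kvm_signal_mask from a libc sigset_t.
 * kernel_sigset_bytes is the size the kernel expects in len (assumed,
 * typically 8).  Returns a malloc()ed object or NULL. */
struct kvm_signal_mask *kvm_build_signal_mask(const sigset_t *set,
					      uint32_t kernel_sigset_bytes)
{
	struct kvm_signal_mask *mask;

	if (kernel_sigset_bytes > sizeof(*set))
		return NULL;	/* cannot copy more than libc provides */

	mask = calloc(1, sizeof(*mask) + kernel_sigset_bytes);
	if (!mask)
		return NULL;

	mask->len = kernel_sigset_bytes;
	/* Only the leading bytes of the libc set are handed to kvm. */
	memcpy(mask->sigset, set, kernel_sigset_bytes);
	return mask;
}
```

The result would be passed as ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, mask) and
can be freed afterwards.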
				593
				594
				595	4.22 KVM_GET_FPU
				596
				597	Capability: basic
				598	Architectures: x86
				599	Type: vcpu ioctl
				600	Parameters: struct kvm_fpu (out)
				601	Returns: 0 on success, -1 on error
				602
				603	Reads the floating point state from the vcpu.
				604
/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
	__u8  fpr[8][16];
	__u16 fcw;
	__u16 fsw;
	__u8  ftwx;  /* in fxsave format */
	__u8  pad1;
	__u16 last_opcode;
	__u64 last_ip;
	__u64 last_dp;
	__u8  xmm[16][16];
	__u32 mxcsr;
	__u32 pad2;
};
				619
				620
				621	4.23 KVM_SET_FPU
				622
				623	Capability: basic
				624	Architectures: x86
				625	Type: vcpu ioctl
				626	Parameters: struct kvm_fpu (in)
				627	Returns: 0 on success, -1 on error
				628
				629	Writes the floating point state to the vcpu.
				630
/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
	__u8  fpr[8][16];
	__u16 fcw;
	__u16 fsw;
	__u8  ftwx;  /* in fxsave format */
	__u8  pad1;
	__u16 last_opcode;
	__u64 last_ip;
	__u64 last_dp;
	__u8  xmm[16][16];
	__u32 mxcsr;
	__u32 pad2;
};
				645
				646
				647	4.24 KVM_CREATE_IRQCHIP
				648
				649	Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
				650	Architectures: x86, ARM, arm64, s390
				651	Type: vm ioctl
				652	Parameters: none
				653	Returns: 0 on success, -1 on error
				654
				655	Creates an interrupt controller model in the kernel.
				656	On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up
				657	future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both
				658	PIC and IOAPIC; GSI 16-23 only go to the IOAPIC.
				659	On ARM/arm64, a GICv2 is created. Any other GIC versions require the usage of
				660	KVM_CREATE_DEVICE, which also supports creating a GICv2. Using
				661	KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2.
				662	On s390, a dummy irq routing table is created.
				663
				664	Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
				665	before KVM_CREATE_IRQCHIP can be used.
				666
				667
				668	4.25 KVM_IRQ_LINE
				669
				670	Capability: KVM_CAP_IRQCHIP
				671	Architectures: x86, arm, arm64
				672	Type: vm ioctl
				673	Parameters: struct kvm_irq_level
				674	Returns: 0 on success, -1 on error
				675
				676	Sets the level of a GSI input to the interrupt controller model in the kernel.
				677	On some architectures it is required that an interrupt controller model has
				678	been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered
				679	interrupts require the level to be set to 1 and then back to 0.
				680
				681	On real hardware, interrupt pins can be active-low or active-high. This
				682	does not matter for the level field of struct kvm_irq_level: 1 always
				683	means active (asserted), 0 means inactive (deasserted).
				684
				685	x86 allows the operating system to program the interrupt polarity
				686	(active-low/active-high) for level-triggered interrupts, and KVM used
				687	to consider the polarity. However, due to bitrot in the handling of
				688	active-low interrupts, the above convention is now valid on x86 too.
				689	This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED. Userspace
				690	should not present interrupts to the guest as active-low unless this
				691	capability is present (or unless it is not using the in-kernel irqchip,
				692	of course).
				693
				694
				695	ARM/arm64 can signal an interrupt either at the CPU level, or at the
				696	in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
				697	use PPIs designated for specific cpus. The irq field is interpreted
				698	like this:
				699
  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |     irq_id     |
				702
				703	The irq_type field has the following values:
				704	- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
				705	- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
				706	(the vcpu_index field is ignored)
				707	- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
				708
				709	(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
				710
				711	In both cases, level is used to assert/deassert the line.
				712
struct kvm_irq_level {
	union {
		__u32 irq;     /* GSI */
		__s32 status;  /* not used for KVM_IRQ_LEVEL */
	};
	__u32 level;           /* 0 or 1 */
};
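
On ARM/arm64 the irq member has to be assembled from the three fields shown
in the layout above. A minimal sketch of that packing follows; the enum and
function names are invented for illustration and are not taken from the
kernel headers.

```c
#include <stdint.h>

/* Hypothetical names for the irq_type values listed above. */
enum {
	ARM_IRQ_TYPE_CPU = 0,	/* out-of-kernel GIC: irq_id 0 IRQ, 1 FIQ */
	ARM_IRQ_TYPE_SPI = 1,	/* in-kernel GIC, irq_id 32..1019 */
	ARM_IRQ_TYPE_PPI = 2,	/* in-kernel GIC, irq_id 16..31 */
};

/* Pack irq_type (bits 31-24), vcpu_index (bits 23-16) and irq_id
 * (bits 15-0) into the irq member of struct kvm_irq_level. */
uint32_t arm_irq_field(uint32_t irq_type, uint32_t vcpu_index,
		       uint32_t irq_id)
{
	return ((irq_type & 0xff) << 24) |
	       ((vcpu_index & 0xff) << 16) |
	       (irq_id & 0xffff);
}
```

For example, SPI 32 is encoded as arm_irq_field(ARM_IRQ_TYPE_SPI, 0, 32),
and the level member is then set to 1 or 0 to assert or deassert the line.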
				720
				721
				722	4.26 KVM_GET_IRQCHIP
				723
				724	Capability: KVM_CAP_IRQCHIP
				725	Architectures: x86
				726	Type: vm ioctl
				727	Parameters: struct kvm_irqchip (in/out)
				728	Returns: 0 on success, -1 on error
				729
				730	Reads the state of a kernel interrupt controller created with
				731	KVM_CREATE_IRQCHIP into a buffer provided by the caller.
				732
struct kvm_irqchip {
	__u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
	__u32 pad;
	union {
		char dummy[512];  /* reserving space */
		struct kvm_pic_state pic;
		struct kvm_ioapic_state ioapic;
	} chip;
};
				742
				743
				744	4.27 KVM_SET_IRQCHIP
				745
				746	Capability: KVM_CAP_IRQCHIP
				747	Architectures: x86
				748	Type: vm ioctl
				749	Parameters: struct kvm_irqchip (in)
				750	Returns: 0 on success, -1 on error
				751
				752	Sets the state of a kernel interrupt controller created with
				753	KVM_CREATE_IRQCHIP from a buffer provided by the caller.
				754
struct kvm_irqchip {
	__u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
	__u32 pad;
	union {
		char dummy[512];  /* reserving space */
		struct kvm_pic_state pic;
		struct kvm_ioapic_state ioapic;
	} chip;
};
				764
				765
				766	4.28 KVM_XEN_HVM_CONFIG
				767
				768	Capability: KVM_CAP_XEN_HVM
				769	Architectures: x86
				770	Type: vm ioctl
				771	Parameters: struct kvm_xen_hvm_config (in)
				772	Returns: 0 on success, -1 on error
				773
				774	Sets the MSR that the Xen HVM guest uses to initialize its hypercall
				775	page, and provides the starting address and size of the hypercall
				776	blobs in userspace. When the guest writes the MSR, kvm copies one
				777	page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
				778	memory.
				779
struct kvm_xen_hvm_config {
	__u32 flags;
	__u32 msr;
	__u64 blob_addr_32;
	__u64 blob_addr_64;
	__u8 blob_size_32;
	__u8 blob_size_64;
	__u8 pad2[30];
};
				789
				790
				791	4.29 KVM_GET_CLOCK
				792
				793	Capability: KVM_CAP_ADJUST_CLOCK
				794	Architectures: x86
				795	Type: vm ioctl
				796	Parameters: struct kvm_clock_data (out)
				797	Returns: 0 on success, -1 on error
				798
Gets the current timestamp of kvmclock as seen by the current guest. In
conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.

When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
set of bits that KVM can return in struct kvm_clock_data's flags member.
				805
				806	The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
				807	value is the exact kvmclock value seen by all VCPUs at the instant
				808	when KVM_GET_CLOCK was called. If clear, the returned value is simply
				809	CLOCK_MONOTONIC plus a constant offset; the offset can be modified
				810	with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
				811	but the exact value read by each VCPU could differ, because the host
				812	TSC is not stable.
				813
struct kvm_clock_data {
	__u64 clock;  /* kvmclock current value */
	__u32 flags;
	__u32 pad[9];
};
				819
				820
				821	4.30 KVM_SET_CLOCK
				822
				823	Capability: KVM_CAP_ADJUST_CLOCK
				824	Architectures: x86
				825	Type: vm ioctl
				826	Parameters: struct kvm_clock_data (in)
				827	Returns: 0 on success, -1 on error
				828
Sets the current timestamp of kvmclock to the value specified in its parameter.
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.
				832
struct kvm_clock_data {
	__u64 clock;  /* kvmclock current value */
	__u32 flags;
	__u32 pad[9];
};
				838
				839
				840	4.31 KVM_GET_VCPU_EVENTS
				841
				842	Capability: KVM_CAP_VCPU_EVENTS
				843	Extended by: KVM_CAP_INTR_SHADOW
				844	Architectures: x86, arm, arm64
				845	Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (out)
				847	Returns: 0 on success, -1 on error
				848
				849	X86:
				850
				851	Gets currently pending exceptions, interrupts, and NMIs as well as related
				852	states of the vcpu.
				853
struct kvm_vcpu_events {
	struct {
		__u8 injected;
		__u8 nr;
		__u8 has_error_code;
		__u8 pad;
		__u32 error_code;
	} exception;
	struct {
		__u8 injected;
		__u8 nr;
		__u8 soft;
		__u8 shadow;
	} interrupt;
	struct {
		__u8 injected;
		__u8 pending;
		__u8 masked;
		__u8 pad;
	} nmi;
	__u32 sipi_vector;
	__u32 flags;
	struct {
		__u8 smm;
		__u8 pending;
		__u8 smm_inside_nmi;
		__u8 latched_init;
	} smi;
};
				883
Only two flags are defined in the flags field:
				885
				886	- KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
				887	interrupt.shadow contains a valid state.
				888
				889	- KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
				890	smi contains a valid state.
				891
				892	ARM/ARM64:
				893
				894	If the guest accesses a device that is being emulated by the host kernel in
				895	such a way that a real device would generate a physical SError, KVM may make
				896	a virtual SError pending for that VCPU. This system error interrupt remains
				897	pending until the guest takes the exception by unmasking PSTATE.A.
				898
				899	Running the VCPU may cause it to take a pending SError, or make an access that
				900	causes an SError to become pending. The event's description is only valid while
				901	the VCPU is not running.
				902
				903	This API provides a way to read and write the pending 'event' state that is not
				904	visible to the guest. To save, restore or migrate a VCPU the struct representing
				905	the state can be read then written using this GET/SET API, along with the other
				906	guest-visible registers. It is not possible to 'cancel' an SError that has been
				907	made pending.
				908
				909	A device being emulated in user-space may also wish to generate an SError. To do
				910	this the events structure can be populated by user-space. The current state
				911	should be read first, to ensure no existing SError is pending. If an existing
				912	SError is pending, the architecture's 'Multiple SError interrupts' rules should
				913	be followed. (2.5.3 of DDI0587.a "ARM Reliability, Availability, and
				914	Serviceability (RAS) Specification").
				915
				916	SError exceptions always have an ESR value. Some CPUs have the ability to
				917	specify what the virtual SError's ESR value should be. These systems will
				918	advertise KVM_CAP_ARM_INJECT_SERROR_ESR. In this case exception.serror_has_esr
				919	will always have a non-zero value when read, and the agent making an SError
				920	pending should specify the ISS field in the lower 24 bits of
				921	exception.serror_esr. If the system supports KVM_CAP_ARM_INJECT_SERROR_ESR, but
				922	user-space sets the events with exception.serror_has_esr as zero, KVM will choose an ESR.
				923
				924	Specifying exception.serror_has_esr on a system that does not support it will
				925	return -EINVAL. Setting anything other than the lower 24 bits of
				926	exception.serror_esr will return -EINVAL.
				927
				928	struct kvm_vcpu_events {
				929	struct {
				930	__u8 serror_pending;
				931	__u8 serror_has_esr;
				932	/* Align it to 8 bytes */
				933	__u8 pad[6];
				934	__u64 serror_esr;
				935	} exception;
				936	__u32 reserved[12];
				937	};
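
The rule for exception.serror_esr can be checked in user space before the
events are written. The following is a minimal sketch; the helper name
serror_esr_is_valid() and the mask macro are illustrative assumptions and are
not part of the KVM API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask for the ISS field: the lower 24 bits of serror_esr. */
#define SERROR_ESR_ISS_MASK UINT64_C(0xffffff)

/*
 * Returns true if esr only uses the lower 24 bits, i.e. no bit outside the
 * ISS field is set and the value would not be rejected for that reason.
 */
static inline bool serror_esr_is_valid(uint64_t esr)
{
	return (esr & ~SERROR_ESR_ISS_MASK) == 0;
}
```

A caller would typically run this check only when it also sets
exception.serror_has_esr to a non-zero value.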
				938
				939	4.32 KVM_SET_VCPU_EVENTS
				940
				941	Capability: KVM_CAP_VCPU_EVENTS
				942	Extended by: KVM_CAP_INTR_SHADOW
				943	Architectures: x86, arm, arm64
				944	Type: vcpu ioctl
				945	Parameters: struct kvm_vcpu_events (in)
				946	Returns: 0 on success, -1 on error
				947
				948	X86:
				949
				950	Set pending exceptions, interrupts, and NMIs as well as related states of the
				951	vcpu.
				952
				953	See KVM_GET_VCPU_EVENTS for the data structure.
				954
				955	Fields that may be modified asynchronously by running VCPUs can be excluded
				956	from the update. These fields are nmi.pending, sipi_vector, smi.smm,
				957	smi.pending. Keep the corresponding bits in the flags field cleared to
				958	suppress overwriting the current in-kernel state. The bits are:
				959
				960	KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
				961	KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
				962	KVM_VCPUEVENT_VALID_SMM - transfer the smi sub-struct.
				963
				964	If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
				965	the flags field to signal that interrupt.shadow contains a valid state and
				966	shall be written into the VCPU.
				967
				968	KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
				969
				970	ARM/ARM64:
				971
				972	Set the pending SError exception state for this VCPU. It is not possible to
				973	'cancel' an SError that has been made pending.
				974
				975	See KVM_GET_VCPU_EVENTS for the data structure.
				976
				977
				978	4.33 KVM_GET_DEBUGREGS
				979
				980	Capability: KVM_CAP_DEBUGREGS
				981	Architectures: x86
				982	Type: vcpu ioctl
				983	Parameters: struct kvm_debugregs (out)
				984	Returns: 0 on success, -1 on error
				985
				986	Reads debug registers from the vcpu.
				987
				988	struct kvm_debugregs {
				989	__u64 db[4];
				990	__u64 dr6;
				991	__u64 dr7;
				992	__u64 flags;
				993	__u64 reserved[9];
				994	};
				995
				996
				997	4.34 KVM_SET_DEBUGREGS
				998
				999	Capability: KVM_CAP_DEBUGREGS
				1000	Architectures: x86
				1001	Type: vcpu ioctl
				1002	Parameters: struct kvm_debugregs (in)
				1003	Returns: 0 on success, -1 on error
				1004
				1005	Writes debug registers into the vcpu.
				1006
				1007	See KVM_GET_DEBUGREGS for the data structure. The flags field is not yet
				1008	used and must be cleared on entry.
				1009
				1010
				1011	4.35 KVM_SET_USER_MEMORY_REGION
				1012
				1013	Capability: KVM_CAP_USER_MEM
				1014	Architectures: all
				1015	Type: vm ioctl
				1016	Parameters: struct kvm_userspace_memory_region (in)
				1017	Returns: 0 on success, -1 on error
				1018
				1019	struct kvm_userspace_memory_region {
				1020	__u32 slot;
				1021	__u32 flags;
				1022	__u64 guest_phys_addr;
				1023	__u64 memory_size; /* bytes */
				1024	__u64 userspace_addr; /* start of the userspace allocated memory */
				1025	};
				1026
				1027	/* for kvm_memory_region::flags */
				1028	#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
				1029	#define KVM_MEM_READONLY (1UL << 1)
				1030
				1031	This ioctl allows the user to create or modify a guest physical memory
				1032	slot. When changing an existing slot, it may be moved in the guest
				1033	physical memory space, or its flags may be modified. It may not be
				1034	resized. Slots may not overlap in guest physical address space.
				1035	Bits 0-15 of "slot" specify the slot id, and this value should be
				1036	less than the maximum number of user memory slots supported per VM.
				1037	The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS,
				1038	if this capability is supported by the architecture.
				1039
				1040	If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
				1041	specify the address space which is being modified. They must be
				1042	less than the value that KVM_CHECK_EXTENSION returns for the
				1043	KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces
				1044	are unrelated; the restriction on overlapping slots only applies within
				1045	each address space.
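
The layout of the "slot" field can be sketched as follows. The helper names
(memslot_encode() and friends) are hypothetical and exist only for this
illustration; they are not kernel or libc functions.

```c
#include <stdint.h>

/*
 * Build the 32-bit "slot" value: bits 0-15 carry the slot id and bits 16-31
 * carry the address space id.
 */
static inline uint32_t memslot_encode(uint16_t as_id, uint16_t slot_id)
{
	return ((uint32_t)as_id << 16) | slot_id;
}

/* Recover the two halves again, e.g. for logging. */
static inline uint16_t memslot_as_id(uint32_t slot)
{
	return (uint16_t)(slot >> 16);
}

static inline uint16_t memslot_slot_id(uint32_t slot)
{
	return (uint16_t)(slot & 0xffff);
}
```

When KVM_CAP_MULTI_ADDRESS_SPACE is not available, passing 0 as the address
space id leaves the value equal to the plain slot id.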
				1046
				1047	Memory for the region is taken starting at the address denoted by the
				1048	field userspace_addr, which must point at user addressable memory for
				1049	the entire memory slot size. Any object may back this memory, including
				1050	anonymous memory, ordinary files, and hugetlbfs.
				1051
				1052	It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
				1053	be identical. This allows large pages in the guest to be backed by large
				1054	pages in the host.
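
Whether a pair of addresses follows this recommendation can be tested with a
single mask. This is a sketch only; low21_bits_match() and LOW21_MASK are
names invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* The lower 21 address bits span 2^21 bytes = 2 MiB. */
#define LOW21_MASK ((UINT64_C(1) << 21) - 1)

/* True if guest_phys_addr and userspace_addr agree in their lower 21 bits. */
static inline bool low21_bits_match(uint64_t guest_phys_addr,
				    uint64_t userspace_addr)
{
	return ((guest_phys_addr ^ userspace_addr) & LOW21_MASK) == 0;
}
```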
				1055
				1056	The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
				1057	KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of
				1058	writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to
				1059	use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
				1060	to make a new slot read-only. In this case, writes to this memory will be
				1061	posted to userspace as KVM_EXIT_MMIO exits.
				1062
				1063	When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
				1064	the memory region are automatically reflected into the guest. For example, an
				1065	mmap() that affects the region will be made visible immediately. Another
				1066	example is madvise(MADV_DROP).
				1067
				1068	It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
				1069	The KVM_SET_MEMORY_REGION ioctl does not allow fine-grained control over memory
				1070	allocation and is deprecated.
				1071
				1072
				1073	4.36 KVM_SET_TSS_ADDR
				1074
				1075	Capability: KVM_CAP_SET_TSS_ADDR
				1076	Architectures: x86
				1077	Type: vm ioctl
				1078	Parameters: unsigned long tss_address (in)
				1079	Returns: 0 on success, -1 on error
				1080
				1081	This ioctl defines the physical address of a three-page region in the guest
				1082	physical address space. The region must be within the first 4GB of the
				1083	guest physical address space and must not conflict with any memory slot
				1084	or any mmio address. The guest may malfunction if it accesses this memory
				1085	region.
				1086
				1087	This ioctl is required on Intel-based hosts. This is needed on Intel hardware
				1088	because of a quirk in the virtualization implementation (see the internals
				1089	documentation when it pops into existence).
				1090
				1091
				1092	4.37 KVM_ENABLE_CAP
				1093
				1094	Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
				1095	Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
				1096	mips (only KVM_CAP_ENABLE_CAP), ppc, s390
				1097	Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
				1098	Parameters: struct kvm_enable_cap (in)
				1099	Returns: 0 on success; -1 on error
				1100
				1101	Not all extensions are enabled by default. Using this ioctl the application
				1102	can enable an extension, making it available to the guest.
				1103
				1104	On systems that do not support this ioctl, it always fails. On systems that
				1105	do support it, it only works for extensions that are supported for enablement.
				1106
				1107	To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
				1108	be used.
				1109
				1110	struct kvm_enable_cap {
				1111	/* in */
				1112	__u32 cap;
				1113
				1114	The capability that is supposed to get enabled.
				1115
				1116	__u32 flags;
				1117
				1118	A bitfield indicating future enhancements. Has to be 0 for now.
				1119
				1120	__u64 args[4];
				1121
				1122	Arguments for enabling a feature. If a feature needs initial values to
				1123	function properly, this is the place to put them.
				1124
				1125	__u8 pad[64];
				1126	};
				1127
				1128	The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
				1129	for vm-wide capabilities.
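
A possible way to fill the structure and issue the ioctl is sketched below.
The helper names make_enable_cap() and enable_cap() are assumptions made for
this example; only struct kvm_enable_cap and KVM_ENABLE_CAP come from
<linux/kvm.h>. Which args a capability expects depends on that capability.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Fill a struct kvm_enable_cap for capability "cap" with one argument.
 * The struct is zeroed first so that flags, the unused args and the padding
 * are all 0.
 */
static inline struct kvm_enable_cap make_enable_cap(__u32 cap, __u64 arg0)
{
	struct kvm_enable_cap ec;

	memset(&ec, 0, sizeof(ec));
	ec.cap = cap;
	ec.args[0] = arg0;
	return ec;
}

/*
 * Issue KVM_ENABLE_CAP on fd, which is a vcpu fd for vcpu-specific
 * capabilities or a VM fd for vm-wide ones. Returns 0 on success, -1 on error.
 */
static inline int enable_cap(int fd, __u32 cap, __u64 arg0)
{
	struct kvm_enable_cap ec = make_enable_cap(cap, arg0);

	return ioctl(fd, KVM_ENABLE_CAP, &ec);
}
```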
				1130
				1131	4.38 KVM_GET_MP_STATE
				1132
				1133	Capability: KVM_CAP_MP_STATE
				1134	Architectures: x86, s390, arm, arm64
				1135	Type: vcpu ioctl
				1136	Parameters: struct kvm_mp_state (out)
				1137	Returns: 0 on success; -1 on error
				1138
				1139	struct kvm_mp_state {
				1140	__u32 mp_state;
				1141	};
				1142
				1143	Returns the vcpu's current "multiprocessing state" (though also valid on
				1144	uniprocessor guests).
				1145
				1146	Possible values are:
				1147
				1148	- KVM_MP_STATE_RUNNABLE: the vcpu is currently running [x86,arm/arm64]
				1149	- KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP)
				1150	which has not yet received an INIT signal [x86]
				1151	- KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is
				1152	now ready for a SIPI [x86]
				1153	- KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and
				1154	is waiting for an interrupt [x86]
				1155	- KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector
				1156	accessible via KVM_GET_VCPU_EVENTS) [x86]
				1157	- KVM_MP_STATE_STOPPED: the vcpu is stopped [s390,arm/arm64]
				1158	- KVM_MP_STATE_CHECK_STOP: the vcpu is in a special error state [s390]
				1159	- KVM_MP_STATE_OPERATING: the vcpu is operating (running or halted)
				1160	[s390]
				1161	- KVM_MP_STATE_LOAD: the vcpu is in a special load/startup state
				1162	[s390]
				1163
				1164	On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
				1165	in-kernel irqchip, the multiprocessing state must be maintained by userspace on
				1166	this architecture.
				1167
				1168	For arm/arm64:
				1169
				1170	The only states that are valid are KVM_MP_STATE_STOPPED and
				1171	KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
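
For logging or validation, the state values listed above can be mapped to
names. This is a sketch; mp_state_name() and mp_state_valid_on_arm() are
hypothetical helpers, while the KVM_MP_STATE_* constants come from
<linux/kvm.h>.

```c
#include <linux/kvm.h>
#include <stdbool.h>

/* Map a kvm_mp_state.mp_state value to a short name for logging. */
static inline const char *mp_state_name(__u32 mp_state)
{
	switch (mp_state) {
	case KVM_MP_STATE_RUNNABLE:      return "RUNNABLE";
	case KVM_MP_STATE_UNINITIALIZED: return "UNINITIALIZED";
	case KVM_MP_STATE_INIT_RECEIVED: return "INIT_RECEIVED";
	case KVM_MP_STATE_HALTED:        return "HALTED";
	case KVM_MP_STATE_SIPI_RECEIVED: return "SIPI_RECEIVED";
	case KVM_MP_STATE_STOPPED:       return "STOPPED";
	case KVM_MP_STATE_CHECK_STOP:    return "CHECK_STOP";
	case KVM_MP_STATE_OPERATING:     return "OPERATING";
	case KVM_MP_STATE_LOAD:          return "LOAD";
	default:                         return "UNKNOWN";
	}
}

/* On arm/arm64 only STOPPED and RUNNABLE are valid states. */
static inline bool mp_state_valid_on_arm(__u32 mp_state)
{
	return mp_state == KVM_MP_STATE_STOPPED ||
	       mp_state == KVM_MP_STATE_RUNNABLE;
}
```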
				1172
				1173	4.39 KVM_SET_MP_STATE
				1174
				1175	Capability: KVM_CAP_MP_STATE
				1176	Architectures: x86, s390, arm, arm64
				1177	Type: vcpu ioctl
				1178	Parameters: struct kvm_mp_state (in)
				1179	Returns: 0 on success; -1 on error
				1180
				1181	Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
				1182	arguments.
				1183
				1184	On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
				1185	in-kernel irqchip, the multiprocessing state must be maintained by userspace on
				1186	this architecture.
				1187
				1188	For arm/arm64:
				1189
				1190	The only states that are valid are KVM_MP_STATE_STOPPED and
				1191	KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
				1192
				1193	4.40 KVM_SET_IDENTITY_MAP_ADDR
				1194
				1195	Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
				1196	Architectures: x86
				1197	Type: vm ioctl
				1198	Parameters: unsigned long identity (in)
				1199	Returns: 0 on success, -1 on error
				1200
				1201	This ioctl defines the physical address of a one-page region in the guest
				1202	physical address space. The region must be within the first 4GB of the
				1203	guest physical address space and must not conflict with any memory slot
				1204	or any mmio address. The guest may malfunction if it accesses this memory
				1205	region.
				1206
				1207	Setting the address to 0 will result in resetting the address to its default
				1208	(0xfffbc000).
				1209
				1210	This ioctl is required on Intel-based hosts. This is needed on Intel hardware
				1211	because of a quirk in the virtualization implementation (see the internals
				1212	documentation when it pops into existence).
				1213
				1214	Fails if any VCPU has already been created.
				1215
				1216	4.41 KVM_SET_BOOT_CPU_ID
				1217
				1218	Capability: KVM_CAP_SET_BOOT_CPU_ID
				1219	Architectures: x86
				1220	Type: vm ioctl
				1221	Parameters: unsigned long vcpu_id
				1222	Returns: 0 on success, -1 on error
				1223
				1224	Define which vcpu is the Bootstrap Processor (BSP). Values are the same
				1225	as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default
				1226	is vcpu 0.
				1227
				1228
				1229	4.42 KVM_GET_XSAVE
				1230
				1231	Capability: KVM_CAP_XSAVE
				1232	Architectures: x86
				1233	Type: vcpu ioctl
				1234	Parameters: struct kvm_xsave (out)
				1235	Returns: 0 on success, -1 on error
				1236
				1237	struct kvm_xsave {
				1238	__u32 region[1024];
				1239	};
				1240
				1241	This ioctl copies the current vcpu's xsave struct to userspace.
				1242
				1243
				1244	4.43 KVM_SET_XSAVE
				1245
				1246	Capability: KVM_CAP_XSAVE
				1247	Architectures: x86
				1248	Type: vcpu ioctl
				1249	Parameters: struct kvm_xsave (in)
				1250	Returns: 0 on success, -1 on error
				1251
				1252	struct kvm_xsave {
				1253	__u32 region[1024];
				1254	};
				1255
				1256	This ioctl copies userspace's xsave struct to the kernel.
				1257
				1258
				1259	4.44 KVM_GET_XCRS
				1260
				1261	Capability: KVM_CAP_XCRS
				1262	Architectures: x86
				1263	Type: vcpu ioctl
				1264	Parameters: struct kvm_xcrs (out)
				1265	Returns: 0 on success, -1 on error
				1266
				1267	struct kvm_xcr {
				1268	__u32 xcr;
				1269	__u32 reserved;
				1270	__u64 value;
				1271	};
				1272
				1273	struct kvm_xcrs {
				1274	__u32 nr_xcrs;
				1275	__u32 flags;
				1276	struct kvm_xcr xcrs[KVM_MAX_XCRS];
				1277	__u64 padding[16];
				1278	};
				1279
				1280	This ioctl copies the current vcpu's xcrs to userspace.
				1281
				1282
				1283	4.45 KVM_SET_XCRS
				1284
				1285	Capability: KVM_CAP_XCRS
				1286	Architectures: x86
				1287	Type: vcpu ioctl
				1288	Parameters: struct kvm_xcrs (in)
				1289	Returns: 0 on success, -1 on error
				1290
				1291	struct kvm_xcr {
				1292	__u32 xcr;
				1293	__u32 reserved;
				1294	__u64 value;
				1295	};
				1296
				1297	struct kvm_xcrs {
				1298	__u32 nr_xcrs;
				1299	__u32 flags;
				1300	struct kvm_xcr xcrs[KVM_MAX_XCRS];
				1301	__u64 padding[16];
				1302	};
				1303
				1304	This ioctl sets the vcpu's xcrs to the values userspace specified.
				1305
				1306
				1307	4.46 KVM_GET_SUPPORTED_CPUID
				1308
				1309	Capability: KVM_CAP_EXT_CPUID
				1310	Architectures: x86
				1311	Type: system ioctl
				1312	Parameters: struct kvm_cpuid2 (in/out)
				1313	Returns: 0 on success, -1 on error
				1314
				1315	struct kvm_cpuid2 {
				1316	__u32 nent;
				1317	__u32 padding;
				1318	struct kvm_cpuid_entry2 entries[0];
				1319	};
				1320
				1321	#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
				1322	#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
				1323	#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
				1324
				1325	struct kvm_cpuid_entry2 {
				1326	__u32 function;
				1327	__u32 index;
				1328	__u32 flags;
				1329	__u32 eax;
				1330	__u32 ebx;
				1331	__u32 ecx;
				1332	__u32 edx;
				1333	__u32 padding[3];
				1334	};
				1335
				1336	This ioctl returns x86 cpuid features which are supported by both the
				1337	hardware and kvm in its default configuration. Userspace can use the
				1338	information returned by this ioctl to construct cpuid information (for
				1339	KVM_SET_CPUID2) that is consistent with hardware, kernel, and
				1340	userspace capabilities, and with user requirements (for example, the
				1341	user may wish to constrain cpuid to emulate older hardware, or for
				1342	feature consistency across a cluster).
				1343
				1344	Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
				1345	expose cpuid features (e.g. MONITOR) which are not supported by kvm in
				1346	its default configuration. If userspace enables such capabilities, it
				1347	is responsible for modifying the results of this ioctl appropriately.
				1348
				1349	Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
				1350	with the 'nent' field indicating the number of entries in the variable-size
				1351	array 'entries'. If the number of entries is too low to describe the cpu
				1352	capabilities, an error (E2BIG) is returned. If the number is too high,
				1353	the 'nent' field is adjusted and an error (ENOMEM) is returned. If the
				1354	number is just right, the 'nent' field is adjusted to the number of valid
				1355	entries in the 'entries' array, which is then filled.
				1356
				1357	The entries returned are the host cpuid as returned by the cpuid instruction,
				1358	with unknown or unsupported features masked out. Some features (for example,
				1359	x2apic) may not be present in the host cpu, but are exposed by kvm if it can
				1360	emulate them efficiently. The fields in each entry are defined as follows:
				1361
				1362	function: the eax value used to obtain the entry
				1363	index: the ecx value used to obtain the entry (for entries that are
				1364	affected by ecx)
				1365	flags: an OR of zero or more of the following:
				1366	KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
				1367	if the index field is valid
				1368	KVM_CPUID_FLAG_STATEFUL_FUNC:
				1369	if cpuid for this function returns different values for successive
				1370	invocations; there will be several entries with the same function,
				1371	all with this flag set
				1372	KVM_CPUID_FLAG_STATE_READ_NEXT:
				1373	for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
				1374	the first entry to be read by a cpu
				1375	eax, ebx, ecx, edx: the values returned by the cpuid instruction for
				1376	this function/index combination
				1377
				1378	The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
				1379	as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
				1380	support. Instead it is reported via
				1381
				1382	ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
				1383
				1384	if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
				1385	feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
				1386
				1387
				1388	4.47 KVM_PPC_GET_PVINFO
				1389
				1390	Capability: KVM_CAP_PPC_GET_PVINFO
				1391	Architectures: ppc
				1392	Type: vm ioctl
				1393	Parameters: struct kvm_ppc_pvinfo (out)
				1394	Returns: 0 on success, !0 on error
				1395
				1396	struct kvm_ppc_pvinfo {
				1397	__u32 flags;
				1398	__u32 hcall[4];
				1399	__u8 pad[108];
				1400	};
				1401
				1402	This ioctl fetches PV-specific information that needs to be passed to the guest
				1403	using the device tree or other means from vm context.
				1404
				1405	The hcall array defines 4 instructions that make up a hypercall.
				1406
				1407	If any additional field gets added to this structure later on, a bit for that
				1408	additional piece of information will be set in the flags bitmap.
				1409
				1410	The flags bitmap is defined as:
				1411
				1412	/* the host supports the ePAPR idle hcall */
				1413	#define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0)
				1414
				1415	4.52 KVM_SET_GSI_ROUTING
				1416
				1417	Capability: KVM_CAP_IRQ_ROUTING
				1418	Architectures: x86, s390, arm, arm64
				1419	Type: vm ioctl
				1420	Parameters: struct kvm_irq_routing (in)
				1421	Returns: 0 on success, -1 on error
				1422
				1423	Sets the GSI routing table entries, overwriting any previously set entries.
				1424
				1425	On arm/arm64, GSI routing has the following limitation:
				1426	- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.
				1427
				1428	struct kvm_irq_routing {
				1429	__u32 nr;
				1430	__u32 flags;
				1431	struct kvm_irq_routing_entry entries[0];
				1432	};
				1433
				1434	No flags are specified so far; the corresponding field must be set to zero.
				1435
				1436	struct kvm_irq_routing_entry {
				1437	__u32 gsi;
				1438	__u32 type;
				1439	__u32 flags;
				1440	__u32 pad;
				1441	union {
				1442	struct kvm_irq_routing_irqchip irqchip;
				1443	struct kvm_irq_routing_msi msi;
				1444	struct kvm_irq_routing_s390_adapter adapter;
				1445	struct kvm_irq_routing_hv_sint hv_sint;
				1446	__u32 pad[8];
				1447	} u;
				1448	};
				1449
				1450	/* gsi routing entry types */
				1451	#define KVM_IRQ_ROUTING_IRQCHIP 1
				1452	#define KVM_IRQ_ROUTING_MSI 2
				1453	#define KVM_IRQ_ROUTING_S390_ADAPTER 3
				1454	#define KVM_IRQ_ROUTING_HV_SINT 4
				1455
				1456	flags:
				1457	- KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry
				1458	type, specifies that the devid field contains a valid value. The per-VM
				1459	KVM_CAP_MSI_DEVID capability advertises the requirement to provide
				1460	the device ID. If this capability is not available, userspace should
				1461	never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.
				1462	- zero otherwise
				1463
				1464	struct kvm_irq_routing_irqchip {
				1465	__u32 irqchip;
				1466	__u32 pin;
				1467	};
				1468
				1469	struct kvm_irq_routing_msi {
				1470	__u32 address_lo;
				1471	__u32 address_hi;
				1472	__u32 data;
				1473	union {
				1474	__u32 pad;
				1475	__u32 devid;
				1476	};
				1477	};
				1478
				1479	If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
				1480	for the device that wrote the MSI message. For PCI, this is usually a
				1481	BDF identifier in the lower 16 bits.
				1482
				1483	On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
				1484	feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
				1485	address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
				1486	address_hi must be zero.
				1487
				1488	struct kvm_irq_routing_s390_adapter {
				1489	__u64 ind_addr;
				1490	__u64 summary_addr;
				1491	__u64 ind_offset;
				1492	__u32 summary_offset;
				1493	__u32 adapter_id;
				1494	};
				1495
				1496	struct kvm_irq_routing_hv_sint {
				1497	__u32 vcpu;
				1498	__u32 sint;
				1499	};
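
As an illustration of the variable-size layout, the sketch below allocates a
routing table holding a single MSI entry. The helper name
alloc_single_msi_route() is an assumption for this example; the structures and
KVM_IRQ_ROUTING_MSI come from <linux/kvm.h>. Because the ioctl overwrites any
previously set entries, a real table would normally carry every route the VM
needs, not just one.

```c
#include <linux/kvm.h>
#include <stdlib.h>

/*
 * Allocate a routing table with a single MSI entry. The caller passes the
 * result to ioctl(vm_fd, KVM_SET_GSI_ROUTING, table) and then free()s it.
 * Returns NULL if the allocation fails.
 */
static inline struct kvm_irq_routing *
alloc_single_msi_route(__u32 gsi, __u64 msi_addr, __u32 msi_data)
{
	struct kvm_irq_routing *table;
	struct kvm_irq_routing_entry *e;

	/* calloc leaves both flags fields and all padding at zero. */
	table = calloc(1, sizeof(*table) + sizeof(*e));
	if (!table)
		return NULL;

	table->nr = 1;
	e = &table->entries[0];
	e->gsi = gsi;
	e->type = KVM_IRQ_ROUTING_MSI;
	e->u.msi.address_lo = (__u32)msi_addr;
	e->u.msi.address_hi = (__u32)(msi_addr >> 32);
	e->u.msi.data = msi_data;
	return table;
}
```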
				1500
				1501
				1502	4.55 KVM_SET_TSC_KHZ
				1503
				1504	Capability: KVM_CAP_TSC_CONTROL
				1505	Architectures: x86
				1506	Type: vcpu ioctl
				1507	Parameters: virtual tsc_khz
				1508	Returns: 0 on success, -1 on error
				1509
				1510	Specifies the tsc frequency for the virtual machine. The unit of the
				1511	frequency is KHz.
				1512
				1513
				1514	4.56 KVM_GET_TSC_KHZ
				1515
				1516	Capability: KVM_CAP_GET_TSC_KHZ
				1517	Architectures: x86
				1518	Type: vcpu ioctl
				1519	Parameters: none
				1520	Returns: virtual tsc-khz on success, negative value on error
				1521
				1522	Returns the tsc frequency of the guest. The unit of the return value is
				1523	KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
				1524	error.
				1525
				1526
				1527	4.57 KVM_GET_LAPIC
				1528
				1529	Capability: KVM_CAP_IRQCHIP
				1530	Architectures: x86
				1531	Type: vcpu ioctl
				1532	Parameters: struct kvm_lapic_state (out)
				1533	Returns: 0 on success, -1 on error
				1534
				1535	#define KVM_APIC_REG_SIZE 0x400
				1536	struct kvm_lapic_state {
				1537	char regs[KVM_APIC_REG_SIZE];
				1538	};
				1539
				1540	Reads the Local APIC registers and copies them into the input argument. The
				1541	data format and layout are the same as documented in the architecture manual.
				1542
				1543	If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
				1544	enabled, then the format of APIC_ID register depends on the APIC mode
				1545	(reported by MSR_IA32_APICBASE) of its VCPU. x2APIC stores APIC ID in
				1546	the APIC_ID register (bytes 32-35). xAPIC only allows an 8-bit APIC ID
				1547	which is stored in bits 31-24 of the APIC_ID register, or equivalently in
				1548	byte 35 of struct kvm_lapic_state's regs field. KVM_GET_LAPIC must then
				1549	be called after MSR_IA32_APICBASE has been set with KVM_SET_MSRS.
				1550
				1551	If KVM_X2APIC_API_USE_32BIT_IDS feature is disabled, struct kvm_lapic_state
				1552	always uses xAPIC format.
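
The two APIC ID formats can be decoded from the regs array as sketched below.
The helper names and the APIC_ID_OFFSET macro are assumptions for this
example; the caller is expected to know which format applies from the rules
above.

```c
#include <stdint.h>

/* Byte offset of the APIC_ID register inside kvm_lapic_state.regs. */
#define APIC_ID_OFFSET 32

/* xAPIC format: the 8-bit id lives in bits 31-24, i.e. byte 35. */
static inline uint8_t lapic_id_xapic(const char *regs)
{
	return (uint8_t)regs[APIC_ID_OFFSET + 3];
}

/* x2APIC format: the full 32-bit id occupies bytes 32-35, low byte first. */
static inline uint32_t lapic_id_x2apic(const char *regs)
{
	const unsigned char *p = (const unsigned char *)regs + APIC_ID_OFFSET;

	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```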
				1553
				1554
				1555	4.58 KVM_SET_LAPIC
				1556
				1557	Capability: KVM_CAP_IRQCHIP
				1558	Architectures: x86
				1559	Type: vcpu ioctl
				1560	Parameters: struct kvm_lapic_state (in)
				1561	Returns: 0 on success, -1 on error
				1562
				1563	#define KVM_APIC_REG_SIZE 0x400
				1564	struct kvm_lapic_state {
				1565	char regs[KVM_APIC_REG_SIZE];
				1566	};
				1567
				1568	Copies the input argument into the Local APIC registers. The data format
				1569	and layout are the same as documented in the architecture manual.
				1570
				1571	The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
				1572	regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
				1573	See the note in KVM_GET_LAPIC.
				1574
				1575
				1576	4.59 KVM_IOEVENTFD
				1577
				1578	Capability: KVM_CAP_IOEVENTFD
				1579	Architectures: all
				1580	Type: vm ioctl
				1581	Parameters: struct kvm_ioeventfd (in)
				1582	Returns: 0 on success, !0 on error
				1583
				1584	This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
				1585	within the guest. A guest write to the registered address will signal the
				1586	provided event instead of triggering an exit.
				1587
				1588	struct kvm_ioeventfd {
				1589	__u64 datamatch;
				1590	__u64 addr; /* legal pio/mmio address */
				1591	__u32 len; /* 0, 1, 2, 4, or 8 bytes */
				1592	__s32 fd;
				1593	__u32 flags;
				1594	__u8 pad[36];
				1595	};
				1596
				1597	For the special case of virtio-ccw devices on s390, the ioevent is matched
				1598	to a subchannel/virtqueue tuple instead.
				1599
				1600	The following flags are defined:
				1601
				1602	#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
				1603	#define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio)
				1604	#define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
				1605	#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
				1606	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
				1607
				1608	If the datamatch flag is set, the event will be signaled only if the value
				1609	written to the registered address is equal to datamatch in struct kvm_ioeventfd.
				1610
				1611	For virtio-ccw devices, addr contains the subchannel id and datamatch the
				1612	virtqueue index.
				1613
				1614	With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and
				1615	the kernel will ignore the length of guest write and may get a faster vmexit.
				1616	The speedup may only apply to specific architectures, but the ioeventfd will
				1617	work anyway.
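
A request for an MMIO ioeventfd with datamatch might be prepared as in the
following sketch. fill_mmio_datamatch() is a hypothetical helper; the struct
and the KVM_IOEVENTFD_FLAG_* constants come from <linux/kvm.h>. The sketch
deliberately does not handle the zero-length case or the virtio-ccw variant.

```c
#include <linux/kvm.h>
#include <string.h>

/*
 * Describe an MMIO ioeventfd at guest address addr that signals efd only when
 * the guest writes exactly "value". len must be 1, 2, 4 or 8 in this sketch.
 * Returns 0 on success, -1 if len is not accepted.
 */
static inline int fill_mmio_datamatch(struct kvm_ioeventfd *io, __u64 addr,
				      __u32 len, __u64 value, int efd)
{
	if (len != 1 && len != 2 && len != 4 && len != 8)
		return -1;

	memset(io, 0, sizeof(*io));	/* clears flags and pad */
	io->addr = addr;
	io->len = len;
	io->datamatch = value;
	io->fd = efd;
	io->flags = KVM_IOEVENTFD_FLAG_DATAMATCH;
	return 0;
}
```

The filled structure would then be passed to ioctl(vm_fd, KVM_IOEVENTFD, &io).
For a port I/O address, KVM_IOEVENTFD_FLAG_PIO would be OR-ed into flags as
well.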
				1618
				1619	4.60 KVM_DIRTY_TLB
				1620
				1621	Capability: KVM_CAP_SW_TLB
				1622	Architectures: ppc
				1623	Type: vcpu ioctl
				1624	Parameters: struct kvm_dirty_tlb (in)
				1625	Returns: 0 on success, -1 on error
				1626
				1627	struct kvm_dirty_tlb {
				1628	__u64 bitmap;
				1629	__u32 num_dirty;
				1630	};
				1631
				1632	This must be called whenever userspace has changed an entry in the shared
				1633	TLB, prior to calling KVM_RUN on the associated vcpu.
				1634
				1635	The "bitmap" field is the userspace address of an array. This array
				1636	consists of a number of bits, equal to the total number of TLB entries as
				1637	determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
				1638	nearest multiple of 64.
				1639	The array is little-endian: bit 0 is the least significant bit of the
				1640	Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
				1641	array.
				1642
				1643	The array is little-endian: the bit 0 is the least significant bit of the
				1644	first byte, bit 8 is the least significant bit of the second byte, etc.
				1645	This avoids any complications with differing word sizes.
				1646
				1647	The "num_dirty" field is a performance hint for KVM to determine whether it
				1648	should skip processing the bitmap and just invalidate everything. It must
				1649	be set to the number of set bits in the bitmap.
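
The bitmap rules above can be sketched with three small helpers. All names
here (tlb_bitmap_bytes() and so on) are invented for the example. Working on
bytes keeps the code independent of the host word size.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for num_entries bits, rounded up to a multiple of 64 bits. */
static inline size_t tlb_bitmap_bytes(size_t num_entries)
{
	return ((num_entries + 63) / 64) * 8;
}

/* Mark TLB entry idx dirty: bit 0 is the LSB of byte 0, bit 8 of byte 1. */
static inline void tlb_bitmap_set(uint8_t *bitmap, size_t idx)
{
	bitmap[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/* Count the set bits; this is the value to store in num_dirty. */
static inline uint32_t tlb_bitmap_count(const uint8_t *bitmap, size_t bytes)
{
	uint32_t n = 0;

	for (size_t i = 0; i < bytes; i++)
		for (uint8_t b = bitmap[i]; b; b &= (uint8_t)(b - 1))
			n++;
	return n;
}
```

The "bitmap" field of struct kvm_dirty_tlb would then hold the address of
this buffer, cast to a 64-bit integer.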
				1650
				1651
				1652	4.62 KVM_CREATE_SPAPR_TCE
				1653
				1654	Capability: KVM_CAP_SPAPR_TCE
				1655	Architectures: powerpc
				1656	Type: vm ioctl
				1657	Parameters: struct kvm_create_spapr_tce (in)
				1658	Returns: file descriptor for manipulating the created TCE table
				1659
				1660	This creates a virtual TCE (translation control entry) table, which
				1661	is an IOMMU for PAPR-style virtual I/O. It is used to translate
				1662	logical addresses used in virtual I/O into guest physical addresses,
				1663	and provides a scatter/gather capability for PAPR virtual I/O.
				1664
				1665	/* for KVM_CAP_SPAPR_TCE */
				1666	struct kvm_create_spapr_tce {
				1667	__u64 liobn;
				1668	__u32 window_size;
				1669	};
				1670
				1671	The liobn field gives the logical IO bus number for which to create a
				1672	TCE table. The window_size field specifies the size of the DMA window
				1673	which this TCE table will translate - the table will contain one 64
				1674	bit TCE entry for every 4 KiB of the DMA window.
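
The relation between window_size and the table size can be worked out as in
this sketch. The helper names and macros are assumptions, as is the choice to
round a partial 4 KiB page up to a full entry.

```c
#include <stdint.h>

#define TCE_PAGE_SIZE  4096u	/* each TCE covers 4 KiB of the DMA window */
#define TCE_ENTRY_SIZE 8u	/* one 64-bit entry */

/* Number of TCE entries for a window (partial pages rounded up here). */
static inline uint64_t tce_entries(uint32_t window_size)
{
	return ((uint64_t)window_size + TCE_PAGE_SIZE - 1) / TCE_PAGE_SIZE;
}

/* Size of the entries themselves, in bytes. */
static inline uint64_t tce_table_bytes(uint32_t window_size)
{
	return tce_entries(window_size) * TCE_ENTRY_SIZE;
}
```

For example, a 1 GiB window needs 262144 entries, which occupy 2 MiB.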
				1675
				1676	When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
				1677	table has been created using this ioctl(), the kernel will handle it
				1678	in real mode, updating the TCE table. H_PUT_TCE calls for other
				1679	liobns will cause a vm exit and must be handled by userspace.
				1680
				1681	The return value is a file descriptor which can be passed to mmap(2)
				1682	to map the created TCE table into userspace. This lets userspace read
				1683	the entries written by kernel-handled H_PUT_TCE calls, and also lets
				1684	userspace update the TCE table directly which is useful in some
				1685	circumstances.
				1686
				1687
				1688	4.63 KVM_ALLOCATE_RMA
				1689
				1690	Capability: KVM_CAP_PPC_RMA
				1691	Architectures: powerpc
				1692	Type: vm ioctl
				1693	Parameters: struct kvm_allocate_rma (out)
				1694	Returns: file descriptor for mapping the allocated RMA
				1695
				1696	This allocates a Real Mode Area (RMA) from the pool allocated at boot
				1697	time by the kernel. An RMA is a physically-contiguous, aligned region
				1698	of memory used on older POWER processors to provide the memory which
				1699	will be accessed by real-mode (MMU off) accesses in a KVM guest.
				1700	POWER processors support a set of sizes for the RMA that usually
				1701	includes 64MB, 128MB, 256MB and some larger powers of two.
				1702
				1703	/* for KVM_ALLOCATE_RMA */
				1704	struct kvm_allocate_rma {
				1705	    __u64 rma_size;
				1706	};
				1707
				1708	The return value is a file descriptor which can be passed to mmap(2)
				1709	to map the allocated RMA into userspace. The mapped area can then be
				1710	passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
				1711	RMA for a virtual machine. The size of the RMA in bytes (which is
				1712	fixed at host kernel boot time) is returned in the rma_size field of
				1713	the argument structure.
				1714
				1715	The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
				1716	is supported; 2 if the processor requires all virtual machines to have
				1717	an RMA, or 1 if the processor can use an RMA but doesn't require it,
				1718	because it supports the Virtual RMA (VRMA) facility.
				1719
				1720
				1721	4.64 KVM_NMI
				1722
				1723	Capability: KVM_CAP_USER_NMI
				1724	Architectures: x86
				1725	Type: vcpu ioctl
				1726	Parameters: none
				1727	Returns: 0 on success, -1 on error
				1728
				1729	Queues an NMI on the thread's vcpu. Note this is well defined only
				1730	when KVM_CREATE_IRQCHIP has not been called, since this is an interface
				1731	between the virtual cpu core and virtual local APIC. After KVM_CREATE_IRQCHIP
				1732	has been called, this interface is completely emulated within the kernel.
				1733
				1734	To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
				1735	following algorithm:
				1736
				1737	- pause the vcpu
				1738	- read the local APIC's state (KVM_GET_LAPIC)
				1739	- check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
				1740	- if so, issue KVM_NMI
				1741	- resume the vcpu
				1742
				1743	Some guests configure the LINT1 NMI input to cause a panic, aiding in
				1744	debugging.
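The check in the third step of the algorithm above might look like the
following sketch. The register offset (0x360) and the LVT field layout are
taken from the x86 local APIC architecture, not from this document, and the
function name is hypothetical. A software-disabled APIC is not considered
here.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_LVT_LINT1_OFFSET 0x360u      /* LVT LINT1 register offset */
#define APIC_LVT_MASKED       (1u << 16)  /* mask bit of an LVT entry */
#define APIC_LVT_MODE_MASK    (7u << 8)   /* delivery mode, bits 10:8 */
#define APIC_LVT_MODE_NMI     (4u << 8)   /* delivery mode 100b = NMI */

/* regs points at the 1 KiB register image returned by KVM_GET_LAPIC
 * (the regs[] array of struct kvm_lapic_state). */
bool lint1_would_queue_nmi(const uint8_t *regs)
{
    const uint8_t *p = regs + APIC_LVT_LINT1_OFFSET;
    /* Assemble the little-endian 32-bit LVT entry byte by byte. */
    uint32_t lvt = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    if (lvt & APIC_LVT_MASKED)
        return false;
    return (lvt & APIC_LVT_MODE_MASK) == APIC_LVT_MODE_NMI;
}
```

If the function returns true, userspace would issue KVM_NMI before resuming
the vcpu.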
				1745
				1746
				1747	4.65 KVM_S390_UCAS_MAP
				1748
				1749	Capability: KVM_CAP_S390_UCONTROL
				1750	Architectures: s390
				1751	Type: vcpu ioctl
				1752	Parameters: struct kvm_s390_ucas_mapping (in)
				1753	Returns: 0 in case of success
				1754
				1755	The parameter is defined like this:
				1756	struct kvm_s390_ucas_mapping {
				1757	    __u64 user_addr;
				1758	    __u64 vcpu_addr;
				1759	    __u64 length;
				1760	};
				1761
				1762	This ioctl maps the memory at "user_addr" with the length "length" to
				1763	the vcpu's address space starting at "vcpu_addr". All parameters must
				1764	be aligned to 1 megabyte.
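A minimal sketch of validating the alignment rule before issuing the ioctl is
shown below. The struct is a local mirror of the one above and the function
name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define UCAS_ALIGN (1ull << 20) /* 1 megabyte */

/* Local mirror of struct kvm_s390_ucas_mapping shown above. */
struct ucas_mapping {
    uint64_t user_addr;
    uint64_t vcpu_addr;
    uint64_t length;
};

/* True if all three fields are multiples of 1 megabyte. */
bool ucas_mapping_aligned(const struct ucas_mapping *m)
{
    /* OR-ing the fields lets one mask test cover all three. */
    return ((m->user_addr | m->vcpu_addr | m->length) & (UCAS_ALIGN - 1)) == 0;
}
```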
				1765
				1766
				1767	4.66 KVM_S390_UCAS_UNMAP
				1768
				1769	Capability: KVM_CAP_S390_UCONTROL
				1770	Architectures: s390
				1771	Type: vcpu ioctl
				1772	Parameters: struct kvm_s390_ucas_mapping (in)
				1773	Returns: 0 in case of success
				1774
				1775	The parameter is defined like this:
				1776	struct kvm_s390_ucas_mapping {
				1777	    __u64 user_addr;
				1778	    __u64 vcpu_addr;
				1779	    __u64 length;
				1780	};
				1781
				1782	This ioctl unmaps the memory in the vcpu's address space starting at
				1783	"vcpu_addr" with the length "length". The field "user_addr" is ignored.
				1784	All parameters must be aligned to 1 megabyte.
				1785
				1786
				1787	4.67 KVM_S390_VCPU_FAULT
				1788
				1789	Capability: KVM_CAP_S390_UCONTROL
				1790	Architectures: s390
				1791	Type: vcpu ioctl
				1792	Parameters: vcpu absolute address (in)
				1793	Returns: 0 in case of success
				1794
				1795	This call creates a page table entry on the virtual cpu's address space
				1796	(for user controlled virtual machines) or the virtual machine's address
				1797	space (for regular virtual machines). This only works for minor faults,
				1798	so it is recommended to access the memory page in question via the user
				1799	page table beforehand. This is useful for handling validity intercepts in
				1800	user controlled virtual machines, to fault in the virtual cpu's lowcore
				1801	pages prior to calling the KVM_RUN ioctl.
				1802
				1803
				1804	4.68 KVM_SET_ONE_REG
				1805
				1806	Capability: KVM_CAP_ONE_REG
				1807	Architectures: all
				1808	Type: vcpu ioctl
				1809	Parameters: struct kvm_one_reg (in)
				1810	Returns: 0 on success, negative value on failure
				1811
				1812	struct kvm_one_reg {
				1813	    __u64 id;
				1814	    __u64 addr;
				1815	};
				1816
				1817	Using this ioctl, a single vcpu register can be set to a specific value
				1818	defined by user space with the passed in struct kvm_one_reg, where id
				1819	refers to the register identifier as described below and addr is a pointer
				1820	to a variable with the respective size. There can be architecture-agnostic
				1821	and architecture-specific registers. Each has its own range of operation,
				1822	constants and width. The implemented registers are listed
				1823	below:
				1824
				1825	Arch | Register | Width (bits)
				1826	| |
				1827	PPC | KVM_REG_PPC_HIOR | 64
				1828	PPC | KVM_REG_PPC_IAC1 | 64
				1829	PPC | KVM_REG_PPC_IAC2 | 64
				1830	PPC | KVM_REG_PPC_IAC3 | 64
				1831	PPC | KVM_REG_PPC_IAC4 | 64
				1832	PPC | KVM_REG_PPC_DAC1 | 64
				1833	PPC | KVM_REG_PPC_DAC2 | 64
				1834	PPC | KVM_REG_PPC_DABR | 64
				1835	PPC | KVM_REG_PPC_DSCR | 64
				1836	PPC | KVM_REG_PPC_PURR | 64
				1837	PPC | KVM_REG_PPC_SPURR | 64
				1838	PPC | KVM_REG_PPC_DAR | 64
				1839	PPC | KVM_REG_PPC_DSISR | 32
				1840	PPC | KVM_REG_PPC_AMR | 64
				1841	PPC | KVM_REG_PPC_UAMOR | 64
				1842	PPC | KVM_REG_PPC_MMCR0 | 64
				1843	PPC | KVM_REG_PPC_MMCR1 | 64
				1844	PPC | KVM_REG_PPC_MMCRA | 64
				1845	PPC | KVM_REG_PPC_MMCR2 | 64
				1846	PPC | KVM_REG_PPC_MMCRS | 64
				1847	PPC | KVM_REG_PPC_SIAR | 64
				1848	PPC | KVM_REG_PPC_SDAR | 64
				1849	PPC | KVM_REG_PPC_SIER | 64
				1850	PPC | KVM_REG_PPC_PMC1 | 32
				1851	PPC | KVM_REG_PPC_PMC2 | 32
				1852	PPC | KVM_REG_PPC_PMC3 | 32
				1853	PPC | KVM_REG_PPC_PMC4 | 32
				1854	PPC | KVM_REG_PPC_PMC5 | 32
				1855	PPC | KVM_REG_PPC_PMC6 | 32
				1856	PPC | KVM_REG_PPC_PMC7 | 32
				1857	PPC | KVM_REG_PPC_PMC8 | 32
				1858	PPC | KVM_REG_PPC_FPR0 | 64
				1859	...
				1860	PPC | KVM_REG_PPC_FPR31 | 64
				1861	PPC | KVM_REG_PPC_VR0 | 128
				1862	...
				1863	PPC | KVM_REG_PPC_VR31 | 128
				1864	PPC | KVM_REG_PPC_VSR0 | 128
				1865	...
				1866	PPC | KVM_REG_PPC_VSR31 | 128
				1867	PPC | KVM_REG_PPC_FPSCR | 64
				1868	PPC | KVM_REG_PPC_VSCR | 32
				1869	PPC | KVM_REG_PPC_VPA_ADDR | 64
				1870	PPC | KVM_REG_PPC_VPA_SLB | 128
				1871	PPC | KVM_REG_PPC_VPA_DTL | 128
				1872	PPC | KVM_REG_PPC_EPCR | 32
				1873	PPC | KVM_REG_PPC_EPR | 32
				1874	PPC | KVM_REG_PPC_TCR | 32
				1875	PPC | KVM_REG_PPC_TSR | 32
				1876	PPC | KVM_REG_PPC_OR_TSR | 32
				1877	PPC | KVM_REG_PPC_CLEAR_TSR | 32
				1878	PPC | KVM_REG_PPC_MAS0 | 32
				1879	PPC | KVM_REG_PPC_MAS1 | 32
				1880	PPC | KVM_REG_PPC_MAS2 | 64
				1881	PPC | KVM_REG_PPC_MAS7_3 | 64
				1882	PPC | KVM_REG_PPC_MAS4 | 32
				1883	PPC | KVM_REG_PPC_MAS6 | 32
				1884	PPC | KVM_REG_PPC_MMUCFG | 32
				1885	PPC | KVM_REG_PPC_TLB0CFG | 32
				1886	PPC | KVM_REG_PPC_TLB1CFG | 32
				1887	PPC | KVM_REG_PPC_TLB2CFG | 32
				1888	PPC | KVM_REG_PPC_TLB3CFG | 32
				1889	PPC | KVM_REG_PPC_TLB0PS | 32
				1890	PPC | KVM_REG_PPC_TLB1PS | 32
				1891	PPC | KVM_REG_PPC_TLB2PS | 32
				1892	PPC | KVM_REG_PPC_TLB3PS | 32
				1893	PPC | KVM_REG_PPC_EPTCFG | 32
				1894	PPC | KVM_REG_PPC_ICP_STATE | 64
				1895	PPC | KVM_REG_PPC_TB_OFFSET | 64
				1896	PPC | KVM_REG_PPC_SPMC1 | 32
				1897	PPC | KVM_REG_PPC_SPMC2 | 32
				1898	PPC | KVM_REG_PPC_IAMR | 64
				1899	PPC | KVM_REG_PPC_TFHAR | 64
				1900	PPC | KVM_REG_PPC_TFIAR | 64
				1901	PPC | KVM_REG_PPC_TEXASR | 64
				1902	PPC | KVM_REG_PPC_FSCR | 64
				1903	PPC | KVM_REG_PPC_PSPB | 32
				1904	PPC | KVM_REG_PPC_EBBHR | 64
				1905	PPC | KVM_REG_PPC_EBBRR | 64
				1906	PPC | KVM_REG_PPC_BESCR | 64
				1907	PPC | KVM_REG_PPC_TAR | 64
				1908	PPC | KVM_REG_PPC_DPDES | 64
				1909	PPC | KVM_REG_PPC_DAWR | 64
				1910	PPC | KVM_REG_PPC_DAWRX | 64
				1911	PPC | KVM_REG_PPC_CIABR | 64
				1912	PPC | KVM_REG_PPC_IC | 64
				1913	PPC | KVM_REG_PPC_VTB | 64
				1914	PPC | KVM_REG_PPC_CSIGR | 64
				1915	PPC | KVM_REG_PPC_TACR | 64
				1916	PPC | KVM_REG_PPC_TCSCR | 64
				1917	PPC | KVM_REG_PPC_PID | 64
				1918	PPC | KVM_REG_PPC_ACOP | 64
				1919	PPC | KVM_REG_PPC_VRSAVE | 32
				1920	PPC | KVM_REG_PPC_LPCR | 32
				1921	PPC | KVM_REG_PPC_LPCR_64 | 64
				1922	PPC | KVM_REG_PPC_PPR | 64
				1923	PPC | KVM_REG_PPC_ARCH_COMPAT | 32
				1924	PPC | KVM_REG_PPC_DABRX | 32
				1925	PPC | KVM_REG_PPC_WORT | 64
				1926	PPC | KVM_REG_PPC_SPRG9 | 64
				1927	PPC | KVM_REG_PPC_DBSR | 32
				1928	PPC | KVM_REG_PPC_TIDR | 64
				1929	PPC | KVM_REG_PPC_PSSCR | 64
				1930	PPC | KVM_REG_PPC_DEC_EXPIRY | 64
				1931	PPC | KVM_REG_PPC_TM_GPR0 | 64
				1932	...
				1933	PPC | KVM_REG_PPC_TM_GPR31 | 64
				1934	PPC | KVM_REG_PPC_TM_VSR0 | 128
				1935	...
				1936	PPC | KVM_REG_PPC_TM_VSR63 | 128
				1937	PPC | KVM_REG_PPC_TM_CR | 64
				1938	PPC | KVM_REG_PPC_TM_LR | 64
				1939	PPC | KVM_REG_PPC_TM_CTR | 64
				1940	PPC | KVM_REG_PPC_TM_FPSCR | 64
				1941	PPC | KVM_REG_PPC_TM_AMR | 64
				1942	PPC | KVM_REG_PPC_TM_PPR | 64
				1943	PPC | KVM_REG_PPC_TM_VRSAVE | 64
				1944	PPC | KVM_REG_PPC_TM_VSCR | 32
				1945	PPC | KVM_REG_PPC_TM_DSCR | 64
				1946	PPC | KVM_REG_PPC_TM_TAR | 64
				1947	PPC | KVM_REG_PPC_TM_XER | 64
				1948	| |
				1949	MIPS | KVM_REG_MIPS_R0 | 64
				1950	...
				1951	MIPS | KVM_REG_MIPS_R31 | 64
				1952	MIPS | KVM_REG_MIPS_HI | 64
				1953	MIPS | KVM_REG_MIPS_LO | 64
				1954	MIPS | KVM_REG_MIPS_PC | 64
				1955	MIPS | KVM_REG_MIPS_CP0_INDEX | 32
				1956	MIPS | KVM_REG_MIPS_CP0_ENTRYLO0 | 64
				1957	MIPS | KVM_REG_MIPS_CP0_ENTRYLO1 | 64
				1958	MIPS | KVM_REG_MIPS_CP0_CONTEXT | 64
				1959	MIPS | KVM_REG_MIPS_CP0_CONTEXTCONFIG | 32
				1960	MIPS | KVM_REG_MIPS_CP0_USERLOCAL | 64
				1961	MIPS | KVM_REG_MIPS_CP0_XCONTEXTCONFIG | 64
				1962	MIPS | KVM_REG_MIPS_CP0_PAGEMASK | 32
				1963	MIPS | KVM_REG_MIPS_CP0_PAGEGRAIN | 32
				1964	MIPS | KVM_REG_MIPS_CP0_SEGCTL0 | 64
				1965	MIPS | KVM_REG_MIPS_CP0_SEGCTL1 | 64
				1966	MIPS | KVM_REG_MIPS_CP0_SEGCTL2 | 64
				1967	MIPS | KVM_REG_MIPS_CP0_PWBASE | 64
				1968	MIPS | KVM_REG_MIPS_CP0_PWFIELD | 64
				1969	MIPS | KVM_REG_MIPS_CP0_PWSIZE | 64
				1970	MIPS | KVM_REG_MIPS_CP0_WIRED | 32
				1971	MIPS | KVM_REG_MIPS_CP0_PWCTL | 32
				1972	MIPS | KVM_REG_MIPS_CP0_HWRENA | 32
				1973	MIPS | KVM_REG_MIPS_CP0_BADVADDR | 64
				1974	MIPS | KVM_REG_MIPS_CP0_BADINSTR | 32
				1975	MIPS | KVM_REG_MIPS_CP0_BADINSTRP | 32
				1976	MIPS | KVM_REG_MIPS_CP0_COUNT | 32
				1977	MIPS | KVM_REG_MIPS_CP0_ENTRYHI | 64
				1978	MIPS | KVM_REG_MIPS_CP0_COMPARE | 32
				1979	MIPS | KVM_REG_MIPS_CP0_STATUS | 32
				1980	MIPS | KVM_REG_MIPS_CP0_INTCTL | 32
				1981	MIPS | KVM_REG_MIPS_CP0_CAUSE | 32
				1982	MIPS | KVM_REG_MIPS_CP0_EPC | 64
				1983	MIPS | KVM_REG_MIPS_CP0_PRID | 32
				1984	MIPS | KVM_REG_MIPS_CP0_EBASE | 64
				1985	MIPS | KVM_REG_MIPS_CP0_CONFIG | 32
				1986	MIPS | KVM_REG_MIPS_CP0_CONFIG1 | 32
				1987	MIPS | KVM_REG_MIPS_CP0_CONFIG2 | 32
				1988	MIPS | KVM_REG_MIPS_CP0_CONFIG3 | 32
				1989	MIPS | KVM_REG_MIPS_CP0_CONFIG4 | 32
				1990	MIPS | KVM_REG_MIPS_CP0_CONFIG5 | 32
				1991	MIPS | KVM_REG_MIPS_CP0_CONFIG7 | 32
				1992	MIPS | KVM_REG_MIPS_CP0_XCONTEXT | 64
				1993	MIPS | KVM_REG_MIPS_CP0_ERROREPC | 64
				1994	MIPS | KVM_REG_MIPS_CP0_KSCRATCH1 | 64
				1995	MIPS | KVM_REG_MIPS_CP0_KSCRATCH2 | 64
				1996	MIPS | KVM_REG_MIPS_CP0_KSCRATCH3 | 64
				1997	MIPS | KVM_REG_MIPS_CP0_KSCRATCH4 | 64
				1998	MIPS | KVM_REG_MIPS_CP0_KSCRATCH5 | 64
				1999	MIPS | KVM_REG_MIPS_CP0_KSCRATCH6 | 64
				2000	MIPS | KVM_REG_MIPS_CP0_MAAR(0..63) | 64
				2001	MIPS | KVM_REG_MIPS_COUNT_CTL | 64
				2002	MIPS | KVM_REG_MIPS_COUNT_RESUME | 64
				2003	MIPS | KVM_REG_MIPS_COUNT_HZ | 64
				2004	MIPS | KVM_REG_MIPS_FPR_32(0..31) | 32
				2005	MIPS | KVM_REG_MIPS_FPR_64(0..31) | 64
				2006	MIPS | KVM_REG_MIPS_VEC_128(0..31) | 128
				2007	MIPS | KVM_REG_MIPS_FCR_IR | 32
				2008	MIPS | KVM_REG_MIPS_FCR_CSR | 32
				2009	MIPS | KVM_REG_MIPS_MSA_IR | 32
				2010	MIPS | KVM_REG_MIPS_MSA_CSR | 32
				2011
				2012	ARM registers are mapped using the lower 32 bits. The upper 16 of that
				2013	is the register group type, or coprocessor number:
				2014
				2015	ARM core registers have the following id bit patterns:
				2016	0x4020 0000 0010 <index into the kvm_regs struct:16>
				2017
				2018	ARM 32-bit CP15 registers have the following id bit patterns:
				2019	0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
				2020
				2021	ARM 64-bit CP15 registers have the following id bit patterns:
				2022	0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
				2023
				2024	ARM CCSIDR registers are demultiplexed by CSSELR value:
				2025	0x4020 0000 0011 00 <csselr:8>
				2026
				2027	ARM 32-bit VFP control registers have the following id bit patterns:
				2028	0x4020 0000 0012 1 <regno:12>
				2029
				2030	ARM 64-bit FP registers have the following id bit patterns:
				2031	0x4030 0000 0012 0 <regno:12>
				2032
				2033	ARM firmware pseudo-registers have the following bit pattern:
				2034	0x4030 0000 0014 <regno:16>
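To illustrate how the ARM 32-bit CP15 pattern above packs its fields, here is
a small sketch that composes an id from crn, crm, opc1 and opc2. The function
and macro names are hypothetical; the shifts follow from the field widths in
the pattern (opc2 in bits 0-2, opc1 in bits 3-6, crm in bits 7-10, crn in
bits 11-14).

```c
#include <stdint.h>

/* The fixed 0x4020 0000 000F prefix from the pattern above. */
#define ARM_CP15_32_BASE 0x40200000000F0000ull

uint64_t arm_cp15_32_id(unsigned crn, unsigned crm,
                        unsigned opc1, unsigned opc2)
{
    return ARM_CP15_32_BASE |
           ((uint64_t)(crn  & 0xf) << 11) |
           ((uint64_t)(crm  & 0xf) << 7)  |
           ((uint64_t)(opc1 & 0xf) << 3)  |
           ((uint64_t)(opc2 & 0x7));
}
```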
				2035
				2036
				2037	arm64 registers are mapped using the lower 32 bits. The upper 16 of
				2038	that is the register group type, or coprocessor number:
				2039
				2040	arm64 core/FP-SIMD registers have the following id bit patterns. Note
				2041	that the size of the access is variable, as the kvm_regs structure
				2042	contains elements ranging from 32 to 128 bits. The index is a 32bit
				2043	value in the kvm_regs structure seen as a 32bit array.
				2044	0x60x0 0000 0010 <index into the kvm_regs struct:16>
				2045
				2046	arm64 CCSIDR registers are demultiplexed by CSSELR value:
				2047	0x6020 0000 0011 00 <csselr:8>
				2048
				2049	arm64 system registers have the following id bit patterns:
				2050	0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>
				2051
				2052	arm64 firmware pseudo-registers have the following bit pattern:
				2053	0x6030 0000 0014 <regno:16>
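The arm64 system register pattern above can be composed the same way. The
sketch below is illustrative only; the function name is hypothetical and the
shifts are derived from the field widths given in the pattern (op2 in bits
0-2, crm in bits 3-6, crn in bits 7-10, op1 in bits 11-13, op0 in bits 14-15).

```c
#include <stdint.h>

/* The fixed 0x6030 0000 0013 prefix from the pattern above. */
#define ARM64_SYSREG_BASE 0x6030000000130000ull

uint64_t arm64_sysreg_id(unsigned op0, unsigned op1, unsigned crn,
                         unsigned crm, unsigned op2)
{
    return ARM64_SYSREG_BASE |
           ((uint64_t)(op0 & 0x3) << 14) |
           ((uint64_t)(op1 & 0x7) << 11) |
           ((uint64_t)(crn & 0xf) << 7)  |
           ((uint64_t)(crm & 0xf) << 3)  |
           ((uint64_t)(op2 & 0x7));
}
```

For example, a register encoded as op0=3, op1=0, crn=0, crm=0, op2=5 yields
the id 0x603000000013c005.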
				2054
				2055
				2056	MIPS registers are mapped using the lower 32 bits. The upper 16 of that is
				2057	the register group type:
				2058
				2059	MIPS core registers (see above) have the following id bit patterns:
				2060	0x7030 0000 0000 <reg:16>
				2061
				2062	MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit
				2063	patterns depending on whether they're 32-bit or 64-bit registers:
				2064	0x7020 0000 0001 00 <reg:5> <sel:3> (32-bit)
				2065	0x7030 0000 0001 00 <reg:5> <sel:3> (64-bit)
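As a sketch of the two CP0 patterns above, the following helper builds an id
from a CP0 register number and select value. The helper name and the is_64bit
parameter are hypothetical.

```c
#include <stdint.h>

uint64_t mips_cp0_id(unsigned reg, unsigned sel, int is_64bit)
{
    /* Only the size digit differs between the two patterns. */
    uint64_t base = is_64bit ? 0x7030000000010000ull
                             : 0x7020000000010000ull;

    return base | ((uint64_t)(reg & 0x1f) << 3) | (uint64_t)(sel & 0x7);
}
```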
				2066
				2067	Note: KVM_REG_MIPS_CP0_ENTRYLO0 and KVM_REG_MIPS_CP0_ENTRYLO1 are the MIPS64
				2068	versions of the EntryLo registers regardless of the word size of the host
				2069	hardware, host kernel, guest, and whether XPA is present in the guest, i.e.
				2070	with the RI and XI bits (if they exist) in bits 63 and 62 respectively, and
				2071	the PFNX field starting at bit 30.
				2072
				2073	MIPS MAARs (see KVM_REG_MIPS_CP0_MAAR(*) above) have the following id bit
				2074	patterns:
				2075	0x7030 0000 0001 01 <reg:8>
				2076
				2077	MIPS KVM control registers (see above) have the following id bit patterns:
				2078	0x7030 0000 0002 <reg:16>
				2079
				2080	MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following
				2081	id bit patterns depending on the size of the register being accessed. They are
				2082	always accessed according to the current guest FPU mode (Status.FR and
				2083	Config5.FRE), i.e. as the guest would see them, and they become unpredictable
				2084	if the guest FPU mode is changed. MIPS SIMD Architecture (MSA) vector
				2085	registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns as they
				2086	overlap the FPU registers:
				2087	0x7020 0000 0003 00 <0:3> <reg:5> (32-bit FPU registers)
				2088	0x7030 0000 0003 00 <0:3> <reg:5> (64-bit FPU registers)
				2089	0x7040 0000 0003 00 <0:3> <reg:5> (128-bit MSA vector registers)
				2090
				2091	MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the
				2092	following id bit patterns:
				2093	0x7020 0000 0003 01 <0:3> <reg:5>
				2094
				2095	MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the
				2096	following id bit patterns:
				2097	0x7020 0000 0003 02 <0:3> <reg:5>
				2098
				2099
				2100	4.69 KVM_GET_ONE_REG
				2101
				2102	Capability: KVM_CAP_ONE_REG
				2103	Architectures: all
				2104	Type: vcpu ioctl
				2105	Parameters: struct kvm_one_reg (in and out)
				2106	Returns: 0 on success, negative value on failure
				2107
				2108	This ioctl reads the value of a single register implemented
				2109	in a vcpu. The register to read is indicated by the "id" field of the
				2110	kvm_one_reg struct passed in. On success, the register value can be found
				2111	at the memory location pointed to by "addr".
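The buffer that "addr" points to must be large enough for the register. In
the id patterns of 4.68 the size is visible as one hex digit (0x..2.. for
32-bit, 0x..3.. for 64-bit, 0x..4.. for 128-bit). The sketch below decodes
that digit; the function name is hypothetical, and the bit position (bits
52-55, holding log2 of the size in bytes) is taken from the kernel's uapi
header rather than from this document.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the register identified by id. */
size_t one_reg_size_bytes(uint64_t id)
{
    return (size_t)1 << ((id >> 52) & 0xf);
}
```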
				2112
				2113	The list of registers accessible using this interface is identical to the
				2114	list in 4.68.
				2115
				2116
				2117	4.70 KVM_KVMCLOCK_CTRL
				2118
				2119	Capability: KVM_CAP_KVMCLOCK_CTRL
				2120	Architectures: Any that implement pvclocks (currently x86 only)
				2121	Type: vcpu ioctl
				2122	Parameters: None
				2123	Returns: 0 on success, -1 on error
				2124
				2125	This signals to the host kernel that the specified guest is being paused by
				2126	userspace. The host will set a flag in the pvclock structure that is checked
				2127	from the soft lockup watchdog. The flag is part of the pvclock structure that
				2128	is shared between guest and host, specifically the second bit of the flags
				2129	field of the pvclock_vcpu_time_info structure. It will be set exclusively by
				2130	the host and read/cleared exclusively by the guest. The guest operation of
				2131	checking and clearing the flag must be an atomic operation, so
				2132	load-link/store-conditional or an equivalent must be used. There are two cases
				2133	where the guest will clear the flag: when the soft lockup watchdog timer resets
				2134	itself or when a soft lockup is detected. This ioctl can be called any time
				2135	after pausing the vcpu, but before it is resumed.
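On the guest side, the atomic check-and-clear described above could be
sketched with C11 atomics as follows. This is not the kernel's
implementation: the function and macro names are hypothetical, and reading
"the second bit" as bit 1 (mask 0x2) is an assumption of this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_STOPPED_FLAG (1u << 1) /* "second bit" of the flags field */

/* Atomically clear the flag and report whether it had been set.
 * flags stands in for the flags byte of pvclock_vcpu_time_info. */
bool check_and_clear_guest_stopped(_Atomic uint8_t *flags)
{
    uint8_t old = atomic_fetch_and(flags, (uint8_t)~GUEST_STOPPED_FLAG);

    return (old & GUEST_STOPPED_FLAG) != 0;
}
```

A single fetch-and keeps the test and the clear in one atomic step, so a flag
set by the host between a separate load and store cannot be lost.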
				2136
				2137
				2138	4.71 KVM_SIGNAL_MSI
				2139
				2140	Capability: KVM_CAP_SIGNAL_MSI
				2141	Architectures: x86 arm arm64
				2142	Directly injects an MSI message. Only valid with an in-kernel irqchip that handles
				2143	Parameters: struct kvm_msi (in)
				2144	Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
				2145
				2146	Directly inject a MSI message. Only valid with in-kernel irqchip that handles
				2147	MSI messages.
				2148
				2149	struct kvm_msi {
				2150	    __u32 address_lo;
				2151	    __u32 address_hi;
				2152	    __u32 data;
				2153	    __u32 flags;
				2154	    __u32 devid;
				2155	    __u8 pad[12];
				2156	};
				2157
				2158	flags: KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM
				2159	KVM_CAP_MSI_DEVID capability advertises the requirement to provide
				2160	the device ID. If this capability is not available, userspace
				2161	should never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.
				2162
				2163	If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
				2164	for the device that wrote the MSI message. For PCI, this is usually a
				2165	BDF (bus/device/function) identifier in the lower 16 bits.
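For a PCI device, the devid value could be assembled from the bus, device and
function numbers as in the sketch below. The function name is hypothetical;
the 8/5/3-bit split is the conventional PCI encoding rather than something
this document specifies.

```c
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits) and function (3 bits) into the
 * lower 16 bits of a devid value. */
uint32_t pci_devid(unsigned bus, unsigned dev, unsigned fn)
{
    return ((bus & 0xffu) << 8) | ((dev & 0x1fu) << 3) | (fn & 0x7u);
}
```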
				2166
				2167	On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
				2168	feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
				2169	address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
				2170	address_hi must be zero.
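The effect of the address_hi rule on the destination id can be sketched as
follows. The function name is hypothetical, and taking bits 7-0 of the
destination id from address_lo bits 19-12 is an assumption based on the
standard x86 MSI address format, which this document does not describe.

```c
#include <stdbool.h>
#include <stdint.h>

uint32_t msi_dest_id(uint32_t address_lo, uint32_t address_hi,
                     bool use_32bit_ids)
{
    /* Low 8 bits of the destination id (assumed x86 MSI layout). */
    uint32_t dest = (address_lo >> 12) & 0xffu;

    /* address_hi contributes bits 31-8 only when the feature is on. */
    if (use_32bit_ids)
        dest |= address_hi & 0xffffff00u;
    return dest;
}
```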
				2171
				2172
				2173	4.71 KVM_CREATE_PIT2
				2174
				2175	Capability: KVM_CAP_PIT2
				2176	Architectures: x86
				2177	Type: vm ioctl
				2178	Parameters: struct kvm_pit_config (in)
				2179	Returns: 0 on success, -1 on error
				2180
				2181	Creates an in-kernel device model for the i8254 PIT. This call is only valid
				2182	after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following
				2183	parameters have to be passed:
				2184
				2185	struct kvm_pit_config {
				2186	    __u32 flags;
				2187	    __u32 pad[15];
				2188	};
				2189
				2190	Valid flags are:
				2191
				2192	#define KVM_PIT_SPEAKER_DUMMY 1 /* emulate speaker port stub */
				2193
				2194	PIT timer interrupts may use a per-VM kernel thread for injection. If it
				2195	exists, this thread will have a name of the following pattern:
				2196
				2197	kvm-pit/<owner-process-pid>
				2198
				2199	When running a guest with elevated priorities, the scheduling parameters of
				2200	this thread may have to be adjusted accordingly.
				2201
				2202	This IOCTL replaces the obsolete KVM_CREATE_PIT.
				2203
				2204
				2205	4.72 KVM_GET_PIT2
				2206
				2207	Capability: KVM_CAP_PIT_STATE2
				2208	Architectures: x86
				2209	Type: vm ioctl
				2210	Parameters: struct kvm_pit_state2 (out)
				2211	Returns: 0 on success, -1 on error
				2212
				2213	Retrieves the state of the in-kernel PIT model. Only valid after
				2214	KVM_CREATE_PIT2. The state is returned in the following structure:
				2215
				2216	struct kvm_pit_state2 {
				2217	    struct kvm_pit_channel_state channels[3];
				2218	    __u32 flags;
				2219	    __u32 reserved[9];
				2220	};
				2221
				2222	Valid flags are:
				2223
				2224	/* disable PIT in HPET legacy mode */
				2225	#define KVM_PIT_FLAGS_HPET_LEGACY 0x00000001
				2226
				2227	This IOCTL replaces the obsolete KVM_GET_PIT.
				2228
				2229
				2230	4.73 KVM_SET_PIT2
				2231
				2232	Capability: KVM_CAP_PIT_STATE2
				2233	Architectures: x86
				2234	Type: vm ioctl
				2235	Parameters: struct kvm_pit_state2 (in)
				2236	Returns: 0 on success, -1 on error
				2237
				2238	Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
				2239	See KVM_GET_PIT2 for details on struct kvm_pit_state2.
				2240
				2241	This IOCTL replaces the obsolete KVM_SET_PIT.
				2242
				2243
				2244	4.74 KVM_PPC_GET_SMMU_INFO
				2245
				2246	Capability: KVM_CAP_PPC_GET_SMMU_INFO
				2247	Architectures: powerpc
				2248	Type: vm ioctl
				2249	Parameters: None
				2250	Returns: 0 on success, -1 on error
				2251
				2252	This populates and returns a structure describing the features of
				2253	the "Server" class MMU emulation supported by KVM.
				2254	This can in turn be used by userspace to generate the appropriate
				2255	device-tree properties for the guest operating system.
				2256
				2257	The structure contains some global information, followed by an
				2258	array of supported segment page sizes:
				2259
				2260	struct kvm_ppc_smmu_info {
				2261	    __u64 flags;
				2262	    __u32 slb_size;
				2263	    __u32 pad;
				2264	    struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
				2265	};
				2266
				2267	The supported flags are:
				2268
				2269	- KVM_PPC_PAGE_SIZES_REAL:
				2270	When that flag is set, guest page sizes must "fit" the backing
				2271	store page sizes. When not set, any page size in the list can
				2272	be used regardless of how they are backed by userspace.
				2273
				2274	- KVM_PPC_1T_SEGMENTS
				2275	The emulated MMU supports 1T segments in addition to the
				2276	standard 256M ones.
				2277
				2278	The "slb_size" field indicates how many SLB entries are supported.
				2279
				2280	The "sps" array contains 8 entries indicating the supported base
				2281	page sizes for a segment in increasing order. Each entry is defined
				2282	as follows:
				2283
				2284	struct kvm_ppc_one_seg_page_size {
				2285	    __u32 page_shift; /* Base page shift of segment (or 0) */
				2286	    __u32 slb_enc; /* SLB encoding for BookS */
				2287	    struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
				2288	};
				2289
				2290	An entry with a "page_shift" of 0 is unused. Because the array is
				2291	organized in increasing order, a lookup can stop when encountering
				2292	such an entry.
				2293
				2294	The "slb_enc" field provides the encoding to use in the SLB for the
				2295	page size. The bits are in positions such that the value can directly
				2296	be OR'ed into the "vsid" argument of the slbmte instruction.
				2297
				2298	The "enc" array lists, for each of those segment base page sizes,
				2299	the supported actual page sizes (which can only be larger than or
				2300	equal to the base page size), along with the
				2301	corresponding encoding in the hash PTE. Similarly, the array has
				2302	8 entries sorted by increasing size, and an entry with a "0" shift
				2303	is an empty entry and a terminator:
				2304
				2305	struct kvm_ppc_one_page_size {
				2306	    __u32 page_shift; /* Page shift (or 0) */
				2307	    __u32 pte_enc; /* Encoding in the HPTE (>>12) */
				2308	};
				2309
				2310	The "pte_enc" field provides a value that can be OR'ed into the hash
				2311	PTE's RPN field (i.e. it needs to be shifted left by 12 to OR it
				2312	into the hash PTE second double word).
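A lookup over the returned arrays might look like the sketch below. The
structs are local mirrors of the ones above, PAGE_SIZES_MAX stands in for
KVM_PPC_PAGE_SIZES_MAX_SZ (taken as 8, per the text), and the function names
and the UINT64_MAX "not found" sentinel are choices made for this
illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZES_MAX 8 /* stands in for KVM_PPC_PAGE_SIZES_MAX_SZ */

struct one_page_size {
    uint32_t page_shift;
    uint32_t pte_enc;
};

struct one_seg_page_size {
    uint32_t page_shift;
    uint32_t slb_enc;
    struct one_page_size enc[PAGE_SIZES_MAX];
};

/* Find the segment entry for a base page shift; NULL if absent. */
const struct one_seg_page_size *
find_seg(const struct one_seg_page_size *sps, uint32_t base_shift)
{
    for (size_t i = 0; i < PAGE_SIZES_MAX; i++) {
        if (sps[i].page_shift == 0)
            break; /* an unused entry terminates the list */
        if (sps[i].page_shift == base_shift)
            return &sps[i];
    }
    return NULL;
}

/* Bits to OR into the second doubleword of the hash PTE for an actual
 * page shift, or UINT64_MAX if that size is not listed. */
uint64_t hpte_enc_bits(const struct one_seg_page_size *seg, uint32_t shift)
{
    for (size_t i = 0; i < PAGE_SIZES_MAX && seg->enc[i].page_shift; i++) {
        if (seg->enc[i].page_shift == shift)
            return (uint64_t)seg->enc[i].pte_enc << 12;
    }
    return UINT64_MAX;
}
```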
				2313
				2314	4.75 KVM_IRQFD
				2315
				2316	Capability: KVM_CAP_IRQFD
				2317	Architectures: x86 s390 arm arm64
				2318	Type: vm ioctl
				2319	Parameters: struct kvm_irqfd (in)
				2320	Returns: 0 on success, -1 on error
				2321
				2322	Allows setting an eventfd to directly trigger a guest interrupt.
				2323	kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
				2324	kvm_irqfd.gsi specifies the irqchip pin toggled by this event. When
				2325	an event is triggered on the eventfd, an interrupt is injected into
				2326	the guest using the specified gsi pin. The irqfd is removed using
				2327	the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
				2328	and kvm_irqfd.gsi.
				2329
				2330	With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
				2331	mechanism allowing emulation of level-triggered, irqfd-based
				2332	interrupts. When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an
				2333	additional eventfd in the kvm_irqfd.resamplefd field. When operating
				2334	in resample mode, posting of an interrupt through kvm_irqfd.fd asserts
				2335	the specified gsi in the irqchip. When the irqchip is resampled, such
				2336	as from an EOI, the gsi is de-asserted and the user is notified via
				2337	kvm_irqfd.resamplefd. It is the user's responsibility to re-queue
				2338	the interrupt if the device making use of it still requires service.
				2339	Note that closing the resamplefd is not sufficient to disable the
				2340	irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
				2341	and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
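The assign and deassign requests described above can be sketched as follows.
The struct and flag values are local mirrors of struct kvm_irqfd and the
KVM_IRQFD_FLAG_* constants as found in the kernel's uapi header (this
document does not show them); a real program would include <linux/kvm.h> and
pass the struct to the KVM_IRQFD ioctl. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_irqfd. */
struct irqfd_req {
    uint32_t fd;
    uint32_t gsi;
    uint32_t flags;
    uint32_t resamplefd;
    uint8_t  pad[16];
};

#define IRQFD_FLAG_DEASSIGN (1u << 0) /* mirrors KVM_IRQFD_FLAG_DEASSIGN */
#define IRQFD_FLAG_RESAMPLE (1u << 1) /* mirrors KVM_IRQFD_FLAG_RESAMPLE */

/* Request for a level-triggered irqfd with a resample eventfd. */
struct irqfd_req irqfd_assign_resample(int fd, uint32_t gsi, int resamplefd)
{
    struct irqfd_req r;

    memset(&r, 0, sizeof(r));
    r.fd = (uint32_t)fd;
    r.gsi = gsi;
    r.flags = IRQFD_FLAG_RESAMPLE;
    r.resamplefd = (uint32_t)resamplefd;
    return r;
}

/* Removal request: fd and gsi only, no RESAMPLE flag needed. */
struct irqfd_req irqfd_deassign(int fd, uint32_t gsi)
{
    struct irqfd_req r;

    memset(&r, 0, sizeof(r));
    r.fd = (uint32_t)fd;
    r.gsi = gsi;
    r.flags = IRQFD_FLAG_DEASSIGN;
    return r;
}
```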
				2342
				2343	On arm/arm64, where gsi routing is supported, the following can happen:
				2344	- in case no routing entry is associated to this gsi, injection fails
				2345	- in case the gsi is associated to an irqchip routing entry,
				2346	irqchip.pin + 32 corresponds to the injected SPI ID.
				2347	- in case the gsi is associated to an MSI routing entry, the MSI
				2348	message and device ID are translated into an LPI (support restricted
				2349	to GICv3 ITS in-kernel emulation).
				2350
				2351	4.76 KVM_PPC_ALLOCATE_HTAB
				2352
				2353	Capability: KVM_CAP_PPC_ALLOC_HTAB
				2354	Architectures: powerpc
				2355	Type: vm ioctl
				2356	Parameters: Pointer to u32 containing hash table order (in/out)
				2357	Returns: 0 on success, -1 on error
				2358
				2359	This requests the host kernel to allocate an MMU hash table for a
				2360	guest using the PAPR paravirtualization interface. This only does
				2361	anything if the kernel is configured to use the Book 3S HV style of
				2362	virtualization. Otherwise the capability doesn't exist and the ioctl
				2363	returns an ENOTTY error. The rest of this description assumes Book 3S
				2364	HV.
				2365
				2366	There must be no vcpus running when this ioctl is called; if there
				2367	are, it will do nothing and return an EBUSY error.
				2368
				2369	The parameter is a pointer to a 32-bit unsigned integer variable
				2370	containing the order (log base 2) of the desired size of the hash
				2371	table, which must be between 18 and 46. On successful return from the
				2372	ioctl, the value will not be changed by the kernel.
				2373
				2374	If no hash table has been allocated when any vcpu is asked to run
				2375	(with the KVM_RUN ioctl), the host kernel will allocate a
				2376	default-sized hash table (16 MB).
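The relation between the order and the table size can be sketched as below;
the helper names are hypothetical. An order of 24 corresponds to the 16 MB
default mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define HTAB_MIN_ORDER 18u
#define HTAB_MAX_ORDER 46u

/* True if order is within the range accepted by the ioctl. */
bool htab_order_valid(uint32_t order)
{
    return order >= HTAB_MIN_ORDER && order <= HTAB_MAX_ORDER;
}

/* Hash table size in bytes for a given order (log base 2 of the size). */
uint64_t htab_size_bytes(uint32_t order)
{
    return 1ull << order;
}
```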
				2377
				2378	If this ioctl is called when a hash table has already been allocated,
				2379	with a different order from the existing hash table, the existing hash
				2380	table will be freed and a new one allocated. If this ioctl is
				2381	called when a hash table has already been allocated of the same order
				2382	as specified, the kernel will clear out the existing hash table (zero
				2383	all HPTEs). In either case, if the guest is using the virtualized
				2384	real-mode area (VRMA) facility, the kernel will re-create the VRMA
				2385	HPTEs on the next KVM_RUN of any vcpu.
				2386
				2387	4.77 KVM_S390_INTERRUPT
				2388
				2389	Capability: basic
				2390	Architectures: s390
				2391	Type: vm ioctl, vcpu ioctl
				2392	Parameters: struct kvm_s390_interrupt (in)
				2393	Returns: 0 on success, -1 on error
				2394
				2395	Injects an interrupt into the guest. Interrupts can be floating
				2396	(vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.
				2397
				2398	Interrupt parameters are passed via kvm_s390_interrupt:
				2399
				2400	struct kvm_s390_interrupt {
				2401	    __u32 type;
				2402	    __u32 parm;
				2403	    __u64 parm64;
				2404	};
				2405
				2406	type can be one of the following:
				2407
				2408	KVM_S390_SIGP_STOP (vcpu) - sigp stop; optional flags in parm
				2409	KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm
				2410	KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm
				2411	KVM_S390_RESTART (vcpu) - restart
				2412	KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt
				2413	KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt
				2414	KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt
				2415	    parameters in parm and parm64
				2416	KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
				2417	KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm
				2418	KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
				2419	KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
				2420	    I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
				2421	    I/O interruption parameters in parm (subchannel) and parm64 (intparm,
				2422	    interruption subclass)
				2423	KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,
				2424	    machine check interrupt code in parm64 (note that
				2425	    machine checks needing further payload are not
				2426	    supported by this ioctl)
				2427
				2428	Note that the vcpu ioctl is asynchronous to vcpu execution.
				2429
				2430	4.78 KVM_PPC_GET_HTAB_FD
				2431
				2432	Capability: KVM_CAP_PPC_HTAB_FD
				2433	Architectures: powerpc
				2434	Type: vm ioctl
				2435	Parameters: Pointer to struct kvm_get_htab_fd (in)
				2436	Returns: file descriptor number (>= 0) on success, -1 on error
				2437
				2438	This returns a file descriptor that can be used either to read out the
				2439	entries in the guest's hashed page table (HPT), or to write entries to
				2440	initialize the HPT. The returned fd can only be written to if the
				2441	KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
				2442	can only be read if that bit is clear. The argument struct looks like
				2443	this:
				2444
				2445	/* For KVM_PPC_GET_HTAB_FD */
				2446	struct kvm_get_htab_fd {
				2447	__u64 flags;
				2448	__u64 start_index;
				2449	__u64 reserved[2];
				2450	};
				2451
				2452	/* Values for kvm_get_htab_fd.flags */
				2453	#define KVM_GET_HTAB_BOLTED_ONLY ((__u64)0x1)
				2454	#define KVM_GET_HTAB_WRITE ((__u64)0x2)
				2455
				2456	The `start_index' field gives the index in the HPT of the entry at
				2457	which to start reading. It is ignored when writing.
				2458
				2459	Reads on the fd will initially supply information about all
				2460	"interesting" HPT entries. Interesting entries are those with the
				2461	bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
				2462	all entries. When the end of the HPT is reached, the read() will
				2463	return. If read() is called again on the fd, it will start again from
				2464	the beginning of the HPT, but will only return HPT entries that have
				2465	changed since they were last read.
				2466
				2467	Data read or written is structured as a header (8 bytes) followed by a
				2468	series of valid HPT entries (16 bytes each). The header indicates how
				2469	many valid HPT entries there are and how many invalid entries follow
				2470	the valid entries. The invalid entries are not represented explicitly
				2471	in the stream. The header format is:
				2472
				2473	struct kvm_get_htab_header {
				2474	__u32 index;
				2475	__u16 n_valid;
				2476	__u16 n_invalid;
				2477	};
				2478
				2479	Writes to the fd create HPT entries starting at the index given in the
				2480	header; first `n_valid' valid entries with contents from the data
				2481	written, then `n_invalid' invalid entries, invalidating any previously
				2482	valid entries found.
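
As an illustration of the stream layout described above, the following
sketch walks a buffer obtained from read() on the HTAB fd and totals the
valid and invalid entries announced by each header. The struct is a local
mirror of kvm_get_htab_header, and the names htab_header, HPTE_SIZE and
htab_stream_count are made up for this example; it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_get_htab_header as shown above (8 bytes). */
struct htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

#define HPTE_SIZE 16	/* each valid HPT entry occupies 16 bytes */

/*
 * Walk `len` bytes of HTAB stream data. Stores the total number of valid
 * and invalid entries described by the headers in *valid and *invalid.
 * Returns the number of bytes consumed, or -1 if a header announces more
 * entry data than the buffer holds.
 */
static long htab_stream_count(const unsigned char *buf, size_t len,
			      unsigned long *valid, unsigned long *invalid)
{
	size_t pos = 0;

	assert(buf != NULL && valid != NULL && invalid != NULL);
	*valid = 0;
	*invalid = 0;
	while (len - pos >= sizeof(struct htab_header)) {
		struct htab_header hdr;

		memcpy(&hdr, buf + pos, sizeof(hdr));
		pos += sizeof(hdr);
		if ((len - pos) / HPTE_SIZE < hdr.n_valid)
			return -1;
		/* only valid entries carry data; invalid ones are implied */
		pos += (size_t)hdr.n_valid * HPTE_SIZE;
		*valid += hdr.n_valid;
		*invalid += hdr.n_invalid;
	}
	return (long)pos;
}
```

The same layout applies when writing: a writer would emit one header
followed by n_valid entries of 16 bytes, with no data for the n_invalid
entries.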
				2483
				2484	4.79 KVM_CREATE_DEVICE
				2485
				2486	Capability: KVM_CAP_DEVICE_CTRL
				2487	Type: vm ioctl
				2488	Parameters: struct kvm_create_device (in/out)
				2489	Returns: 0 on success, -1 on error
				2490	Errors:
				2491	ENODEV: The device type is unknown or unsupported
				2492	EEXIST: Device already created, and this type of device may not
				2493	be instantiated multiple times
				2494
				2495	Other error conditions may be defined by individual device types or
				2496	have their standard meanings.
				2497
				2498	Creates an emulated device in the kernel. The file descriptor returned
				2499	in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.
				2500
				2501	If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
				2502	device type is supported (not necessarily whether it can be created
				2503	in the current vm).
				2504
				2505	Individual devices should not define flags. Attributes should be used
				2506	for specifying any behavior that is not implied by the device type
				2507	number.
				2508
				2509	struct kvm_create_device {
				2510	__u32 type; /* in: KVM_DEV_TYPE_xxx */
				2511	__u32 fd; /* out: device handle */
				2512	__u32 flags; /* in: KVM_CREATE_DEVICE_xxx */
				2513	};
				2514
				2515	4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
				2516
				2517	Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
				2518	KVM_CAP_VCPU_ATTRIBUTES for vcpu device
				2519	Type: device ioctl, vm ioctl, vcpu ioctl
				2520	Parameters: struct kvm_device_attr
				2521	Returns: 0 on success, -1 on error
				2522	Errors:
				2523	ENXIO: The group or attribute is unknown/unsupported for this device
				2524	or hardware support is missing.
				2525	EPERM: The attribute cannot (currently) be accessed this way
				2526	(e.g. read-only attribute, or attribute that only makes
				2527	sense when the device is in a different state)
				2528
				2529	Other error conditions may be defined by individual device types.
				2530
				2531	Gets/sets a specified piece of device configuration and/or state. The
				2532	semantics are device-specific. See individual device documentation in
				2533	the "devices" directory. As with ONE_REG, the size of the data
				2534	transferred is defined by the particular attribute.
				2535
				2536	struct kvm_device_attr {
				2537	__u32 flags; /* no flags currently defined */
				2538	__u32 group; /* device-defined */
				2539	__u64 attr; /* group-defined */
				2540	__u64 addr; /* userspace address of attr data */
				2541	};
				2542
				2543	4.81 KVM_HAS_DEVICE_ATTR
				2544
				2545	Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
				2546	KVM_CAP_VCPU_ATTRIBUTES for vcpu device
				2547	Type: device ioctl, vm ioctl, vcpu ioctl
				2548	Parameters: struct kvm_device_attr
				2549	Returns: 0 on success, -1 on error
				2550	Errors:
				2551	ENXIO: The group or attribute is unknown/unsupported for this device
				2552	or hardware support is missing.
				2553
				2554	Tests whether a device supports a particular attribute. A successful
				2555	return indicates the attribute is implemented. It does not necessarily
				2556	indicate that the attribute can be read or written in the device's
				2557	current state. "addr" is ignored.
				2558
				2559	4.82 KVM_ARM_VCPU_INIT
				2560
				2561	Capability: basic
				2562	Architectures: arm, arm64
				2563	Type: vcpu ioctl
				2564	Parameters: struct kvm_vcpu_init (in)
				2565	Returns: 0 on success; -1 on error
				2566	Errors:
				2567	EINVAL: the target is unknown, or the combination of features is invalid.
				2568	ENOENT: a features bit specified is unknown.
				2569
				2570	This tells KVM what type of CPU to present to the guest, and what
				2571	optional features it should have. This will cause a reset of the cpu
				2572	registers to their initial values. If this is not called, KVM_RUN will
				2573	return ENOEXEC for that vcpu.
				2574
				2575	Note that because some registers reflect machine topology, all vcpus
				2576	should be created before this ioctl is invoked.
				2577
				2578	Userspace can call this function multiple times for a given vcpu, including
				2579	after the vcpu has been run. This will reset the vcpu to its initial
				2580	state. All calls to this function after the initial call must use the same
				2581	target and same set of feature flags, otherwise EINVAL will be returned.
				2582
				2583	Possible features:
				2584	- KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
				2585	Depends on KVM_CAP_ARM_PSCI. If not set, the CPU will be powered on
				2586	and execute guest code when KVM_RUN is called.
				2587	- KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
				2588	Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
				2589	- KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision
				2590	backward compatible with v0.2) for the CPU.
				2591	Depends on KVM_CAP_ARM_PSCI_0_2.
				2592	- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
				2593	Depends on KVM_CAP_ARM_PMU_V3.
				2594
				2595
				2596	4.83 KVM_ARM_PREFERRED_TARGET
				2597
				2598	Capability: basic
				2599	Architectures: arm, arm64
				2600	Type: vm ioctl
				2601	Parameters: struct kvm_vcpu_init (out)
				2602	Returns: 0 on success; -1 on error
				2603	Errors:
				2604	ENODEV: no preferred target available for the host
				2605
				2606	This queries KVM for the preferred CPU target type which can be emulated
				2607	by KVM on the underlying host.
				2608
				2609	The ioctl returns a struct kvm_vcpu_init instance containing information
				2610	about preferred CPU target type and recommended features for it. The
				2611	kvm_vcpu_init->features bitmap returned will have feature bits set if
				2612	the preferred target recommends setting these features, but this is
				2613	not mandatory.
				2614
				2615	The information returned by this ioctl can be used to prepare an instance
				2616	of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result
				2617	in a VCPU matching the underlying host.
				2618
				2619
				2620	4.84 KVM_GET_REG_LIST
				2621
				2622	Capability: basic
				2623	Architectures: arm, arm64, mips
				2624	Type: vcpu ioctl
				2625	Parameters: struct kvm_reg_list (in/out)
				2626	Returns: 0 on success; -1 on error
				2627	Errors:
				2628	E2BIG: the reg index list is too big to fit in the array specified by
				2629	the user (the number required will be written into n).
				2630
				2631	struct kvm_reg_list {
				2632	__u64 n; /* number of registers in reg[] */
				2633	__u64 reg[0];
				2634	};
				2635
				2636	This ioctl returns the guest registers that are supported for the
				2637	KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
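
The E2BIG behaviour lends itself to a two-call pattern: probe with n = 0
to learn the required count, then allocate and call again. The sketch
below shows that pattern under stated assumptions. The struct is a local
mirror of kvm_reg_list, and get_reg_list_fn is a hypothetical stand-in
for ioctl(vcpu_fd, KVM_GET_REG_LIST, list) so that the logic can be
exercised without a vcpu fd. fake_get_reg_list is a test double, not a
KVM interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct kvm_reg_list from the text above. */
struct reg_list {
	uint64_t n;
	uint64_t reg[];
};

/*
 * Hypothetical stand-in for ioctl(vcpu_fd, KVM_GET_REG_LIST, list):
 * returns 0 on success, or -1 with errno set.
 */
typedef int (*get_reg_list_fn)(struct reg_list *list);

/* Probe with n = 0, then allocate the count reported through E2BIG. */
static struct reg_list *fetch_reg_list(get_reg_list_fn get)
{
	struct reg_list probe = { .n = 0 };
	struct reg_list *list;

	assert(get != NULL);
	if (get(&probe) != 0 && errno != E2BIG)
		return NULL;		/* some other failure */

	list = malloc(sizeof(*list) + (size_t)probe.n * sizeof(list->reg[0]));
	if (!list)
		return NULL;
	list->n = probe.n;
	if (get(list) != 0) {
		free(list);
		return NULL;
	}
	return list;
}

/* Test double that pretends the vcpu exposes three registers. */
static int fake_get_reg_list(struct reg_list *list)
{
	if (list->n < 3) {
		list->n = 3;		/* report the required count */
		errno = E2BIG;
		return -1;
	}
	list->n = 3;
	for (uint64_t i = 0; i < 3; i++)
		list->reg[i] = 0x1000 + i;
	return 0;
}
```

The caller owns the returned list and releases it with free().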
				2638
				2639
				2640	4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
				2641
				2642	Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
				2643	Architectures: arm, arm64
				2644	Type: vm ioctl
				2645	Parameters: struct kvm_arm_device_addr (in)
				2646	Returns: 0 on success, -1 on error
				2647	Errors:
				2648	ENODEV: The device id is unknown
				2649	ENXIO: Device not supported on current system
				2650	EEXIST: Address already set
				2651	E2BIG: Address outside guest physical address space
				2652	EBUSY: Address overlaps with other device range
				2653
				2654	struct kvm_arm_device_addr {
				2655	__u64 id;
				2656	__u64 addr;
				2657	};
				2658
				2659	Specify a device address in the guest's physical address space where guests
				2660	can access emulated or directly exposed devices, which the host kernel needs
				2661	to know about. The id field is an architecture specific identifier for a
				2662	specific device.
				2663
				2664	ARM/arm64 divides the id field into two parts, a device id and an
				2665	address type id specific to the individual device.
				2666
				2667	bits: \| 63 ... 32 \| 31 ... 16 \| 15 ... 0 \|
				2668	field: \| 0x00000000 \| device id \| addr type id \|
				2669
				2670	ARM/arm64 currently only require this when using the in-kernel GIC
				2671	support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2
				2672	as the device id. When setting the base address for the guest's
				2673	mapping of the VGIC virtual CPU and distributor interface, the ioctl
				2674	must be called after calling KVM_CREATE_IRQCHIP, but before calling
				2675	KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the
				2676	base addresses will return -EEXIST.
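
The id layout in the table above can be expressed as a pair of small
helpers. This is only a sketch: the macro and function names are local to
the example, and the device id and address type values passed in are
whatever the architecture headers define.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from the bit layout table above. */
#define DEV_ID_SHIFT	16
#define FIELD_MASK	0xffffULL

/* Compose the 64-bit id: bits 63..32 zero, 31..16 device, 15..0 type. */
static uint64_t make_device_addr_id(uint16_t device_id, uint16_t addr_type)
{
	return ((uint64_t)device_id << DEV_ID_SHIFT) | addr_type;
}

/* Extract the device id; the upper 32 bits are expected to be zero. */
static uint16_t device_id_of(uint64_t id)
{
	assert((id >> 32) == 0);
	return (uint16_t)((id >> DEV_ID_SHIFT) & FIELD_MASK);
}

/* Extract the address type id from the low 16 bits. */
static uint16_t addr_type_of(uint64_t id)
{
	return (uint16_t)(id & FIELD_MASK);
}
```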
				2677
				2678	Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API
				2679	should be used instead.
				2680
				2681
				2682	4.86 KVM_PPC_RTAS_DEFINE_TOKEN
				2683
				2684	Capability: KVM_CAP_PPC_RTAS
				2685	Architectures: ppc
				2686	Type: vm ioctl
				2687	Parameters: struct kvm_rtas_token_args
				2688	Returns: 0 on success, -1 on error
				2689
				2690	Defines a token value for a RTAS (Run Time Abstraction Services)
				2691	service in order to allow it to be handled in the kernel. The
				2692	argument struct gives the name of the service, which must be the name
				2693	of a service that has a kernel-side implementation. If the token
				2694	value is non-zero, it will be associated with that service, and
				2695	subsequent RTAS calls by the guest specifying that token will be
				2696	handled by the kernel. If the token value is 0, then any token
				2697	associated with the service will be forgotten, and subsequent RTAS
				2698	calls by the guest for that service will be passed to userspace to be
				2699	handled.
				2700
				2701	4.87 KVM_SET_GUEST_DEBUG
				2702
				2703	Capability: KVM_CAP_SET_GUEST_DEBUG
				2704	Architectures: x86, s390, ppc, arm64
				2705	Type: vcpu ioctl
				2706	Parameters: struct kvm_guest_debug (in)
				2707	Returns: 0 on success; -1 on error
				2708
				2709	struct kvm_guest_debug {
				2710	__u32 control;
				2711	__u32 pad;
				2712	struct kvm_guest_debug_arch arch;
				2713	};
				2714
				2715	Set up the processor specific debug registers and configure vcpu for
				2716	handling guest debug events. There are two parts to the structure. The
				2717	first, a control bitfield, indicates the type of debug events to handle
				2718	when running. Common control bits are:
				2719
				2720	- KVM_GUESTDBG_ENABLE: guest debugging is enabled
				2721	- KVM_GUESTDBG_SINGLESTEP: the next run should single-step
				2722
				2723	The top 16 bits of the control field are architecture specific control
				2724	flags which can include the following:
				2725
				2726	- KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
				2727	- KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
				2728	- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
				2729	- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
				2730	- KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390]
				2731
				2732	For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
				2733	are enabled in memory so we need to ensure breakpoint exceptions are
				2734	correctly trapped and the KVM run loop exits at the breakpoint instead of
				2735	running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP
				2736	we need to ensure the guest vCPU's architecture specific registers are
				2737	updated to the correct (supplied) values.
				2738
				2739	The second part of the structure is architecture specific and
				2740	typically contains a set of debug registers.
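
To make the split of the control field concrete, the sketch below builds
a control word from the common bits plus architecture specific flags and
separates the two halves again. The numeric values are assumptions that
are believed to match the uapi headers and should be checked against
<linux/kvm.h> and the arch header. The GUESTDBG_* and guestdbg_* names
are local to this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; verify against the uapi headers of the target tree. */
#define GUESTDBG_ENABLE		0x00000001u
#define GUESTDBG_SINGLESTEP	0x00000002u
#define GUESTDBG_USE_SW_BP	0x00010000u
#define GUESTDBG_ARCH_MASK	0xffff0000u	/* top 16 bits: arch specific */

/* Enable debugging, optionally single-step, and add arch specific flags. */
static uint32_t guestdbg_control(int single_step, uint32_t arch_flags)
{
	uint32_t control = GUESTDBG_ENABLE;

	/* arch flags must live entirely in the top 16 bits */
	assert((arch_flags & ~GUESTDBG_ARCH_MASK) == 0);
	if (single_step)
		control |= GUESTDBG_SINGLESTEP;
	return control | arch_flags;
}

/* Return only the architecture specific part of a control word. */
static uint32_t guestdbg_arch_bits(uint32_t control)
{
	return control & GUESTDBG_ARCH_MASK;
}
```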
				2741
				2742	For arm64 the number of debug registers is implementation defined and
				2743	can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
				2744	KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
				2745	indicating the number of supported registers.
				2746
				2747	When debug events exit the main run loop, they do so with the reason
				2748	KVM_EXIT_DEBUG, and the kvm_debug_exit_arch part of the kvm_run
				2749	structure contains architecture specific debug information.
				2750
				2751	4.88 KVM_GET_EMULATED_CPUID
				2752
				2753	Capability: KVM_CAP_EXT_EMUL_CPUID
				2754	Architectures: x86
				2755	Type: system ioctl
				2756	Parameters: struct kvm_cpuid2 (in/out)
				2757	Returns: 0 on success, -1 on error
				2758
				2759	struct kvm_cpuid2 {
				2760	__u32 nent;
				2761	__u32 flags;
				2762	struct kvm_cpuid_entry2 entries[0];
				2763	};
				2764
				2765	The member 'flags' is used for passing flags from userspace.
				2766
				2767	#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
				2768	#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
				2769	#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
				2770
				2771	struct kvm_cpuid_entry2 {
				2772	__u32 function;
				2773	__u32 index;
				2774	__u32 flags;
				2775	__u32 eax;
				2776	__u32 ebx;
				2777	__u32 ecx;
				2778	__u32 edx;
				2779	__u32 padding[3];
				2780	};
				2781
				2782	This ioctl returns x86 cpuid features which are emulated by
				2783	kvm. Userspace can use the information returned by this ioctl to query
				2784	which features are emulated by kvm instead of being present natively.
				2785
				2786	Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
				2787	structure with the 'nent' field indicating the number of entries in
				2788	the variable-size array 'entries'. If the number of entries is too low
				2789	to describe the cpu capabilities, an error (E2BIG) is returned. If the
				2790	number is too high, the 'nent' field is adjusted and an error (ENOMEM)
				2791	is returned. If the number is just right, the 'nent' field is adjusted
				2792	to the number of valid entries in the 'entries' array, which is then
				2793	filled.
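
Because 'entries' is a variable-size array, userspace has to size the
allocation from 'nent' itself. A minimal sketch of that bookkeeping is
shown below, assuming local mirrors of the two structs above (named
cpuid2 and cpuid_entry2 here). The helpers cpuid2_alloc and cpuid2_grow
are hypothetical, and the doubling policy on E2BIG is just one possible
choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of struct kvm_cpuid_entry2 and struct kvm_cpuid2. */
struct cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

struct cpuid2 {
	uint32_t nent;
	uint32_t flags;
	struct cpuid_entry2 entries[];
};

/* Allocate a zeroed request with room for `nent` entries. */
static struct cpuid2 *cpuid2_alloc(uint32_t nent)
{
	struct cpuid2 *c;

	c = calloc(1, sizeof(*c) + (size_t)nent * sizeof(c->entries[0]));
	if (c)
		c->nent = nent;
	return c;
}

/* After E2BIG (array too small): drop the old request, retry bigger. */
static struct cpuid2 *cpuid2_grow(struct cpuid2 *old)
{
	uint32_t nent;

	assert(old != NULL);
	nent = old->nent ? old->nent * 2 : 1;
	free(old);
	return cpuid2_alloc(nent);
}
```

A caller would pass the allocated structure to the ioctl and call
cpuid2_grow() for as long as the call fails with E2BIG.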
				2794
				2795	The entries returned are the set CPUID bits of the respective features
				2796	which kvm emulates, as returned by the CPUID instruction, with unknown
				2797	or unsupported feature bits cleared.
				2798
				2799	Features like x2apic, for example, may not be present in the host cpu
				2800	but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
				2801	emulated efficiently and are thus not included here.
				2802
				2803	The fields in each entry are defined as follows:
				2804
				2805	function: the eax value used to obtain the entry
				2806	index: the ecx value used to obtain the entry (for entries that are
				2807	affected by ecx)
				2808	flags: an OR of zero or more of the following:
				2809	KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
				2810	if the index field is valid
				2811	KVM_CPUID_FLAG_STATEFUL_FUNC:
				2812	if cpuid for this function returns different values for successive
				2813	invocations; there will be several entries with the same function,
				2814	all with this flag set
				2815	KVM_CPUID_FLAG_STATE_READ_NEXT:
				2816	for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
				2817	the first entry to be read by a cpu
				2818	eax, ebx, ecx, edx: the values returned by the cpuid instruction for
				2819	this function/index combination
				2820
				2821	4.89 KVM_S390_MEM_OP
				2822
				2823	Capability: KVM_CAP_S390_MEM_OP
				2824	Architectures: s390
				2825	Type: vcpu ioctl
				2826	Parameters: struct kvm_s390_mem_op (in)
				2827	Returns: = 0 on success,
				2828	< 0 on generic error (e.g. -EFAULT or -ENOMEM),
				2829	> 0 if an exception occurred while walking the page tables
				2830
				2831	Read or write data from/to the logical (virtual) memory of a VCPU.
				2832
				2833	Parameters are specified via the following structure:
				2834
				2835	struct kvm_s390_mem_op {
				2836	__u64 gaddr; /* the guest address */
				2837	__u64 flags; /* flags */
				2838	__u32 size; /* amount of bytes */
				2839	__u32 op; /* type of operation */
				2840	__u64 buf; /* buffer in userspace */
				2841	__u8 ar; /* the access register number */
				2842	__u8 reserved[31]; /* should be set to 0 */
				2843	};
				2844
				2845	The type of operation is specified in the "op" field. It is either
				2846	KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or
				2847	KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The
				2848	KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the "flags" field to check
				2849	whether the corresponding memory access would create an access exception
				2850	(without touching the data in the memory at the destination). In case an
				2851	access exception occurred while walking the MMU tables of the guest, the
				2852	ioctl returns a positive error number to indicate the type of exception.
				2853	This exception is also raised directly at the corresponding VCPU if the
				2854	flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the "flags" field.
				2855
				2856	The start address of the memory region has to be specified in the "gaddr"
				2857	field, and the length of the region in the "size" field. "buf" is the buffer
				2858	supplied by the userspace application where the read data should be written
				2859	to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written
				2860	is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. "buf" is unused and can be NULL
				2861	when KVM_S390_MEMOP_F_CHECK_ONLY is specified. "ar" designates the access
				2862	register number to be used.
				2863
				2864	The "reserved" field is meant for future extensions. It is not used by
				2865	KVM with the currently defined set of flags.
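
The following sketch prepares a check-only write request and classifies
the three kinds of return value listed above. It assumes a local mirror
of struct kvm_s390_mem_op and assumed numeric values for the op and flag
constants, which should be verified against <linux/kvm.h>. The names
s390_mem_op, memop_check_write and memop_classify are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_s390_mem_op as shown above (64 bytes). */
struct s390_mem_op {
	uint64_t gaddr;
	uint64_t flags;
	uint32_t size;
	uint32_t op;
	uint64_t buf;
	uint8_t ar;
	uint8_t reserved[31];
};

static_assert(sizeof(struct s390_mem_op) == 64, "unexpected layout");

/* Assumed values of the uapi constants; verify before relying on them. */
#define MEMOP_LOGICAL_WRITE	1
#define MEMOP_F_CHECK_ONLY	(1ULL << 0)

/* Build a request that only checks whether a write would fault. */
static struct s390_mem_op memop_check_write(uint64_t gaddr, uint32_t size,
					    uint8_t ar)
{
	struct s390_mem_op op;

	assert(ar < 16);		/* s390 has 16 access registers */
	memset(&op, 0, sizeof(op));	/* also zeroes the reserved field */
	op.gaddr = gaddr;
	op.size = size;
	op.op = MEMOP_LOGICAL_WRITE;
	op.flags = MEMOP_F_CHECK_ONLY;
	op.buf = 0;			/* unused with CHECK_ONLY */
	op.ar = ar;
	return op;
}

enum memop_result { MEMOP_OK, MEMOP_GENERIC_ERROR, MEMOP_GUEST_EXCEPTION };

/* Map the ioctl return value onto the three documented outcomes. */
static enum memop_result memop_classify(int ret)
{
	if (ret == 0)
		return MEMOP_OK;
	return ret < 0 ? MEMOP_GENERIC_ERROR : MEMOP_GUEST_EXCEPTION;
}
```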
				2866
				2867	4.90 KVM_S390_GET_SKEYS
				2868
				2869	Capability: KVM_CAP_S390_SKEYS
				2870	Architectures: s390
				2871	Type: vm ioctl
				2872	Parameters: struct kvm_s390_skeys
				2873	Returns: 0 on success, KVM_S390_GET_KEYS_NONE if guest is not using storage
				2874	keys, negative value on error
				2875
				2876	This ioctl is used to get guest storage key values on the s390
				2877	architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
				2878
				2879	struct kvm_s390_skeys {
				2880	__u64 start_gfn;
				2881	__u64 count;
				2882	__u64 skeydata_addr;
				2883	__u32 flags;
				2884	__u32 reserved[9];
				2885	};
				2886
				2887	The start_gfn field is the number of the first guest frame whose storage keys
				2888	you want to get.
				2889
				2890	The count field is the number of consecutive frames (starting from start_gfn)
				2891	whose storage keys to get. The count field must be at least 1 and the maximum
				2892	allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
				2893	will cause the ioctl to return -EINVAL.
				2894
				2895	The skeydata_addr field is the address of a buffer large enough to hold count
				2896	bytes. This buffer will be filled with storage key data by the ioctl.
				2897
				2898	4.91 KVM_S390_SET_SKEYS
				2899
				2900	Capability: KVM_CAP_S390_SKEYS
				2901	Architectures: s390
				2902	Type: vm ioctl
				2903	Parameters: struct kvm_s390_skeys
				2904	Returns: 0 on success, negative value on error
				2905
				2906	This ioctl is used to set guest storage key values on the s390
				2907	architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
				2908	See section on KVM_S390_GET_SKEYS for struct definition.
				2909
				2910	The start_gfn field is the number of the first guest frame whose storage keys
				2911	you want to set.
				2912
				2913	The count field is the number of consecutive frames (starting from start_gfn)
				2914	whose storage keys to set. The count field must be at least 1 and the maximum
				2915	allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
				2916	will cause the ioctl to return -EINVAL.
				2917
				2918	The skeydata_addr field is the address of a buffer containing count bytes of
				2919	storage keys. Each byte in the buffer will be set as the storage key for a
				2920	single frame starting at start_gfn for count frames.
				2921
				2922	Note: If any architecturally invalid key value is found in the given data then
				2923	the ioctl will return -EINVAL.
				2924
				2925	4.92 KVM_S390_IRQ
				2926
				2927	Capability: KVM_CAP_S390_INJECT_IRQ
				2928	Architectures: s390
				2929	Type: vcpu ioctl
				2930	Parameters: struct kvm_s390_irq (in)
				2931	Returns: 0 on success, -1 on error
				2932	Errors:
				2933	EINVAL: interrupt type is invalid
				2934	type is KVM_S390_SIGP_STOP and flag parameter is invalid value
				2935	type is KVM_S390_INT_EXTERNAL_CALL and code is bigger
				2936	than the maximum of VCPUs
				2937	EBUSY: type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped
				2938	type is KVM_S390_SIGP_STOP and a stop irq is already pending
				2939	type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt
				2940	is already pending
				2941
				2942	Injects an interrupt into the guest.
				2943
				2944	Using struct kvm_s390_irq as a parameter makes it possible
				2945	to inject additional payload, which is not
				2946	possible via KVM_S390_INTERRUPT.
				2947
				2948	Interrupt parameters are passed via kvm_s390_irq:
				2949
				2950	struct kvm_s390_irq {
				2951	__u64 type;
				2952	union {
				2953	struct kvm_s390_io_info io;
				2954	struct kvm_s390_ext_info ext;
				2955	struct kvm_s390_pgm_info pgm;
				2956	struct kvm_s390_emerg_info emerg;
				2957	struct kvm_s390_extcall_info extcall;
				2958	struct kvm_s390_prefix_info prefix;
				2959	struct kvm_s390_stop_info stop;
				2960	struct kvm_s390_mchk_info mchk;
				2961	char reserved[64];
				2962	} u;
				2963	};
				2964
				2965	type can be one of the following:
				2966
				2967	KVM_S390_SIGP_STOP - sigp stop; parameter in .stop
				2968	KVM_S390_PROGRAM_INT - program check; parameters in .pgm
				2969	KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix
				2970	KVM_S390_RESTART - restart; no parameters
				2971	KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters
				2972	KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters
				2973	KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg
				2974	KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall
				2975	KVM_S390_MCHK - machine check interrupt; parameters in .mchk
				2976
				2977
				2978	Note that the vcpu ioctl is asynchronous to vcpu execution.
				2979
				2980	4.94 KVM_S390_GET_IRQ_STATE
				2981
				2982	Capability: KVM_CAP_S390_IRQ_STATE
				2983	Architectures: s390
				2984	Type: vcpu ioctl
				2985	Parameters: struct kvm_s390_irq_state (out)
				2986	Returns: number of bytes copied into buffer (>= 0) on success,
				2987	-EINVAL if buffer size is 0,
				2988	-ENOBUFS if buffer size is too small to fit all pending interrupts,
				2989	-EFAULT if the buffer address was invalid
				2990
				2991	This ioctl allows userspace to retrieve the complete state of all currently
				2992	pending interrupts in a single buffer. Use cases include migration
				2993	and introspection. The parameter structure contains the address of a
				2994	userspace buffer and its length:
				2995
				2996	struct kvm_s390_irq_state {
				2997	__u64 buf;
				2998	__u32 flags; /* will stay unused for compatibility reasons */
				2999	__u32 len;
				3000	__u32 reserved[4]; /* will stay unused for compatibility reasons */
				3001	};
				3002
				3003	Userspace passes in the above struct and for each pending interrupt a
				3004	struct kvm_s390_irq is copied to the provided buffer.
				3005
				3006	The structure contains a flags and a reserved field for future extensions. As
				3007	the kernel never checked for flags == 0 and QEMU never pre-zeroed flags and
				3008	reserved, these fields can not be used in the future without breaking
				3009	compatibility.
				3010
				3011	If -ENOBUFS is returned the buffer provided was too small and userspace
				3012	may retry with a bigger buffer.
				3013
				3014	4.95 KVM_S390_SET_IRQ_STATE
				3015
				3016	Capability: KVM_CAP_S390_IRQ_STATE
				3017	Architectures: s390
				3018	Type: vcpu ioctl
				3019	Parameters: struct kvm_s390_irq_state (in)
				3020	Returns: 0 on success,
				3021	-EFAULT if the buffer address was invalid,
				3022	-EINVAL for an invalid buffer length (see below),
				3023	-EBUSY if there were already interrupts pending,
				3024	errors occurring when actually injecting the
				3025	interrupt. See KVM_S390_IRQ.
				3026
				3027	This ioctl allows userspace to set the complete state of all cpu-local
				3028	interrupts currently pending for the vcpu. It is intended for restoring
				3029	interrupt state after a migration. The input parameter is a userspace buffer
				3030	containing a struct kvm_s390_irq_state:
				3031
				3032	struct kvm_s390_irq_state {
				3033	__u64 buf;
				3034	__u32 flags; /* will stay unused for compatibility reasons */
				3035	__u32 len;
				3036	__u32 reserved[4]; /* will stay unused for compatibility reasons */
				3037	};
				3038
				3039	The restrictions for flags and reserved apply as well.
				3040	(see KVM_S390_GET_IRQ_STATE)
				3041
				3042	The userspace memory referenced by buf contains a struct kvm_s390_irq
				3043	for each interrupt to be injected into the guest.
				3044	If one of the interrupts could not be injected for some reason the
				3045	ioctl aborts.
				3046
				3047	len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
				3048	and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
				3049	which is the maximum number of possibly pending cpu-local interrupts.
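
The length rule above can be checked in userspace before the ioctl is
issued. The sketch below assumes a stand-in struct with the same size as
struct kvm_s390_irq (an 8-byte type followed by the 64-byte union, see
KVM_S390_IRQ). The function name and the max_vcpus parameter are
hypothetical; the actual limit has to come from the running kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in matching the size of struct kvm_s390_irq (8 + 64 bytes). */
struct s390_irq {
	uint64_t type;
	char reserved[64];
};

static_assert(sizeof(struct s390_irq) == 72, "unexpected layout");

/*
 * Return 1 if `len` is acceptable for KVM_S390_SET_IRQ_STATE according
 * to the rule above: non-zero, a multiple of the entry size, and at most
 * (max_vcpus + 32) entries. Return 0 otherwise.
 */
static int irq_state_len_valid(uint32_t len, uint32_t max_vcpus)
{
	uint64_t limit = ((uint64_t)max_vcpus + 32) * sizeof(struct s390_irq);

	return len > 0 &&
	       len % sizeof(struct s390_irq) == 0 &&
	       len <= limit;
}
```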
				3050
				3051	4.96 KVM_SMI
				3052
				3053	Capability: KVM_CAP_X86_SMM
				3054	Architectures: x86
				3055	Type: vcpu ioctl
				3056	Parameters: none
				3057	Returns: 0 on success, -1 on error
				3058
				3059	Queues an SMI on the thread's vcpu.
				3060
				3061	4.97 KVM_CAP_PPC_MULTITCE
				3062
				3063	Capability: KVM_CAP_PPC_MULTITCE
				3064	Architectures: ppc
				3065	Type: vm
				3066
				3067	This capability means the kernel is capable of handling hypercalls
				3068	H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
				3069	space. This significantly accelerates DMA operations for PPC KVM guests.
				3070	User space should expect that its handlers for these hypercalls
				3071	are not going to be called if user space previously registered LIOBN
				3072	in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
				3073
				3074	In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
				3075	user space might have to advertise it for the guest. For example,
				3076	IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
				3077	present in the "ibm,hypertas-functions" device-tree property.
				3078
				3079	The hypercalls mentioned above may or may not be processed successfully
				3080	in the kernel based fast path. If they can not be handled by the kernel,
				3081	they will get passed on to user space. So user space still has to have
				3082	an implementation for these despite the in kernel acceleration.
				3083
				3084	This capability is always enabled.
				3085
				3086	4.98 KVM_CREATE_SPAPR_TCE_64
				3087
				3088	Capability: KVM_CAP_SPAPR_TCE_64
				3089	Architectures: powerpc
				3090	Type: vm ioctl
				3091	Parameters: struct kvm_create_spapr_tce_64 (in)
				3092	Returns: file descriptor for manipulating the created TCE table
				3093
				3094	This is an extension for KVM_CAP_SPAPR_TCE which only supports 32bit
				3095	windows, described in 4.62 KVM_CREATE_SPAPR_TCE.
				3096
				3097	This capability uses an extended struct in the ioctl interface:
				3098
				3099	/* for KVM_CAP_SPAPR_TCE_64 */
				3100	struct kvm_create_spapr_tce_64 {
				3101	__u64 liobn;
				3102	__u32 page_shift;
				3103	__u32 flags;
				3104	__u64 offset; /* in pages */
				3105	__u64 size; /* in pages */
				3106	};
				3107
				3108	The aim of the extension is to support an additional bigger DMA window with
				3109	a variable page size.
				3110	KVM_CREATE_SPAPR_TCE_64 receives a 64bit window size, an IOMMU page shift and
				3111	a bus offset of the corresponding DMA window; @size and @offset are numbers
				3112	of IOMMU pages.
				3113
				3114	@flags are not used at the moment.
				3115
				3116	The rest of the functionality is identical to KVM_CREATE_SPAPR_TCE.
				3117
				3118	4.99 KVM_REINJECT_CONTROL
				3119
				3120	Capability: KVM_CAP_REINJECT_CONTROL
				3121	Architectures: x86
				3122	Type: vm ioctl
				3123	Parameters: struct kvm_reinject_control (in)
				3124	Returns: 0 on success,
				3125	-EFAULT if struct kvm_reinject_control cannot be read,
				3126	-ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier.
				3127
				3128	i8254 (PIT) has two modes, reinject and !reinject. The default is reinject,
				3129	where KVM queues elapsed i8254 ticks and monitors completion of interrupt from
				3130	vector(s) that i8254 injects. Reinject mode dequeues a tick and injects its
				3131	interrupt whenever there isn't a pending interrupt from i8254.
				3132	!reinject mode injects an interrupt as soon as a tick arrives.
				3133
				3134	struct kvm_reinject_control {
				3135	__u8 pit_reinject;
				3136	__u8 reserved[31];
				3137	};
				3138
				3139	pit_reinject = 0 (!reinject mode) is recommended, unless running an old
				3140	operating system that uses the PIT for timing (e.g. Linux 2.4.x).
				3141
				3142	4.100 KVM_PPC_CONFIGURE_V3_MMU
				3143
				3144	Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3
				3145	Architectures: ppc
				3146	Type: vm ioctl
				3147	Parameters: struct kvm_ppc_mmuv3_cfg (in)
				3148	Returns: 0 on success,
				3149	-EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
				3150	-EINVAL if the configuration is invalid
				3151
				3152	This ioctl controls whether the guest will use radix or HPT (hashed
				3153	page table) translation, and sets the pointer to the process table for
				3154	the guest.
				3155
				3156	struct kvm_ppc_mmuv3_cfg {
				3157	__u64 flags;
				3158	__u64 process_table;
				3159	};
				3160
				3161	There are two bits that can be set in flags: KVM_PPC_MMUV3_RADIX and
				3162	KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest
				3163	to use radix tree translation, and if clear, to use HPT translation.
				3164	KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest
				3165	to be able to use the global TLB and SLB invalidation instructions;
				3166	if clear, the guest may not use these instructions.
				3167
				3168	The process_table field specifies the address and size of the guest
				3169	process table, which is in the guest's space. This field is formatted
				3170	as the second doubleword of the partition table entry, as defined in
				3171	the Power ISA V3.00, Book III section 5.7.6.1.
				3172
				3173	4.101 KVM_PPC_GET_RMMU_INFO
				3174
				3175	Capability: KVM_CAP_PPC_RADIX_MMU
				3176	Architectures: ppc
				3177	Type: vm ioctl
				3178	Parameters: struct kvm_ppc_rmmu_info (out)
				3179	Returns: 0 on success,
				3180	-EFAULT if struct kvm_ppc_rmmu_info cannot be written,
				3181	-EINVAL if no useful information can be returned
				3182
				3183	This ioctl returns a structure containing two things: (a) a list
				3184	containing supported radix tree geometries, and (b) a list that maps
				3185	page sizes to the values to put in the "AP" (actual page size) field
				3186	of the tlbie (TLB invalidate entry) instruction.
				3187
				3188	struct kvm_ppc_rmmu_info {
				3189	struct kvm_ppc_radix_geom {
				3190	__u8 page_shift;
				3191	__u8 level_bits[4];
				3192	__u8 pad[3];
				3193	} geometries[8];
				3194	__u32 ap_encodings[8];
				3195	};
				3196
				3197	The geometries[] field gives up to 8 supported geometries for the
				3198	radix page table, in terms of the log base 2 of the smallest page
				3199	size, and the number of bits indexed at each level of the tree, from
				3200	the PTE level up to the PGD level in that order. Any unused entries
				3201	will have 0 in the page_shift field.
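As a rough illustration of the convention above, the sketch below counts the
populated geometries[] entries by skipping those whose page_shift is 0. The
struct is a local mirror of kvm_ppc_radix_geom built on <stdint.h> types, and
count_radix_geometries() is a hypothetical helper name, not part of the KVM API.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct kvm_ppc_radix_geom (uint8_t stands in for __u8). */
struct radix_geom {
	uint8_t page_shift;
	uint8_t level_bits[4];
	uint8_t pad[3];
};

/* Hypothetical helper: count the entries that describe a real geometry. */
static size_t count_radix_geometries(const struct radix_geom geoms[8])
{
	size_t n = 0;

	for (size_t i = 0; i < 8; i++) {
		/* Unused entries are reported with page_shift == 0. */
		if (geoms[i].page_shift != 0)
			n++;
	}
	return n;
}
```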
				3202
				3203	The ap_encodings field gives the supported page sizes and their AP field
				3204	encodings, encoded with the AP value in the top 3 bits and the log
				3205	base 2 of the page size in the bottom 6 bits.
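The bit layout described above can be unpacked as in the following sketch. It
assumes each ap_encodings[] entry is handled as a plain 32-bit value;
struct ap_encoding and decode_ap_encoding() are illustrative names only.

```c
#include <stdint.h>

/* Hypothetical holder for one decoded ap_encodings[] entry. */
struct ap_encoding {
	unsigned ap;		/* AP value: top 3 bits of the word */
	unsigned page_shift;	/* log base 2 of the page size: bottom 6 bits */
};

static struct ap_encoding decode_ap_encoding(uint32_t enc)
{
	struct ap_encoding d;

	d.ap = enc >> 29;		/* bits 31..29 */
	d.page_shift = enc & 0x3f;	/* bits 5..0 */
	return d;
}
```

For instance, an entry of 0xA0000010 would decode to AP value 5 and a page
shift of 16 (64 KiB pages).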
				3206
				3207	4.102 KVM_PPC_RESIZE_HPT_PREPARE
				3208
				3209	Capability: KVM_CAP_SPAPR_RESIZE_HPT
				3210	Architectures: powerpc
				3211	Type: vm ioctl
				3212	Parameters: struct kvm_ppc_resize_hpt (in)
				3213	Returns: 0 on successful completion,
				3214	>0 if a new HPT is being prepared, the value is an estimated
				3215	number of milliseconds until preparation is complete
				3216	-EFAULT if struct kvm_ppc_resize_hpt cannot be read,
				3217	-EINVAL if the supplied shift or flags are invalid
				3218	-ENOMEM if unable to allocate the new HPT
				3219	-ENOSPC if there was a hash collision when moving existing
				3220	HPT entries to the new HPT
				3221	-EIO on other error conditions
				3222
				3223	Used to implement the PAPR extension for runtime resizing of a guest's
				3224	Hashed Page Table (HPT). Specifically this starts, stops or monitors
				3225	the preparation of a new potential HPT for the guest, essentially
				3226	implementing the H_RESIZE_HPT_PREPARE hypercall.
				3227
				3228	If called with shift > 0 when there is no pending HPT for the guest,
				3229	this begins preparation of a new pending HPT of size 2^(shift) bytes.
				3230	It then returns a positive integer with the estimated number of
				3231	milliseconds until preparation is complete.
				3232
				3233	If called when there is a pending HPT whose size does not match that
				3234	requested in the parameters, discards the existing pending HPT and
				3235	creates a new one as above.
				3236
				3237	If called when there is a pending HPT of the size requested, will:
				3238	* If preparation of the pending HPT is already complete, return 0
				3239	* If preparation of the pending HPT has failed, return an error
				3240	code, then discard the pending HPT.
				3241	* If preparation of the pending HPT is still in progress, return an
				3242	estimated number of milliseconds until preparation is complete.
				3243
				3244	If called with shift == 0, discards any currently pending HPT and
				3245	returns 0 (i.e. cancels any in-progress preparation).
				3246
				3247	flags is reserved for future expansion, currently setting any bits in
				3248	flags will result in an -EINVAL.
				3249
				3250	Normally this will be called repeatedly with the same parameters until
				3251	it returns <= 0. The first call will initiate preparation, subsequent
				3252	ones will monitor preparation until it completes or fails.
				3253
				3254	struct kvm_ppc_resize_hpt {
				3255	__u64 flags;
				3256	__u32 shift;
				3257	__u32 pad;
				3258	};
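The repeated-call pattern described above hinges on how the return value is
interpreted. The sketch below only classifies a return value; it does not issue
the ioctl itself. The enum and hpt_prepare_next() are hypothetical names, and
the caller is assumed to pass in whatever the KVM_PPC_RESIZE_HPT_PREPARE call
returned (through the C library a failure is reported as -1 with errno set).

```c
/* What the caller should do next, following the return-value rules above. */
enum hpt_prepare_action {
	HPT_PREPARE_DONE,	/* returned 0: preparation complete (or cancelled) */
	HPT_PREPARE_WAIT,	/* returned > 0: call again after the estimate */
	HPT_PREPARE_FAILED	/* returned < 0: preparation failed */
};

/*
 * Hypothetical helper: map a return value to the next action. For
 * HPT_PREPARE_WAIT, *wait_ms receives the estimated milliseconds to wait
 * before calling again with the same parameters; otherwise it is set to 0.
 */
static enum hpt_prepare_action hpt_prepare_next(int ret, int *wait_ms)
{
	*wait_ms = 0;
	if (ret > 0) {
		*wait_ms = ret;
		return HPT_PREPARE_WAIT;
	}
	return ret == 0 ? HPT_PREPARE_DONE : HPT_PREPARE_FAILED;
}
```

A caller would typically loop while the result is HPT_PREPARE_WAIT and move on
to KVM_PPC_RESIZE_HPT_COMMIT only after HPT_PREPARE_DONE.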
				3259
				3260	4.103 KVM_PPC_RESIZE_HPT_COMMIT
				3261
				3262	Capability: KVM_CAP_SPAPR_RESIZE_HPT
				3263	Architectures: powerpc
				3264	Type: vm ioctl
				3265	Parameters: struct kvm_ppc_resize_hpt (in)
				3266	Returns: 0 on successful completion,
				3267	-EFAULT if struct kvm_ppc_resize_hpt cannot be read,
				3268	-EINVAL if the supplied shift or flags are invalid
				3269	-ENXIO if there is no pending HPT, or the pending HPT doesn't
				3270	have the requested size
				3271	-EBUSY if the pending HPT is not fully prepared
				3272	-ENOSPC if there was a hash collision when moving existing
				3273	HPT entries to the new HPT
				3274	-EIO on other error conditions
				3275
				3276	Used to implement the PAPR extension for runtime resizing of a guest's
				3277	Hashed Page Table (HPT). Specifically this requests that the guest be
				3278	transferred to working with the new HPT, essentially implementing the
				3279	H_RESIZE_HPT_COMMIT hypercall.
				3280
				3281	This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
				3282	returned 0 with the same parameters. In other cases
				3283	KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
				3284	-EBUSY, though others may be possible if the preparation was started,
				3285	but failed).
				3286
				3287	This will have undefined effects on the guest if it has not already
				3288	placed itself in a quiescent state where no vcpu will make MMU enabled
				3289	memory accesses.
				3290
				3291	On successful completion, the pending HPT will become the guest's active
				3292	HPT and the previous HPT will be discarded.
				3293
				3294	On failure, the guest will still be operating on its previous HPT.
				3295
				3296	struct kvm_ppc_resize_hpt {
				3297	__u64 flags;
				3298	__u32 shift;
				3299	__u32 pad;
				3300	};
				3301
				3302	4.104 KVM_X86_GET_MCE_CAP_SUPPORTED
				3303
				3304	Capability: KVM_CAP_MCE
				3305	Architectures: x86
				3306	Type: system ioctl
				3307	Parameters: u64 mce_cap (out)
				3308	Returns: 0 on success, -1 on error
				3309
				3310	Returns supported MCE capabilities. The u64 mce_cap parameter
				3311	has the same format as the MSR_IA32_MCG_CAP register. Supported
				3312	capabilities will have the corresponding bits set.
				3313
				3314	4.105 KVM_X86_SETUP_MCE
				3315
				3316	Capability: KVM_CAP_MCE
				3317	Architectures: x86
				3318	Type: vcpu ioctl
				3319	Parameters: u64 mcg_cap (in)
				3320	Returns: 0 on success,
				3321	-EFAULT if u64 mcg_cap cannot be read,
				3322	-EINVAL if the requested number of banks is invalid,
				3323	-EINVAL if requested MCE capability is not supported.
				3324
				3325	Initializes MCE support for use. The u64 mcg_cap parameter
				3326	has the same format as the MSR_IA32_MCG_CAP register and
				3327	specifies which capabilities should be enabled. The maximum
				3328	supported number of error-reporting banks can be retrieved when
				3329	checking for KVM_CAP_MCE. The supported capabilities can be
				3330	retrieved with KVM_X86_GET_MCE_CAP_SUPPORTED.
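Since mcg_cap follows the MSR_IA32_MCG_CAP layout, the bank count occupies the
low 8 bits of the value. The sketch below shows one way to combine a bank count
with capability bits; MCG_CAP_BANKS_MASK, mcg_cap_build() and mcg_cap_banks()
are illustrative names rather than KVM definitions, and the capability bits
passed in are assumed to come from KVM_X86_GET_MCE_CAP_SUPPORTED.

```c
#include <stdint.h>

/* Count field of IA32_MCG_CAP: bits 7:0 (name chosen for this sketch). */
#define MCG_CAP_BANKS_MASK 0xffULL

/* Hypothetical helper: merge capability bits with a bank count. */
static uint64_t mcg_cap_build(uint64_t caps, uint8_t banks)
{
	return (caps & ~MCG_CAP_BANKS_MASK) | banks;
}

/* Hypothetical helper: extract the bank count from an mcg_cap value. */
static unsigned mcg_cap_banks(uint64_t mcg_cap)
{
	return (unsigned)(mcg_cap & MCG_CAP_BANKS_MASK);
}
```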
				3331
				3332	4.106 KVM_X86_SET_MCE
				3333
				3334	Capability: KVM_CAP_MCE
				3335	Architectures: x86
				3336	Type: vcpu ioctl
				3337	Parameters: struct kvm_x86_mce (in)
				3338	Returns: 0 on success,
				3339	-EFAULT if struct kvm_x86_mce cannot be read,
				3340	-EINVAL if the bank number is invalid,
				3341	-EINVAL if VAL bit is not set in status field.
				3342
				3343	Inject a machine check error (MCE) into the guest. The input
				3344	parameter is:
				3345
				3346	struct kvm_x86_mce {
				3347	__u64 status;
				3348	__u64 addr;
				3349	__u64 misc;
				3350	__u64 mcg_status;
				3351	__u8 bank;
				3352	__u8 pad1[7];
				3353	__u64 pad2[3];
				3354	};
				3355
				3356	If the MCE being reported is an uncorrected error, KVM will
				3357	inject it as an MCE exception into the guest. If the guest
				3358	MCG_STATUS register reports that an MCE is in progress, KVM
				3359	causes a KVM_EXIT_SHUTDOWN vmexit.
				3360
				3361	Otherwise, if the MCE is a corrected error, KVM will just
				3362	store it in the corresponding bank (provided this bank is
				3363	not holding a previously reported uncorrected error).
				3364
				3365	4.107 KVM_S390_GET_CMMA_BITS
				3366
				3367	Capability: KVM_CAP_S390_CMMA_MIGRATION
				3368	Architectures: s390
				3369	Type: vm ioctl
				3370	Parameters: struct kvm_s390_cmma_log (in, out)
				3371	Returns: 0 on success, a negative value on error
				3372
				3373	This ioctl is used to get the values of the CMMA bits on the s390
				3374	architecture. It is meant to be used in two scenarios:
				3375	- During live migration to save the CMMA values. Live migration needs
				3376	to be enabled via the KVM_REQ_START_MIGRATION VM property.
				3377	- To non-destructively peek at the CMMA values, with the flag
				3378	KVM_S390_CMMA_PEEK set.
				3379
				3380	The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
				3381	values are written to a buffer whose location is indicated via the "values"
				3382	member in the kvm_s390_cmma_log struct. The values in the input struct are
				3383	also updated as needed.
				3384	Each CMMA value takes up one byte.
				3385
				3386	struct kvm_s390_cmma_log {
				3387	__u64 start_gfn;
				3388	__u32 count;
				3389	__u32 flags;
				3390	union {
				3391	__u64 remaining;
				3392	__u64 mask;
				3393	};
				3394	__u64 values;
				3395	};
				3396
				3397	start_gfn is the number of the first guest frame whose CMMA values are
				3398	to be retrieved,
				3399
				3400	count is the length of the buffer in bytes,
				3401
				3402	values points to the buffer where the result will be written to.
				3403
				3404	If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
				3405	KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
				3406	other ioctls.
				3407
				3408	The result is written in the buffer pointed to by the field values, and
				3409	the values of the input parameter are updated as follows.
				3410
				3411	Depending on the flags, different actions are performed. The only
				3412	supported flag so far is KVM_S390_CMMA_PEEK.
				3413
				3414	The default behaviour if KVM_S390_CMMA_PEEK is not set is:
				3415	start_gfn will indicate the first page frame whose CMMA bits were dirty.
				3416	It is not necessarily the same as the one passed as input, as clean pages
				3417	are skipped.
				3418
				3419	count will indicate the number of bytes actually written in the buffer.
				3420	It can (and very often will) be smaller than the input value, since the
				3421	buffer is only filled until 16 bytes of clean values are found (which
				3422	are then not copied in the buffer). Since a CMMA migration block needs
				3423	the base address and the length, for a total of 16 bytes, we will send
				3424	back some clean data if there is some dirty data afterwards, as long as
				3425	the size of the clean data does not exceed the size of the header. This
				3426	minimizes the amount of data to be saved or transferred over
				3427	the network at the expense of more roundtrips to userspace. The next
				3428	invocation of the ioctl will skip over all the clean values, saving
				3429	potentially more than just the 16 bytes we found.
				3430
				3431	If KVM_S390_CMMA_PEEK is set:
				3432	the existing storage attributes are read even when not in migration
				3433	mode, and no other action is performed;
				3434
				3435	the output start_gfn will be equal to the input start_gfn,
				3436
				3437	the output count will be equal to the input count, except if the end of
				3438	memory has been reached.
				3439
				3440	In both cases:
				3441	the field "remaining" will indicate the total number of dirty CMMA values
				3442	still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
				3443	not enabled.
				3444
				3445	mask is unused.
				3446
				3447	values points to the userspace buffer where the result will be stored.
				3448
				3449	This ioctl can fail with -ENOMEM if not enough memory can be allocated to
				3450	complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
				3451	KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
				3452	-EFAULT if the userspace address is invalid or if no page table is
				3453	present for the addresses (e.g. when using hugepages).
				3454
				3455	4.108 KVM_S390_SET_CMMA_BITS
				3456
				3457	Capability: KVM_CAP_S390_CMMA_MIGRATION
				3458	Architectures: s390
				3459	Type: vm ioctl
				3460	Parameters: struct kvm_s390_cmma_log (in)
				3461	Returns: 0 on success, a negative value on error
				3462
				3463	This ioctl is used to set the values of the CMMA bits on the s390
				3464	architecture. It is meant to be used during live migration to restore
				3465	the CMMA values, but there are no restrictions on its use.
				3466	The ioctl takes parameters via the kvm_s390_cmma_log struct.
				3467	Each CMMA value takes up one byte.
				3468
				3469	struct kvm_s390_cmma_log {
				3470	__u64 start_gfn;
				3471	__u32 count;
				3472	__u32 flags;
				3473	union {
				3474	__u64 remaining;
				3475	__u64 mask;
				3476	};
				3477	__u64 values;
				3478	};
				3479
				3480	start_gfn indicates the starting guest frame number,
				3481
				3482	count indicates how many values are to be considered in the buffer,
				3483
				3484	flags is not used and must be 0.
				3485
				3486	mask indicates which PGSTE bits are to be considered.
				3487
				3488	remaining is not used.
				3489
				3490	values points to the userspace buffer that holds the values to be set.
				3491
				3492	This ioctl can fail with -ENOMEM if not enough memory can be allocated to
				3493	complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
				3494	the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
				3495	if the flags field was not 0, with -EFAULT if the userspace address is
				3496	invalid, if invalid pages are written to (e.g. after the end of memory)
				3497	or if no page table is present for the addresses (e.g. when using
				3498	hugepages).
				3499
				3500	4.109 KVM_PPC_GET_CPU_CHAR
				3501
				3502	Capability: KVM_CAP_PPC_GET_CPU_CHAR
				3503	Architectures: powerpc
				3504	Type: vm ioctl
				3505	Parameters: struct kvm_ppc_cpu_char (out)
				3506	Returns: 0 on successful completion
				3507	-EFAULT if struct kvm_ppc_cpu_char cannot be written
				3508
				3509	This ioctl gives userspace information about certain characteristics
				3510	of the CPU relating to speculative execution of instructions and
				3511	possible information leakage resulting from speculative execution (see
				3512	CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754). The information is
				3513	returned in struct kvm_ppc_cpu_char, which looks like this:
				3514
				3515	struct kvm_ppc_cpu_char {
				3516	__u64 character; /* characteristics of the CPU */
				3517	__u64 behaviour; /* recommended software behaviour */
				3518	__u64 character_mask; /* valid bits in character */
				3519	__u64 behaviour_mask; /* valid bits in behaviour */
				3520	};
				3521
				3522	For extensibility, the character_mask and behaviour_mask fields
				3523	indicate which bits of character and behaviour have been filled in by
				3524	the kernel. If the set of defined bits is extended in future then
				3525	userspace will be able to tell whether it is running on a kernel that
				3526	knows about the new bits.
				3527
				3528	The character field describes attributes of the CPU which can help
				3529	with preventing inadvertent information disclosure - specifically,
				3530	whether there is an instruction to flash-invalidate the L1 data cache
				3531	(ori 30,30,0 or mtspr SPRN_TRIG2,rN), whether the L1 data cache is set
				3532	to a mode where entries can only be used by the thread that created
				3533	them, whether the bcctr[l] instruction prevents speculation, and
				3534	whether a speculation barrier instruction (ori 31,31,0) is provided.
				3535
				3536	The behaviour field describes actions that software should take to
				3537	prevent inadvertent information disclosure, and thus describes which
				3538	vulnerabilities the hardware is subject to; specifically whether the
				3539	L1 data cache should be flushed when returning to user mode from the
				3540	kernel, and whether a speculation barrier should be placed between an
				3541	array bounds check and the array access.
				3542
				3543	These fields use the same bit definitions as the new
				3544	H_GET_CPU_CHARACTERISTICS hypercall.
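The mask fields make it possible to tell "bit is clear" apart from "kernel does
not know this bit". A minimal sketch of that three-way check follows;
cpu_char_query() is a hypothetical helper, and the caller is assumed to supply
either character/character_mask or behaviour/behaviour_mask together with one
of the bit definitions shared with H_GET_CPU_CHARACTERISTICS.

```c
#include <stdint.h>

/*
 * Hypothetical helper. Returns:
 *    1 if the kernel reported the bit as set,
 *    0 if the kernel reported the bit as clear,
 *   -1 if the kernel did not fill in the bit (not covered by the mask).
 */
static int cpu_char_query(uint64_t value, uint64_t mask, uint64_t bit)
{
	if (!(mask & bit))
		return -1;
	return (value & bit) ? 1 : 0;
}
```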
				3545
				3546	4.110 KVM_MEMORY_ENCRYPT_OP
				3547
				3548	Capability: basic
				3549	Architectures: x86
				3550	Type: vm ioctl
				3551	Parameters: an opaque platform specific structure (in/out)
				3552	Returns: 0 on success; -1 on error
				3553
				3554	If the platform supports creating encrypted VMs then this ioctl can be used
				3555	for issuing platform-specific memory encryption commands to manage those
				3556	encrypted VMs.
				3557
				3558	Currently, this ioctl is used for issuing Secure Encrypted Virtualization
				3559	(SEV) commands on AMD Processors. The SEV commands are defined in
				3560	Documentation/virtual/kvm/amd-memory-encryption.rst.
				3561
				3562	4.111 KVM_MEMORY_ENCRYPT_REG_REGION
				3563
				3564	Capability: basic
				3565	Architectures: x86
				3566	Type: vm ioctl
				3567	Parameters: struct kvm_enc_region (in)
				3568	Returns: 0 on success; -1 on error
				3569
				3570	This ioctl can be used to register a guest memory region which may
				3571	contain encrypted data (e.g. guest RAM, SMRAM etc).
				3572
				3573	It is used in the SEV-enabled guest. When encryption is enabled, a guest
				3574	memory region may contain encrypted data. The SEV memory encryption
				3575	engine uses a tweak such that two identical plaintext pages, each at
				3576	different locations will have differing ciphertexts. So swapping or
				3577	moving ciphertext of those pages will not result in plaintext being
				3578	swapped. So relocating (or migrating) physical backing pages for the SEV
				3579	guest will require some additional steps.
				3580
				3581	Note: The current SEV key management spec does not provide commands to
				3582	swap or migrate (move) ciphertext pages. Hence, for now we pin the guest
				3583	memory region registered with the ioctl.
				3584
				3585	4.112 KVM_MEMORY_ENCRYPT_UNREG_REGION
				3586
				3587	Capability: basic
				3588	Architectures: x86
				3589	Type: vm ioctl
				3590	Parameters: struct kvm_enc_region (in)
				3591	Returns: 0 on success; -1 on error
				3592
				3593	This ioctl can be used to unregister the guest memory region registered
				3594	with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
				3595
				3596	4.113 KVM_HYPERV_EVENTFD
				3597
				3598	Capability: KVM_CAP_HYPERV_EVENTFD
				3599	Architectures: x86
				3600	Type: vm ioctl
				3601	Parameters: struct kvm_hyperv_eventfd (in)
				3602
				3603	This ioctl (un)registers an eventfd to receive notifications from the guest on
				3604	the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
				3605	causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number
				3606	(bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit.
				3607
				3608	struct kvm_hyperv_eventfd {
				3609	__u32 conn_id;
				3610	__s32 fd;
				3611	__u32 flags;
				3612	__u32 padding[3];
				3613	};
				3614
				3615	The conn_id field should fit within 24 bits:
				3616
				3617	#define KVM_HYPERV_CONN_ID_MASK 0x00ffffff
				3618
				3619	The acceptable values for the flags field are:
				3620
				3621	#define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0)
				3622
				3623	Returns: 0 on success,
				3624	-EINVAL if conn_id or flags is outside the allowed range
				3625	-ENOENT on deassign if the conn_id isn't registered
				3626	-EEXIST on assign if the conn_id is already registered
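Userspace can pre-check its arguments against the -EINVAL conditions listed
above before issuing the ioctl. The sketch below repeats the two defines from
this section so that it stands alone; hyperv_eventfd_args_valid() is a
hypothetical helper and only approximates the range check described here.

```c
#include <stdint.h>

#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)

/* Hypothetical helper: 1 if conn_id and flags are within the allowed range. */
static int hyperv_eventfd_args_valid(uint32_t conn_id, uint32_t flags)
{
	/* conn_id must fit within 24 bits. */
	if (conn_id & ~(uint32_t)KVM_HYPERV_CONN_ID_MASK)
		return 0;
	/* KVM_HYPERV_EVENTFD_DEASSIGN is the only acceptable flag. */
	if (flags & ~(uint32_t)KVM_HYPERV_EVENTFD_DEASSIGN)
		return 0;
	return 1;
}
```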
				3627
				3628	4.114 KVM_GET_NESTED_STATE
				3629
				3630	Capability: KVM_CAP_NESTED_STATE
				3631	Architectures: x86
				3632	Type: vcpu ioctl
				3633	Parameters: struct kvm_nested_state (in/out)
				3634	Returns: 0 on success, -1 on error
				3635	Errors:
				3636	E2BIG: the total state size (including the fixed-size part of struct
				3637	kvm_nested_state) exceeds the value of 'size' specified by
				3638	the user; the size required will be written into size.
				3639
				3640	struct kvm_nested_state {
				3641	__u16 flags;
				3642	__u16 format;
				3643	__u32 size;
				3644	union {
				3645	struct kvm_vmx_nested_state vmx;
				3646	struct kvm_svm_nested_state svm;
				3647	__u8 pad[120];
				3648	};
				3649	__u8 data[0];
				3650	};
				3651
				3652	#define KVM_STATE_NESTED_GUEST_MODE 0x00000001
				3653	#define KVM_STATE_NESTED_RUN_PENDING 0x00000002
				3654
				3655	#define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001
				3656	#define KVM_STATE_NESTED_SMM_VMXON 0x00000002
				3657
				3658	struct kvm_vmx_nested_state {
				3659	__u64 vmxon_pa;
				3660	__u64 vmcs_pa;
				3661
				3662	struct {
				3663	__u16 flags;
				3664	} smm;
				3665	};
				3666
				3667	This ioctl copies the vcpu's nested virtualization state from the kernel to
				3668	userspace.
				3669
				3670	The maximum size of the state, including the fixed-size part of struct
				3671	kvm_nested_state, can be retrieved by passing KVM_CAP_NESTED_STATE to
				3672	the KVM_CHECK_EXTENSION ioctl().
				3673
				3674	4.115 KVM_SET_NESTED_STATE
				3675
				3676	Capability: KVM_CAP_NESTED_STATE
				3677	Architectures: x86
				3678	Type: vcpu ioctl
				3679	Parameters: struct kvm_nested_state (in)
				3680	Returns: 0 on success, -1 on error
				3681
				3682	This copies the vcpu's kvm_nested_state struct from userspace to the kernel. For
				3683	the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE.
				3684
				3685	5. The kvm_run structure
				3686	------------------------
				3687
				3688	Application code obtains a pointer to the kvm_run structure by
				3689	mmap()ing a vcpu fd. From that point, application code can control
				3690	execution by changing fields in kvm_run prior to calling the KVM_RUN
				3691	ioctl, and obtain information about the reason KVM_RUN returned by
				3692	looking up structure members.
				3693
				3694	struct kvm_run {
				3695	/* in */
				3696	__u8 request_interrupt_window;
				3697
				3698	Request that KVM_RUN return when it becomes possible to inject external
				3699	interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
				3700
				3701	__u8 immediate_exit;
				3702
				3703	This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
				3704	exits immediately, returning -EINTR. In the common scenario where a
				3705	signal is used to "kick" a VCPU out of KVM_RUN, this field can be used
				3706	to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability.
				3707	Rather than blocking the signal outside KVM_RUN, userspace can set up
				3708	a signal handler that sets run->immediate_exit to a non-zero value.
				3709
				3710	This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.
				3711
				3712	__u8 padding1[6];
				3713
				3714	/* out */
				3715	__u32 exit_reason;
				3716
				3717	When KVM_RUN has returned successfully (return value 0), this informs
				3718	application code why KVM_RUN has returned. Allowable values for this
				3719	field are detailed below.
				3720
				3721	__u8 ready_for_interrupt_injection;
				3722
				3723	If request_interrupt_window has been specified, this field indicates
				3724	an interrupt can be injected now with KVM_INTERRUPT.
				3725
				3726	__u8 if_flag;
				3727
				3728	The value of the current interrupt flag. Only valid if in-kernel
				3729	local APIC is not used.
				3730
				3731	__u16 flags;
				3732
				3733	More architecture-specific flags detailing state of the VCPU that may
				3734	affect the device's behavior. The only currently defined flag is
				3735	KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
				3736	VCPU is in system management mode.
				3737
				3738	/* in (pre_kvm_run), out (post_kvm_run) */
				3739	__u64 cr8;
				3740
				3741	The value of the cr8 register. Only valid if in-kernel local APIC is
				3742	not used. Both input and output.
				3743
				3744	__u64 apic_base;
				3745
				3746	The value of the APIC BASE msr. Only valid if in-kernel local
				3747	APIC is not used. Both input and output.
				3748
				3749	union {
				3750	/* KVM_EXIT_UNKNOWN */
				3751	struct {
				3752	__u64 hardware_exit_reason;
				3753	} hw;
				3754
				3755	If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
				3756	reasons. Further architecture-specific information is available in
				3757	hardware_exit_reason.
				3758
				3759	/* KVM_EXIT_FAIL_ENTRY */
				3760	struct {
				3761	__u64 hardware_entry_failure_reason;
				3762	} fail_entry;
				3763
				3764	If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
				3765	to unknown reasons. Further architecture-specific information is
				3766	available in hardware_entry_failure_reason.
				3767
				3768	/* KVM_EXIT_EXCEPTION */
				3769	struct {
				3770	__u32 exception;
				3771	__u32 error_code;
				3772	} ex;
				3773
				3774	Unused.
				3775
				3776	/* KVM_EXIT_IO */
				3777	struct {
				3778	#define KVM_EXIT_IO_IN 0
				3779	#define KVM_EXIT_IO_OUT 1
				3780	__u8 direction;
				3781	__u8 size; /* bytes */
				3782	__u16 port;
				3783	__u32 count;
				3784	__u64 data_offset; /* relative to kvm_run start */
				3785	} io;
				3786
				3787	If exit_reason is KVM_EXIT_IO, then the vcpu has
				3788	executed a port I/O instruction which could not be satisfied by kvm.
				3789	data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
				3790	where kvm expects application code to place the data for the next
				3791	KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
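Because data_offset is relative to the start of kvm_run, the I/O buffer can be
located with simple pointer arithmetic on the mmap()ed region. The sketch below
assumes the packed array holds count elements of size bytes each;
kvm_io_data() and kvm_io_data_len() are illustrative helpers, not KVM API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'run' is the start of the mmap()ed kvm_run area and
 * 'data_offset' is the value of io.data_offset for the current exit.
 */
static uint8_t *kvm_io_data(void *run, uint64_t data_offset)
{
	return (uint8_t *)run + data_offset;
}

/* Hypothetical helper: total bytes in the packed array (size * count). */
static size_t kvm_io_data_len(uint8_t size, uint32_t count)
{
	return (size_t)size * count;
}
```

For KVM_EXIT_IO_OUT the application reads that many bytes from the returned
pointer; for KVM_EXIT_IO_IN it writes them there before the next KVM_RUN.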
				3792
				3793	/* KVM_EXIT_DEBUG */
				3794	struct {
				3795	struct kvm_debug_exit_arch arch;
				3796	} debug;
				3797
				3798	If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event
				3799	for which architecture specific information is returned.
				3800
				3801	/* KVM_EXIT_MMIO */
				3802	struct {
				3803	__u64 phys_addr;
				3804	__u8 data[8];
				3805	__u32 len;
				3806	__u8 is_write;
				3807	} mmio;
				3808
				3809	If exit_reason is KVM_EXIT_MMIO, then the vcpu has
				3810	executed a memory-mapped I/O instruction which could not be satisfied
				3811	by kvm. The 'data' member contains the written data if 'is_write' is
				3812	true, and should be filled by application code otherwise.
				3813
				3814	The 'data' member contains, in its first 'len' bytes, the value as it would
				3815	appear if the VCPU performed a load or store of the appropriate width directly
				3816	to the byte array.
				3817
				3818	NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR_HCALL and
				3819	KVM_EXIT_EPR the corresponding
				3820	operations are complete (and guest state is consistent) only after userspace
				3821	has re-entered the kernel with KVM_RUN. The kernel side will first finish
				3822	incomplete operations and then check for pending signals. Userspace
				3823	can re-enter the guest with an unmasked signal pending to complete
				3824	pending operations.
				3825
				3826	/* KVM_EXIT_HYPERCALL */
				3827	struct {
				3828	__u64 nr;
				3829	__u64 args[6];
				3830	__u64 ret;
				3831	__u32 longmode;
				3832	__u32 pad;
				3833	} hypercall;
				3834
				3835	Unused. This was once used for 'hypercall to userspace'. To implement
				3836	such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
				3837	Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
				3838
				3839	/* KVM_EXIT_TPR_ACCESS */
				3840	struct {
				3841	__u64 rip;
				3842	__u32 is_write;
				3843	__u32 pad;
				3844	} tpr_access;
				3845
				3846	To be documented (KVM_TPR_ACCESS_REPORTING).
				3847
				3848	/* KVM_EXIT_S390_SIEIC */
				3849	struct {
				3850	__u8 icptcode;
				3851	__u64 mask; /* psw upper half */
				3852	__u64 addr; /* psw lower half */
				3853	__u16 ipa;
				3854	__u32 ipb;
				3855	} s390_sieic;
				3856
				3857	s390 specific.
				3858
				3859	/* KVM_EXIT_S390_RESET */
				3860	#define KVM_S390_RESET_POR 1
				3861	#define KVM_S390_RESET_CLEAR 2
				3862	#define KVM_S390_RESET_SUBSYSTEM 4
				3863	#define KVM_S390_RESET_CPU_INIT 8
				3864	#define KVM_S390_RESET_IPL 16
				3865	__u64 s390_reset_flags;
				3866
				3867	s390 specific.
				3868
				3869	/* KVM_EXIT_S390_UCONTROL */
				3870	struct {
				3871	__u64 trans_exc_code;
				3872	__u32 pgm_code;
				3873	} s390_ucontrol;
				3874
				3875	s390 specific. A page fault has occurred for a user controlled virtual
				3876	machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
				3877	resolved by the kernel.
				3878	The program code and the translation exception code that were placed
				3879	in the cpu's lowcore are presented here as defined by the z/Architecture
				3880	Principles of Operation, in the chapter on Dynamic Address Translation
				3881	(DAT).
				3882
				3883	/* KVM_EXIT_DCR */
				3884	struct {
				3885	__u32 dcrn;
				3886	__u32 data;
				3887	__u8 is_write;
				3888	} dcr;
				3889
				3890	Deprecated - was used for 440 KVM.
				3891
				3892	/* KVM_EXIT_OSI */
				3893	struct {
				3894	__u64 gprs[32];
				3895	} osi;
				3896
				3897	MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
				3898	hypercalls and exit with this exit struct that contains all the guest gprs.
				3899
				3900	If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
				3901	Userspace can now handle the hypercall and when it's done modify the gprs as
				3902	necessary. Upon guest entry all guest GPRs will then be replaced by the values
				3903	in this struct.
				3904
				3905	/* KVM_EXIT_PAPR_HCALL */
				3906	struct {
				3907	__u64 nr;
				3908	__u64 ret;
				3909	__u64 args[9];
				3910	} papr_hcall;
				3911
				3912	This is used on 64-bit PowerPC when emulating a pSeries partition,
				3913	e.g. with the 'pseries' machine type in qemu. It occurs when the
				3914	guest does a hypercall using the 'sc 1' instruction. The 'nr' field
				3915	contains the hypercall number (from the guest R3), and 'args' contains
				3916	the arguments (from the guest R4 - R12). Userspace should put the
				3917	return code in 'ret' and any extra returned values in args[].
				3918	The possible hypercalls are defined in the Power Architecture Platform
				3919	Requirements (PAPR) document available from www.power.org (free
				3920	developer registration required to access it).
				3921
				3922	/* KVM_EXIT_S390_TSCH */
				3923	struct {
				3924	__u16 subchannel_id;
				3925	__u16 subchannel_nr;
				3926	__u32 io_int_parm;
				3927	__u32 io_int_word;
				3928	__u32 ipb;
				3929	__u8 dequeued;
				3930	} s390_tsch;
				3931
				3932	s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
				3933	and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
				3934	interrupt for the target subchannel has been dequeued and subchannel_id,
				3935	subchannel_nr, io_int_parm and io_int_word contain the parameters for that
				3936	interrupt. ipb is needed for instruction parameter decoding.
				3937
				3938	/* KVM_EXIT_EPR */
				3939	struct {
				3940	__u32 epr;
				3941	} epr;
				3942
				3943	On FSL BookE PowerPC chips, the interrupt controller has a fast
				3944	interrupt acknowledge path to the core. When the core successfully
				3945	delivers an interrupt, it automatically populates the EPR register with
				3946	the interrupt vector number and acknowledges the interrupt inside
				3947	the interrupt controller.
				3948
				3949	In case the interrupt controller lives in user space, we need to do
				3950	the interrupt acknowledge cycle through it to fetch the next to be
				3951	delivered interrupt vector using this exit.
				3952
				3953	It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an
				3954	external interrupt has just been delivered into the guest. User space
				3955	should put the acknowledged interrupt vector into the 'epr' field.
				3956
				3957	/* KVM_EXIT_SYSTEM_EVENT */
				3958	struct {
				3959	#define KVM_SYSTEM_EVENT_SHUTDOWN 1
				3960	#define KVM_SYSTEM_EVENT_RESET 2
				3961	#define KVM_SYSTEM_EVENT_CRASH 3
				3962	__u32 type;
				3963	__u64 flags;
				3964	} system_event;
				3965
				3966	If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
				3967	a system-level event using some architecture specific mechanism (hypercall
				3968	or some special instruction). In case of ARM/ARM64, this is triggered using
				3969	HVC instruction based PSCI call from the vcpu. The 'type' field describes
				3970	the system-level event type. The 'flags' field describes architecture
				3971	specific flags for the system-level event.
				3972
				3973	Valid values for 'type' are:
				3974	KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
				3975	VM. Userspace is not obliged to honour this, and if it does honour
				3976	this does not need to destroy the VM synchronously (ie it may call
				3977	KVM_RUN again before shutdown finally occurs).
				3978	KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
				3979	As with SHUTDOWN, userspace can choose to ignore the request, or
				3980	to schedule the reset to occur in the future and may call KVM_RUN again.
				3981	KVM_SYSTEM_EVENT_CRASH -- a guest crash occurred and the guest
				3982	has requested crash condition maintenance. Userspace can choose
				3983	to ignore the request, or to gather a VM memory core dump and/or
				3984	reset or shut down the VM.
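
The policy freedom described above could be sketched as follows. This is an
assumption-laden example, not a prescribed implementation: enum
example_vmm_action and example_on_system_event() are hypothetical names, and
struct example_system_event is a local mirror of the system_event member
(real code would read run->system_event).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event types as listed above. */
#define KVM_SYSTEM_EVENT_SHUTDOWN	1
#define KVM_SYSTEM_EVENT_RESET		2
#define KVM_SYSTEM_EVENT_CRASH		3

/* Local mirror of the system_event member of struct kvm_run. */
struct example_system_event {
	uint32_t type;
	uint64_t flags;
};

/* Hypothetical VMM policy decisions; not part of the KVM API. */
enum example_vmm_action {
	EXAMPLE_CONTINUE,	/* ignore the request, call KVM_RUN again */
	EXAMPLE_SCHEDULE_STOP,	/* shut down later, possibly after more KVM_RUN */
	EXAMPLE_SCHEDULE_RESET,	/* reset later, possibly after more KVM_RUN */
	EXAMPLE_DUMP_THEN_STOP,	/* collect a core dump, then stop */
};

enum example_vmm_action
example_on_system_event(const struct example_system_event *ev)
{
	assert(ev != NULL);

	switch (ev->type) {
	case KVM_SYSTEM_EVENT_SHUTDOWN:
		return EXAMPLE_SCHEDULE_STOP;
	case KVM_SYSTEM_EVENT_RESET:
		return EXAMPLE_SCHEDULE_RESET;
	case KVM_SYSTEM_EVENT_CRASH:
		return EXAMPLE_DUMP_THEN_STOP;
	default:
		/* Unknown type: this sketch simply keeps the guest running. */
		return EXAMPLE_CONTINUE;
	}
}
```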
				3985
				3986	/* KVM_EXIT_IOAPIC_EOI */
				3987	struct {
				3988	__u8 vector;
				3989	} eoi;
				3990
				3991	Indicates that the VCPU's in-kernel local APIC received an EOI for a
				3992	level-triggered IOAPIC interrupt. This exit only triggers when the
				3993	IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled);
				3994	the userspace IOAPIC should process the EOI and retrigger the interrupt if
				3995	it is still asserted. Vector is the LAPIC interrupt vector for which the
				3996	EOI was received.
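
One way a userspace IOAPIC model might process this exit is sketched below.
The pin model (struct example_ioapic_pin and its fields) is entirely
hypothetical and only illustrates the "process the EOI and retrigger if
still asserted" rule; actual interrupt injection is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_IOAPIC_PINS 24

/* Hypothetical per-pin state of a userspace IOAPIC model. */
struct example_ioapic_pin {
	uint8_t vector;		/* LAPIC vector programmed for this pin */
	bool level_triggered;
	bool remote_irr;	/* delivered, EOI not yet seen */
	bool line_asserted;	/* current level of the input line */
};

struct example_ioapic {
	struct example_ioapic_pin pin[EXAMPLE_IOAPIC_PINS];
};

/*
 * Handle the 'vector' reported by KVM_EXIT_IOAPIC_EOI. Returns how many
 * pins are still asserted and therefore need to be delivered again.
 */
size_t example_ioapic_eoi(struct example_ioapic *io, uint8_t vector)
{
	size_t redeliver = 0;

	assert(io != NULL);

	for (size_t i = 0; i < EXAMPLE_IOAPIC_PINS; i++) {
		struct example_ioapic_pin *p = &io->pin[i];

		if (!p->level_triggered || !p->remote_irr ||
		    p->vector != vector)
			continue;

		p->remote_irr = false;		/* the EOI completes delivery */
		if (p->line_asserted) {
			p->remote_irr = true;	/* still high: retrigger */
			redeliver++;
		}
	}
	return redeliver;
}
```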
				3997
				3998	struct kvm_hyperv_exit {
				3999	#define KVM_EXIT_HYPERV_SYNIC 1
				4000	#define KVM_EXIT_HYPERV_HCALL 2
				4001	__u32 type;
				4002	union {
				4003	struct {
				4004	__u32 msr;
				4005	__u64 control;
				4006	__u64 evt_page;
				4007	__u64 msg_page;
				4008	} synic;
				4009	struct {
				4010	__u64 input;
				4011	__u64 result;
				4012	__u64 params[2];
				4013	} hcall;
				4014	} u;
				4015	};
				4016	/* KVM_EXIT_HYPERV */
				4017	struct kvm_hyperv_exit hyperv;
				4018	Indicates that the VCPU exits into userspace to process some tasks
				4019	related to Hyper-V emulation.
				4020	Valid values for 'type' are:
				4021	KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
				4022	Hyper-V SynIC state change. Notification is used to remap SynIC
				4023	event/message pages and to enable/disable SynIC messages/events processing
				4024	in userspace.
				4025
				4026	/* Fix the size of the union. */
				4027	char padding[256];
				4028	};
				4029
				4030	/*
				4031	* shared registers between kvm and userspace.
				4032	* kvm_valid_regs specifies the register classes set by the host
				4033	* kvm_dirty_regs specifies the register classes dirtied by userspace
				4034	* struct kvm_sync_regs is architecture specific, as well as the
				4035	* bits for kvm_valid_regs and kvm_dirty_regs
				4036	*/
				4037	__u64 kvm_valid_regs;
				4038	__u64 kvm_dirty_regs;
				4039	union {
				4040	struct kvm_sync_regs regs;
				4041	char padding[SYNC_REGS_SIZE_BYTES];
				4042	} s;
				4043
				4044	If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
				4045	certain guest registers without having to call SET/GET_*REGS. Thus we can
				4046	avoid some system call overhead if userspace has to handle the exit.
				4047	Userspace can query the validity of the structure by checking
				4048	kvm_valid_regs for specific bits. These bits are architecture specific
				4049	and usually define the validity of a group of registers (e.g. one bit
				4050	for general purpose registers).
				4051
				4052	Please note that the kernel is allowed to use the kvm_run structure as the
				4053	primary storage for certain register types. Therefore, the kernel may use the
				4054	values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
				4055
				4056	};
				4057
				4058
				4059
				4060	6. Capabilities that can be enabled on vCPUs
				4061	--------------------------------------------
				4062
				4063	There are certain capabilities that change the behavior of the virtual CPU or
				4064	the virtual machine when enabled. To enable them, please see section 4.37.
				4065	Below you can find a list of capabilities and what their effect on the vCPU or
				4066	the virtual machine is when enabling them.
				4067
				4068	The following information is provided along with the description:
				4069
				4070	Architectures: which instruction set architectures provide this ioctl.
				4071	x86 includes both i386 and x86_64.
				4072
				4073	Target: whether this is a per-vcpu or per-vm capability.
				4074
				4075	Parameters: what parameters are accepted by the capability.
				4076
				4077	Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
				4078	are not detailed, but errors with specific meanings are.
				4079
				4080
				4081	6.1 KVM_CAP_PPC_OSI
				4082
				4083	Architectures: ppc
				4084	Target: vcpu
				4085	Parameters: none
				4086	Returns: 0 on success; -1 on error
				4087
				4088	This capability enables interception of OSI hypercalls that otherwise would
				4089	be treated as normal system calls to be injected into the guest. OSI hypercalls
				4090	were invented by Mac-on-Linux to have a standardized communication mechanism
				4091	between the guest and the host.
				4092
				4093	When this capability is enabled, KVM_EXIT_OSI can occur.
				4094
				4095
				4096	6.2 KVM_CAP_PPC_PAPR
				4097
				4098	Architectures: ppc
				4099	Target: vcpu
				4100	Parameters: none
				4101	Returns: 0 on success; -1 on error
				4102
				4103	This capability enables interception of PAPR hypercalls. PAPR hypercalls are
				4104	done using the hypercall instruction "sc 1".
				4105
				4106	It also sets the guest privilege level to "supervisor" mode. Usually the guest
				4107	runs in "hypervisor" privilege mode with a few missing features.
				4108
				4109	In addition to the above, it changes the semantics of SDR1. In this mode, the
				4110	HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
				4111	HTAB invisible to the guest.
				4112
				4113	When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.
				4114
				4115
				4116	6.3 KVM_CAP_SW_TLB
				4117
				4118	Architectures: ppc
				4119	Target: vcpu
				4120	Parameters: args[0] is the address of a struct kvm_config_tlb
				4121	Returns: 0 on success; -1 on error
				4122
				4123	struct kvm_config_tlb {
				4124	__u64 params;
				4125	__u64 array;
				4126	__u32 mmu_type;
				4127	__u32 array_len;
				4128	};
				4129
				4130	Configures the virtual CPU's TLB array, establishing a shared memory area
				4131	between userspace and KVM. The "params" and "array" fields are userspace
				4132	addresses of mmu-type-specific data structures. The "array_len" field is a
				4133	safety mechanism, and should be set to the size in bytes of the memory that
				4134	userspace has reserved for the array. It must be at least the size dictated
				4135	by "mmu_type" and "params".
				4136
				4137	While KVM_RUN is active, the shared region is under control of KVM. Its
				4138	contents are undefined, and any modification by userspace results in
				4139	boundedly undefined behavior.
				4140
				4141	On return from KVM_RUN, the shared region will reflect the current state of
				4142	the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB
				4143	to tell KVM which entries have been changed, prior to calling KVM_RUN again
				4144	on this vcpu.
				4145
				4146	For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
				4147	- The "params" field is of type "struct kvm_book3e_206_tlb_params".
				4148	- The "array" field points to an array of type "struct
				4149	kvm_book3e_206_tlb_entry".
				4150	- The array consists of all entries in the first TLB, followed by all
				4151	entries in the second TLB.
				4152	- Within a TLB, entries are ordered first by increasing set number. Within a
				4153	set, entries are ordered by way (increasing ESEL).
				4154	- The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
				4155	where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
				4156	- The tsize field of mas1 shall be set to 4K on TLB0, even though the
				4157	hardware ignores this value for TLB0.
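
The ordering rules above can be turned into index arithmetic. The helpers
below are a sketch under stated assumptions: the function names are
hypothetical, tlb0_size and tlb0_ways stand for tlb_sizes[0] and tlb_ways[0]
from the params structure, and num_sets is assumed to be a power of two so
that the mask in the hash is meaningful.

```c
#include <assert.h>
#include <stdint.h>

/* Set number in TLB0: (MAS2 >> 12) & (num_sets - 1). */
uint32_t example_tlb0_set(uint64_t mas2, uint32_t tlb0_size,
			  uint32_t tlb0_ways)
{
	uint32_t num_sets;

	assert(tlb0_ways != 0);
	num_sets = tlb0_size / tlb0_ways;
	return (uint32_t)(mas2 >> 12) & (num_sets - 1);
}

/* Index into the shared array: ordered by set, then by way within a set. */
uint32_t example_tlb0_index(uint64_t mas2, uint32_t way,
			    uint32_t tlb0_size, uint32_t tlb0_ways)
{
	assert(way < tlb0_ways);
	return example_tlb0_set(mas2, tlb0_size, tlb0_ways) * tlb0_ways + way;
}

/* Entries of the second TLB follow all entries of the first TLB. */
uint32_t example_tlb1_base(uint32_t tlb0_size)
{
	return tlb0_size;
}
```

For example, with 512 entries and 4 ways in TLB0 there are 128 sets, so a
MAS2 value of 0x12345000 hashes to set 0x45, and way 2 of that set is array
element 0x45 * 4 + 2 = 278.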
				4158
				4159	6.4 KVM_CAP_S390_CSS_SUPPORT
				4160
				4161	Architectures: s390
				4162	Target: vcpu
				4163	Parameters: none
				4164	Returns: 0 on success; -1 on error
				4165
				4166	This capability enables support for handling of channel I/O instructions.
				4167
				4168	TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
				4169	handled in-kernel, while the other I/O instructions are passed to userspace.
				4170
				4171	When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
				4172	SUBCHANNEL intercepts.
				4173
				4174	Note that even though this capability is enabled per-vcpu, the complete
				4175	virtual machine is affected.
				4176
				4177	6.5 KVM_CAP_PPC_EPR
				4178
				4179	Architectures: ppc
				4180	Target: vcpu
				4181	Parameters: args[0] defines whether the proxy facility is active
				4182	Returns: 0 on success; -1 on error
				4183
				4184	This capability enables or disables the delivery of interrupts through the
				4185	external proxy facility.
				4186
				4187	When enabled (args[0] != 0), every time the guest gets an external interrupt
				4188	delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
				4189	to receive the topmost interrupt vector.
				4190
				4191	When disabled (args[0] == 0), behavior is as if this facility is unsupported.
				4192
				4193	When this capability is enabled, KVM_EXIT_EPR can occur.
				4194
				4195	6.6 KVM_CAP_IRQ_MPIC
				4196
				4197	Architectures: ppc
				4198	Parameters: args[0] is the MPIC device fd
				4199	args[1] is the MPIC CPU number for this vcpu
				4200
				4201	This capability connects the vcpu to an in-kernel MPIC device.
				4202
				4203	6.7 KVM_CAP_IRQ_XICS
				4204
				4205	Architectures: ppc
				4206	Target: vcpu
				4207	Parameters: args[0] is the XICS device fd
				4208	args[1] is the XICS CPU number (server ID) for this vcpu
				4209
				4210	This capability connects the vcpu to an in-kernel XICS device.
				4211
				4212	6.8 KVM_CAP_S390_IRQCHIP
				4213
				4214	Architectures: s390
				4215	Target: vm
				4216	Parameters: none
				4217
				4218	This capability enables the in-kernel irqchip for s390. Please refer to
				4219	"4.24 KVM_CREATE_IRQCHIP" for details.
				4220
				4221	6.9 KVM_CAP_MIPS_FPU
				4222
				4223	Architectures: mips
				4224	Target: vcpu
				4225	Parameters: args[0] is reserved for future use (should be 0).
				4226
				4227	This capability allows the use of the host Floating Point Unit by the guest. It
				4228	allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is
				4229	done the KVM_REG_MIPS_FPR_* and KVM_REG_MIPS_FCR_* registers can be accessed
				4230	(depending on the current guest FPU register mode), and the Status.FR,
				4231	Config5.FRE bits are accessible via the KVM API and also from the guest,
				4232	depending on them being supported by the FPU.
				4233
				4234	6.10 KVM_CAP_MIPS_MSA
				4235
				4236	Architectures: mips
				4237	Target: vcpu
				4238	Parameters: args[0] is reserved for future use (should be 0).
				4239
				4240	This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest.
				4241	It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest.
				4242	Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
				4243	accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
				4244	the guest.
				4245
				4246	6.74 KVM_CAP_SYNC_REGS
				4247	Architectures: s390, x86
				4248	Target: s390: always enabled, x86: vcpu
				4249	Parameters: none
				4250	Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register
				4251	sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h).
				4252
				4253	As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
				4254	KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
				4255	without having to call SET/GET_*REGS". This reduces overhead by eliminating
				4256	repeated ioctl calls for setting and/or getting register values. This is
				4257	particularly important when userspace is making synchronous guest state
				4258	modifications, e.g. when emulating and/or intercepting instructions in
				4259	userspace.
				4260
				4261	For s390 specifics, please refer to the source code.
				4262
				4263	For x86:
				4264	- the register sets to be copied out to kvm_run are selectable
				4265	by userspace (rather than all sets being copied out for every exit).
				4266	- vcpu_events are available in addition to regs and sregs.
				4267
				4268	For x86, the 'kvm_valid_regs' field of struct kvm_run is overloaded to
				4269	function as an input bit-array field set by userspace to indicate the
				4270	specific register sets to be copied out on the next exit.
				4271
				4272	To indicate when userspace has modified values that should be copied into
				4273	the vCPU, the architecture-independent bit-array field 'kvm_dirty_regs' must be set.
				4274	This is done using the same bitflags as for the 'kvm_valid_regs' field.
				4275	If the dirty bit is not set, then the register set values will not be copied
				4276	into the vCPU even if they've been modified.
				4277
				4278	Unused bitfields in the bitarrays must be set to zero.
				4279
				4280	struct kvm_sync_regs {
				4281	struct kvm_regs regs;
				4282	struct kvm_sregs sregs;
				4283	struct kvm_vcpu_events events;
				4284	};
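
The x86 handshake described above might look roughly like this in a VMM.
The sketch uses local stand-ins rather than the real headers: struct
example_run mirrors only the relevant fields of struct kvm_run, struct
example_regs is a reduced stand-in for struct kvm_regs, and the
EXAMPLE_SYNC_X86_* values are assumed to match the KVM_SYNC_X86_* bits in
arch/x86/include/uapi/asm/kvm.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror KVM_SYNC_X86_REGS / _SREGS / _EVENTS. */
#define EXAMPLE_SYNC_X86_REGS	(1ULL << 0)
#define EXAMPLE_SYNC_X86_SREGS	(1ULL << 1)
#define EXAMPLE_SYNC_X86_EVENTS	(1ULL << 2)

struct example_regs { uint64_t rax, rip; };	/* stand-in for kvm_regs */
struct example_sync_regs { struct example_regs regs; };

struct example_run {				/* stand-in for kvm_run */
	uint64_t kvm_valid_regs;
	uint64_t kvm_dirty_regs;
	struct { struct example_sync_regs regs; } s;
};

/* Before KVM_RUN: select the register sets to copy out on the next exit. */
void example_request_sync(struct example_run *run, uint64_t sets)
{
	assert(run != NULL);
	run->kvm_valid_regs = sets;
	run->kvm_dirty_regs = 0;	/* nothing modified yet */
}

/*
 * After an exit: advance RIP past an emulated instruction. Returns -1 if
 * the general purpose registers were not copied out.
 */
int example_skip_instruction(struct example_run *run, uint64_t insn_len)
{
	assert(run != NULL);
	if (!(run->kvm_valid_regs & EXAMPLE_SYNC_X86_REGS))
		return -1;
	run->s.regs.regs.rip += insn_len;
	/* Without the dirty bit the new RIP would not reach the vCPU. */
	run->kvm_dirty_regs |= EXAMPLE_SYNC_X86_REGS;
	return 0;
}
```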
				4285
				4286	7. Capabilities that can be enabled on VMs
				4287	------------------------------------------
				4288
				4289	There are certain capabilities that change the behavior of the virtual
				4290	machine when enabled. To enable them, please see section 4.37. Below
				4291	you can find a list of capabilities and what their effect on the VM
				4292	is when enabling them.
				4293
				4294	The following information is provided along with the description:
				4295
				4296	Architectures: which instruction set architectures provide this ioctl.
				4297	x86 includes both i386 and x86_64.
				4298
				4299	Parameters: what parameters are accepted by the capability.
				4300
				4301	Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
				4302	are not detailed, but errors with specific meanings are.
				4303
				4304
				4305	7.1 KVM_CAP_PPC_ENABLE_HCALL
				4306
				4307	Architectures: ppc
				4308	Parameters: args[0] is the sPAPR hcall number
				4309	args[1] is 0 to disable, 1 to enable in-kernel handling
				4310
				4311	This capability controls whether individual sPAPR hypercalls (hcalls)
				4312	get handled by the kernel or not. Enabling or disabling in-kernel
				4313	handling of an hcall is effective across the VM. On creation, an
				4314	initial set of hcalls are enabled for in-kernel handling, which
				4315	consists of those hcalls for which in-kernel handlers were implemented
				4316	before this capability was implemented. If disabled, the kernel will
				4317	not attempt to handle the hcall, but will always exit to userspace
				4318	to handle it. Note that it may not make sense to enable some and
				4319	disable others of a group of related hcalls, but KVM does not prevent
				4320	userspace from doing that.
				4321
				4322	If the hcall number specified is not one that has an in-kernel
				4323	implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL
				4324	error.
				4325
				4326	7.2 KVM_CAP_S390_USER_SIGP
				4327
				4328	Architectures: s390
				4329	Parameters: none
				4330
				4331	This capability controls which SIGP orders will be handled completely in user
				4332	space. With this capability enabled, all fast orders will be handled completely
				4333	in the kernel:
				4334	- SENSE
				4335	- SENSE RUNNING
				4336	- EXTERNAL CALL
				4337	- EMERGENCY SIGNAL
				4338	- CONDITIONAL EMERGENCY SIGNAL
				4339
				4340	All other orders will be handled completely in user space.
				4341
				4342	Only privileged operation exceptions will be checked for in the kernel (or even
				4343	in the hardware prior to interception). If this capability is not enabled, the
				4344	old way of handling SIGP orders is used (partially in kernel and user space).
				4345
				4346	7.3 KVM_CAP_S390_VECTOR_REGISTERS
				4347
				4348	Architectures: s390
				4349	Parameters: none
				4350	Returns: 0 on success, negative value on error
				4351
				4352	Allows use of the vector registers introduced with z13 processor, and
				4353	provides for the synchronization between host and user space. Will
				4354	return -EINVAL if the machine does not support vectors.
				4355
				4356	7.4 KVM_CAP_S390_USER_STSI
				4357
				4358	Architectures: s390
				4359	Parameters: none
				4360
				4361	This capability allows post-handlers for the STSI instruction. After
				4362	initial handling in the kernel, KVM exits to user space with
				4363	KVM_EXIT_S390_STSI to allow user space to insert further data.
				4364
				4365	Before exiting to userspace, kvm handlers should fill in s390_stsi field of
				4366	vcpu->run:
				4367	struct {
				4368	__u64 addr;
				4369	__u8 ar;
				4370	__u8 reserved;
				4371	__u8 fc;
				4372	__u8 sel1;
				4373	__u16 sel2;
				4374	} s390_stsi;
				4375
				4376	@addr - guest address of STSI SYSIB
				4377	@fc - function code
				4378	@sel1 - selector 1
				4379	@sel2 - selector 2
				4380	@ar - access register number
				4381
				4382	KVM handlers should exit to userspace with rc = -EREMOTE.
				4383
				4384	7.5 KVM_CAP_SPLIT_IRQCHIP
				4385
				4386	Architectures: x86
				4387	Parameters: args[0] - number of routes reserved for userspace IOAPICs
				4388	Returns: 0 on success, -1 on error
				4389
				4390	Create a local apic for each processor in the kernel. This can be used
				4391	instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the
				4392	IOAPIC and PIC (and also the PIT, even though this has to be enabled
				4393	separately).
				4394
				4395	This capability also enables in kernel routing of interrupt requests;
				4396	when KVM_CAP_SPLIT_IRQCHIP is enabled, only routes of KVM_IRQ_ROUTING_MSI type are
				4397	used in the IRQ routing table. The first args[0] MSI routes are reserved
				4398	for the IOAPIC pins. Whenever the LAPIC receives an EOI for these routes,
				4399	a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace.
				4400
				4401	Fails if VCPU has already been created, or if the irqchip is already in the
				4402	kernel (i.e. KVM_CREATE_IRQCHIP has already been called).
				4403
				4404	7.6 KVM_CAP_S390_RI
				4405
				4406	Architectures: s390
				4407	Parameters: none
				4408
				4409	Allows use of runtime-instrumentation introduced with zEC12 processor.
				4410	Will return -EINVAL if the machine does not support runtime-instrumentation.
				4411	Will return -EBUSY if a VCPU has already been created.
				4412
				4413	7.7 KVM_CAP_X2APIC_API
				4414
				4415	Architectures: x86
				4416	Parameters: args[0] - features that should be enabled
				4417	Returns: 0 on success, -EINVAL when args[0] contains invalid features
				4418
				4419	Valid feature flags in args[0] are
				4420
				4421	#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
				4422	#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
				4423
				4424	Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
				4425	KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
				4426	allowing the use of 32-bit APIC IDs. See KVM_CAP_X2APIC_API in their
				4427	respective sections.
				4428
				4429	KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK must be enabled for x2APIC to work
				4430	in logical mode or with more than 255 VCPUs. Otherwise, KVM treats 0xff
				4431	as a broadcast even in x2APIC mode in order to support physical x2APIC
				4432	without interrupt remapping. This is undesirable in logical mode,
				4433	where 0xff represents CPUs 0-7 in cluster 0.
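
As an illustration, a VMM could derive args[0] from its configuration as
sketched below. The selection policy is an assumption made for this example
(in particular, treating "more than 255 VCPUs" as the trigger for 32-bit
IDs), struct example_enable_cap is a local mirror of struct kvm_enable_cap,
and the caller is still expected to set the 'cap' field and issue
KVM_ENABLE_CAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_X2APIC_API_USE_32BIT_IDS		(1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK	(1ULL << 1)

/* Local mirror of struct kvm_enable_cap (see section 4.37). */
struct example_enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t pad[64];
};

/* Fill args[0] with the x2APIC API features this configuration needs. */
void example_x2apic_api_fill(struct example_enable_cap *cap,
			     unsigned int nr_vcpus, bool logical_mode)
{
	uint64_t features = 0;

	assert(cap != NULL);

	/* Example policy: large guests need APIC IDs wider than 8 bits. */
	if (nr_vcpus > 255)
		features |= KVM_X2APIC_API_USE_32BIT_IDS;

	/* Required for logical mode or more than 255 VCPUs (see above). */
	if (logical_mode || nr_vcpus > 255)
		features |= KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK;

	cap->args[0] = features;
}
```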
				4434
				4435	7.8 KVM_CAP_S390_USER_INSTR0
				4436
				4437	Architectures: s390
				4438	Parameters: none
				4439
				4440	With this capability enabled, all illegal instructions 0x0000 (2 bytes) will
				4441	be intercepted and forwarded to user space. User space can use this
				4442	mechanism e.g. to realize 2-byte software breakpoints. The kernel will
				4443	not inject an operating exception for these instructions, user space has
				4444	to take care of that.
				4445
				4446	This capability can be enabled dynamically even if VCPUs were already
				4447	created and are running.
				4448
				4449	7.9 KVM_CAP_S390_GS
				4450
				4451	Architectures: s390
				4452	Parameters: none
				4453	Returns: 0 on success; -EINVAL if the machine does not support
				4454	guarded storage; -EBUSY if a VCPU has already been created.
				4455
				4456	Allows use of guarded storage for the KVM guest.
				4457
				4458	7.10 KVM_CAP_S390_AIS
				4459
				4460	Architectures: s390
				4461	Parameters: none
				4462
				4463	Allow use of adapter-interruption suppression.
				4464	Returns: 0 on success; -EBUSY if a VCPU has already been created.
				4465
				4466	7.11 KVM_CAP_PPC_SMT
				4467
				4468	Architectures: ppc
				4469	Parameters: vsmt_mode, flags
				4470
				4471	Enabling this capability on a VM provides userspace with a way to set
				4472	the desired virtual SMT mode (i.e. the number of virtual CPUs per
				4473	virtual core). The virtual SMT mode, vsmt_mode, must be a power of 2
				4474	between 1 and 8. On POWER8, vsmt_mode must also be no greater than
				4475	the number of threads per subcore for the host. Currently flags must
				4476	be 0. A successful call to enable this capability will result in
				4477	vsmt_mode being returned when the KVM_CAP_PPC_SMT capability is
				4478	subsequently queried for the VM. This capability is only supported by
				4479	HV KVM, and can only be set before any VCPUs have been created.
				4480	The KVM_CAP_PPC_SMT_POSSIBLE capability indicates which virtual SMT
				4481	modes are available.
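
The constraints on the parameters can be checked in userspace before the
call is made. The helper below is a sketch with a hypothetical name; it
assumes that vsmt_mode is passed in args[0] and flags in args[1], which this
section does not spell out, and it does not check the POWER8 subcore limit
or KVM_CAP_PPC_SMT_POSSIBLE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct kvm_enable_cap (see section 4.37). */
struct example_enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t pad[64];
};

/*
 * Validate vsmt_mode/flags and store them in the argument array.
 * Returns 0 on success, -1 if the values violate the rules above.
 */
int example_ppc_smt_fill(struct example_enable_cap *cap,
			 uint64_t vsmt_mode, uint64_t flags)
{
	assert(cap != NULL);

	/* Must be a power of 2 between 1 and 8. */
	if (vsmt_mode == 0 || (vsmt_mode & (vsmt_mode - 1)) != 0 ||
	    vsmt_mode > 8)
		return -1;

	/* Currently flags must be 0. */
	if (flags != 0)
		return -1;

	cap->args[0] = vsmt_mode;	/* assumed argument order */
	cap->args[1] = flags;
	return 0;
}
```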
				4482
				4483	7.12 KVM_CAP_PPC_FWNMI
				4484
				4485	Architectures: ppc
				4486	Parameters: none
				4487
				4488	With this capability a machine check exception in the guest address
				4489	space will cause KVM to exit the guest with NMI exit reason. This
				4490	enables QEMU to build an error log and branch to the guest kernel's registered
				4491	machine check handling routine. Without this capability KVM will
				4492	branch to the guest's 0x200 interrupt vector.
				4493
				4494	7.13 KVM_CAP_X86_DISABLE_EXITS
				4495
				4496	Architectures: x86
				4497	Parameters: args[0] defines which exits are disabled
				4498	Returns: 0 on success, -EINVAL when args[0] contains invalid exits
				4499
				4500	Valid bits in args[0] are
				4501
				4502	#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
				4503	#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
				4504
				4505	Enabling this capability on a VM provides userspace with a way to no
				4506	longer intercept some instructions for improved latency in some
				4507	workloads, and is suggested when vCPUs are associated to dedicated
				4508	physical CPUs. More bits can be added in the future; userspace can
				4509	just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
				4510	all such vmexits.
				4511
				4512	Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
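
Combining the two rules above (pass the KVM_CHECK_EXTENSION result through,
but keep HLT exits when KVM_FEATURE_PV_UNHALT is exposed) could look like
the following sketch. The helper name and the guest_has_pv_unhalt parameter
are hypothetical, and struct example_enable_cap is a local mirror of struct
kvm_enable_cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_X86_DISABLE_EXITS_MWAIT	(1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT	(1 << 1)

/* Local mirror of struct kvm_enable_cap (see section 4.37). */
struct example_enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t pad[64];
};

/*
 * 'supported' is the value returned by KVM_CHECK_EXTENSION for
 * KVM_CAP_X86_DISABLE_EXITS.
 */
void example_disable_exits_fill(struct example_enable_cap *cap,
				uint64_t supported, bool guest_has_pv_unhalt)
{
	uint64_t mask = supported;

	assert(cap != NULL);

	/* PV unhalt relies on HLT exits, so keep intercepting HLT. */
	if (guest_has_pv_unhalt)
		mask &= ~(uint64_t)KVM_X86_DISABLE_EXITS_HLT;

	cap->args[0] = mask;
}
```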
				4513
				4514	7.14 KVM_CAP_S390_HPAGE_1M
				4515
				4516	Architectures: s390
				4517	Parameters: none
				4518	Returns: 0 on success, -EINVAL if hpage module parameter was not set
				4519	or cmma is enabled, or the VM has the KVM_VM_S390_UCONTROL
				4520	flag set
				4521
				4522	With this capability the KVM support for memory backing with 1m pages
				4523	through hugetlbfs can be enabled for a VM. After the capability is
				4524	enabled, cmma can't be enabled anymore and pfmfi and the storage key
				4525	interpretation are disabled. If cmma has already been enabled or the
				4526	hpage module parameter is not set to 1, -EINVAL is returned.
				4527
				4528	While it is generally possible to create a huge page backed VM without
				4529	this capability, the VM will not be able to run.
				4530
				4531	7.15 KVM_CAP_MSR_PLATFORM_INFO
				4532
				4533	Architectures: x86
				4534	Parameters: args[0] whether feature should be enabled or not
				4535
				4536	With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
				4537	a #GP would be raised when the guest tries to access it. Currently, this
				4538	capability does not enable write permissions of this MSR for the guest.
				4539
				4540	8. Other capabilities.
				4541	----------------------
				4542
				4543	This section lists capabilities that give information about other
				4544	features of the KVM implementation.
				4545
				4546	8.1 KVM_CAP_PPC_HWRNG
				4547
				4548	Architectures: ppc
				4549
				4550	This capability, if KVM_CHECK_EXTENSION indicates that it is
				4551	available, means that the kernel has an implementation of the
				4552	H_RANDOM hypercall backed by a hardware random-number generator.
				4553	If present, the kernel H_RANDOM handler can be enabled for guest use
				4554	with the KVM_CAP_PPC_ENABLE_HCALL capability.
				4555
				4556	8.2 KVM_CAP_HYPERV_SYNIC
				4557
				4558	Architectures: x86
				4559	This capability, if KVM_CHECK_EXTENSION indicates that it is
				4560	available, means that the kernel has an implementation of the
				4561	Hyper-V Synthetic interrupt controller (SynIC). Hyper-V SynIC is
				4562	used to support Windows Hyper-V based guest paravirt drivers (VMBus).
				4563
				4564	In order to use SynIC, it has to be activated by setting this
				4565	capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
				4566	will disable the use of APIC hardware virtualization even if supported
				4567	by the CPU, as it's incompatible with SynIC auto-EOI behavior.
				4568
				4569	8.3 KVM_CAP_PPC_RADIX_MMU
				4570
				4571	Architectures: ppc
				4572
				4573	This capability, if KVM_CHECK_EXTENSION indicates that it is
				4574	available, means that the kernel can support guests using the
				4575	radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
				4576	processor).
				4577
				4578	8.4 KVM_CAP_PPC_HASH_MMU_V3
				4579
				4580	Architectures: ppc
				4581
				4582	This capability, if KVM_CHECK_EXTENSION indicates that it is
				4583	available, means that the kernel can support guests using the
				4584	hashed page table MMU defined in Power ISA V3.00 (as implemented in
				4585	the POWER9 processor), including in-memory segment tables.
				4586
				4587	8.5 KVM_CAP_MIPS_VZ
				4588
				4589	Architectures: mips
				4590
				4591	This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
				4592	it is available, means that full hardware assisted virtualization capabilities
				4593	of the hardware are available for use through KVM. An appropriate
				4594	KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
				4595	utilises it.
				4596
				4597	If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
				4598	available, it means that the VM is using full hardware assisted virtualization
				4599	capabilities of the hardware. This is useful to check after creating a VM with
				4600	KVM_VM_MIPS_DEFAULT.
				4601
				4602	The value returned by KVM_CHECK_EXTENSION should be compared against known
				4603	values (see below). All other values are reserved. This is to allow for the
				4604	possibility of other hardware assisted virtualization implementations which
				4605	may be incompatible with the MIPS VZ ASE.
				4606
				4607	0: The trap & emulate implementation is in use to run guest code in user
				4608	mode. Guest virtual memory segments are rearranged to fit the guest in the
				4609	user mode address space.
				4610
				4611	1: The MIPS VZ ASE is in use, providing full hardware assisted
				4612	virtualization, including standard guest virtual memory segments.
				4613
				4614	8.6 KVM_CAP_MIPS_TE
				4615
				4616	Architectures: mips
				4617
				4618	This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
				4619	it is available, means that the trap & emulate implementation is available to
				4620	run guest code in user mode, even if KVM_CAP_MIPS_VZ indicates that hardware
				4621	assisted virtualisation is also available. KVM_VM_MIPS_TE (0) must be passed
				4622	to KVM_CREATE_VM to create a VM which utilises it.
				4623
				4624	If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
				4625	available, it means that the VM is using trap & emulate.
				4626
				4627	8.7 KVM_CAP_MIPS_64BIT
				4628
				4629	Architectures: mips
				4630
				4631	This capability indicates the supported architecture type of the guest, i.e. the
				4632	supported register and address width.
				4633
				4634	The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
				4635	kvm VM handle correspond roughly to the CP0_Config.AT register field, and should
				4636	be checked specifically against known values (see below). All other values are
				4637	reserved.
				4638
				4639	0: MIPS32 or microMIPS32.
				4640	Both registers and addresses are 32-bits wide.
				4641	It will only be possible to run 32-bit guest code.
				4642
				4643	1: MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
				4644	Registers are 64-bits wide, but addresses are 32-bits wide.
				4645	64-bit guest code may run but cannot access MIPS64 memory segments.
				4646	It will also be possible to run 32-bit guest code.
				4647
				4648	2: MIPS64 or microMIPS64 with access to all address segments.
				4649	Both registers and addresses are 64-bits wide.
				4650	It will be possible to run 64-bit or 32-bit guest code.
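
Since only the listed values are defined, userspace may want to decode the
result explicitly and reject everything else. A sketch follows; the struct
and function names are hypothetical, and 'value' is the result of
KVM_CHECK_EXTENSION on the VM handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded form of the KVM_CAP_MIPS_64BIT value. */
struct example_mips_width {
	int reg_bits;	/* register width */
	int addr_bits;	/* address width */
};

/* Returns 0 for a known value, -1 for a reserved one. */
int example_mips_64bit_decode(int value, struct example_mips_width *out)
{
	assert(out != NULL);

	switch (value) {
	case 0:		/* MIPS32 or microMIPS32 */
		out->reg_bits = 32;
		out->addr_bits = 32;
		return 0;
	case 1:		/* MIPS64, 32-bit compatibility segments only */
		out->reg_bits = 64;
		out->addr_bits = 32;
		return 0;
	case 2:		/* MIPS64 with all address segments */
		out->reg_bits = 64;
		out->addr_bits = 64;
		return 0;
	default:	/* reserved: do not guess */
		return -1;
	}
}
```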
				4651
				4652	8.9 KVM_CAP_ARM_USER_IRQ
				4653
				4654	Architectures: arm, arm64
				4655	This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
				4656	that if userspace creates a VM without an in-kernel interrupt controller, it
				4657	will be notified of changes to the output level of in-kernel emulated devices
				4658	that can generate virtual interrupts presented to the VM.
				4659	For such VMs, on every return to userspace, the kernel
				4660	updates the vcpu's run->s.regs.device_irq_level field to represent the actual
				4661	output level of the device.
				4662
				4663	Whenever kvm detects a change in the device output level, kvm guarantees at
				4664	least one return to userspace before running the VM. This exit could either
				4665	be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way,
				4666	userspace can always sample the device output level and re-compute the state of
				4667	the userspace interrupt controller. Userspace should always check the state
				4668	of run->s.regs.device_irq_level on every kvm exit.
				4669	The value in run->s.regs.device_irq_level can represent both level and edge
				4670	triggered interrupt signals, depending on the device. Edge triggered interrupt
				4671	signals will exit to userspace with the bit in run->s.regs.device_irq_level
				4672	set exactly once per edge signal.
				4673
				4674	The field run->s.regs.device_irq_level is available independent of
				4675	run->kvm_valid_regs or run->kvm_dirty_regs bits.
				4676
				4677	If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a
				4678	number larger than 0 indicating which version of this capability is implemented
				4679	and thereby which bits in run->s.regs.device_irq_level can signal values.
				4680
				4681	Currently the following bits are defined for the device_irq_level bitmap:
				4682
				4683	KVM_CAP_ARM_USER_IRQ >= 1:
				4684
				4685	KVM_ARM_DEV_EL1_VTIMER - EL1 virtual timer
				4686	KVM_ARM_DEV_EL1_PTIMER - EL1 physical timer
				4687	KVM_ARM_DEV_PMU - ARM PMU overflow interrupt signal
				4688
				4689	Future versions of kvm may implement additional events. These will be
				4690	indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
				4691	listed above.
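
As an illustration of sampling device_irq_level on every exit, the sketch
below masks the field with the bits defined for the reported capability
version before updating a userspace interrupt controller model. It rests on
several assumptions: struct user_irqchip and both helper functions are
hypothetical, and the fallback bit values are defined locally (mirroring what
the arm/arm64 uapi headers are understood to provide) so that the sketch
builds on any host. A real VMM would take the constants from <linux/kvm.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fallback definitions; assumed to match the arm/arm64 uapi headers. */
#ifndef KVM_ARM_DEV_EL1_VTIMER
#define KVM_ARM_DEV_EL1_VTIMER	(1 << 0)
#define KVM_ARM_DEV_EL1_PTIMER	(1 << 1)
#define KVM_ARM_DEV_PMU		(1 << 2)
#endif

/* Hypothetical userspace interrupt controller input lines. */
struct user_irqchip {
	bool vtimer;
	bool ptimer;
	bool pmu;
};

/* Bits of device_irq_level that carry meaning for a given cap version. */
static uint64_t user_irq_known_bits(int version)
{
	uint64_t mask = 0;

	if (version >= 1)
		mask |= KVM_ARM_DEV_EL1_VTIMER | KVM_ARM_DEV_EL1_PTIMER |
			KVM_ARM_DEV_PMU;
	return mask;
}

/*
 * Call after every return from KVM_RUN, whatever the exit reason, passing
 * the version reported by KVM_CHECK_EXTENSION and the current value of
 * run->s.regs.device_irq_level.
 */
static void user_irqchip_sample(struct user_irqchip *chip, int version,
				uint64_t device_irq_level)
{
	uint64_t level = device_irq_level & user_irq_known_bits(version);

	chip->vtimer = (level & KVM_ARM_DEV_EL1_VTIMER) != 0;
	chip->ptimer = (level & KVM_ARM_DEV_EL1_PTIMER) != 0;
	chip->pmu = (level & KVM_ARM_DEV_PMU) != 0;
}
```

Masking by version means bits that a newer kernel might define are ignored
rather than misinterpreted.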
				4692
				4693	8.10 KVM_CAP_PPC_SMT_POSSIBLE
				4694
				4695	Architectures: ppc
				4696
				4697	Querying this capability returns a bitmap indicating the possible
				4698	virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N
				4699	(counting from the right) is set, then a virtual SMT mode of 2^N is
				4700	available.
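
Because bit N corresponds to mode 2^N, a mode is available exactly when it is
a power of two and its own bit is set in the bitmap. A minimal sketch of that
check follows; smt_mode_possible() is a hypothetical helper name, and the
bitmap is assumed to be the value returned by KVM_CHECK_EXTENSION for this
capability.

```c
#include <stdbool.h>

/*
 * Return true if virtual SMT mode `mode` is advertised in `bitmap`.
 * A valid mode is a power of two, i.e. it has exactly one bit set, and
 * that bit is the one to test in the bitmap.
 */
static bool smt_mode_possible(unsigned int bitmap, unsigned int mode)
{
	if (mode == 0 || (mode & (mode - 1)) != 0)
		return false;	/* not a power of two */
	return (bitmap & mode) != 0;
}
```

For example, a bitmap of 0xb (bits 0, 1 and 3 set) advertises modes 1, 2
and 8, but not 4.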
				4701
				4702	8.11 KVM_CAP_HYPERV_SYNIC2
				4703
				4704	Architectures: x86
				4705
				4706	This capability enables a newer version of the Hyper-V Synthetic interrupt
				4707	controller (SynIC). The only difference from KVM_CAP_HYPERV_SYNIC is that KVM
				4708	doesn't clear SynIC message and event flags pages when they are enabled by
				4709	writing to the respective MSRs.
				4710
				4711	8.12 KVM_CAP_HYPERV_VP_INDEX
				4712
				4713	Architectures: x86
				4714
				4715	This capability indicates that userspace can load the HV_X64_MSR_VP_INDEX msr.
				4716	Its value is used to denote the target vcpu for a SynIC interrupt. For
				4717	compatibility, KVM initializes this msr to KVM's internal vcpu index. When this
				4718	capability is absent, userspace can still query this msr's value.
				4719
				4720	8.13 KVM_CAP_S390_AIS_MIGRATION
				4721
				4722	Architectures: s390
				4723	Parameters: none
				4724
				4725	This capability indicates whether the flic device will be able to get/set the
				4726	AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute, and allows
				4727	userspace to discover this without having to create a flic device.
				4728
				4729	8.14 KVM_CAP_S390_PSW
				4730
				4731	Architectures: s390
				4732
				4733	This capability indicates that the PSW is exposed via the kvm_run structure.
				4734
				4735	8.15 KVM_CAP_S390_GMAP
				4736
				4737	Architectures: s390
				4738
				4739	This capability indicates that the user space memory used as guest mapping can
				4740	be anywhere in the user memory address space, as long as the memory slots are
				4741	aligned and sized to a segment (1MB) boundary.
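
A userspace check for the alignment constraint might look like the sketch
below. The macro and function names are hypothetical. The sketch assumes that
"aligned and sized" applies to the userspace address and the size of each
memory slot, and it checks nothing else about the slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* s390 segment size (1MB), as stated in the text above. */
#define S390_SEGMENT_SIZE	(UINT64_C(1) << 20)

/*
 * Return true if a memory slot starting at `userspace_addr` and spanning
 * `size` bytes begins and ends on a 1MB segment boundary.
 */
static bool s390_slot_segment_aligned(uint64_t userspace_addr, uint64_t size)
{
	const uint64_t mask = S390_SEGMENT_SIZE - 1;

	return (userspace_addr & mask) == 0 && (size & mask) == 0;
}
```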
				4742
				4743	8.16 KVM_CAP_S390_COW
				4744
				4745	Architectures: s390
				4746
				4747	This capability indicates that the user space memory used as guest mapping can
				4748	use copy-on-write semantics as well as dirty page tracking via read-only page
				4749	tables.
				4750
				4751	8.17 KVM_CAP_S390_BPB
				4752
				4753	Architectures: s390
				4754
				4755	This capability indicates that kvm will implement the interfaces to handle
				4756	reset, migration and nested KVM for branch prediction blocking. The stfle
				4757	facility 82 should not be provided to the guest without this capability.
				4758
				4759	8.18 KVM_CAP_HYPERV_TLBFLUSH
				4760
				4761	Architectures: x86
				4762
				4763	This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
				4764	hypercalls:
				4765	HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
				4766	HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.
				4767
				4768	8.19 KVM_CAP_ARM_INJECT_SERROR_ESR
				4769
				4770	Architectures: arm, arm64
				4771
				4772	This capability indicates that userspace can specify (via the
				4773	KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
				4774	takes a virtual SError interrupt exception.
				4775	If KVM advertises this capability, userspace can only specify the ISS field for
				4776	the ESR syndrome. Other parts of the ESR, such as the EC, are generated by the
				4777	CPU when the exception is taken. If this virtual SError is taken to EL1 using
				4778	AArch64, this value will be reported in the ISS field of ESR_ELx.
				4779
				4780	See KVM_CAP_VCPU_EVENTS for more details.