Documentation for /proc/sys/net/*
        (c) 1999        Terrehon Bowden <terrehon@pacbell.net>
                        Bodo Bauer <bb@ricochet.net>
        (c) 2000        Jorge Nerin <comandante@zaralinux.com>
        (c) 2009        Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in README.

==============================================================

This file contains the documentation for the sysctl files in
/proc/sys/net

The interface to the networking parts of the kernel is located in
/proc/sys/net. The following table shows all possible subdirectories. You may
see only some of them, depending on your kernel's configuration.


Table : Subdirectories in /proc/sys/net
..............................................................................
 Directory Content             Directory  Content
 core      General parameter   appletalk  Appletalk protocol
 unix      Unix domain sockets netrom     NET/ROM
 802       E802 protocol       ax25       AX25
 ethernet  Ethernet protocol   rose       X.25 PLP layer
 ipv4      IP version 4        x25        X.25 protocol
 ipx       IPX                 token-ring IBM token ring
 bridge    Bridging            decnet     DEC net
 ipv6      IP version 6        tipc       TIPC
..............................................................................

1. /proc/sys/net/core - Network core options
-------------------------------------------------------

bpf_jit_enable
--------------

This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
and efficient infrastructure that allows bytecode to be executed at
various hook points. It is used in a number of Linux kernel subsystems
such as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes,
tracepoints) and security (e.g. seccomp). LLVM has a BPF back end that
can compile restricted C into a sequence of BPF instructions. After a
program has been loaded through bpf(2) and has passed the verifier in
the kernel, a JIT translates these BPF proglets into native CPU
instructions. There are two flavors of JITs. The newer eBPF JIT is
currently supported on:
  - x86_64
  - arm64
  - arm32
  - ppc64
  - sparc64
  - mips64
  - s390x

The older cBPF JIT is supported on the following archs:
  - mips
  - ppc
  - sparc

eBPF JITs are a superset of cBPF JITs, meaning the kernel will
migrate cBPF instructions into eBPF instructions and then JIT
compile them transparently. Older cBPF JITs can only translate
tcpdump filters, seccomp rules, etc, but not the eBPF programs
loaded through bpf(2) that are mentioned above.

Values :
        0 - disable the JIT (default value)
        1 - enable the JIT
        2 - enable the JIT and ask the compiler to emit traces to the kernel log.
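
The value list above can be checked from user space. The following is a
minimal Python sketch, not part of the kernel interface: the helper names
read_sysctl() and describe_bpf_jit_enable() are hypothetical, and the
descriptions are simply copied from the list above.

```python
from pathlib import Path

# Meanings copied from the "Values" list for bpf_jit_enable.
BPF_JIT_ENABLE_MODES = {
    0: "JIT disabled",
    1: "JIT enabled",
    2: "JIT enabled, traces emitted to the kernel log",
}


def describe_bpf_jit_enable(raw):
    """Map the text read from bpf_jit_enable to a description."""
    value = int(raw.strip())
    return BPF_JIT_ENABLE_MODES.get(value, "unknown value %d" % value)


def read_sysctl(name, root="/proc/sys/net/core"):
    """Return the sysctl's text, or None if it is absent or unreadable."""
    try:
        return Path(root, name).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_sysctl("bpf_jit_enable")
    print("not available" if raw is None else describe_bpf_jit_enable(raw))
```

The file may be missing (kernel built without a BPF JIT) or unreadable
without privileges, which is why read_sysctl() returns None instead of
raising.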

bpf_jit_harden
--------------

This enables hardening for the BPF JIT compiler. Only eBPF JIT
backends are supported. Enabling hardening costs some performance,
but can mitigate JIT spraying.
Values :
        0 - disable JIT hardening (default value)
        1 - enable JIT hardening for unprivileged users only
        2 - enable JIT hardening for all users

bpf_jit_kallsyms
----------------

When the BPF JIT compiler is enabled, the compiled images are located
at addresses unknown to the kernel, meaning they show up neither in
traces nor in /proc/kallsyms. This enables export of these addresses,
which can be used for debugging/tracing. If bpf_jit_harden is enabled,
this feature is disabled.
Values :
        0 - disable JIT kallsyms export (default value)
        1 - enable JIT kallsyms export for privileged users only
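
The interaction between the three BPF JIT switches described above can be
written down as a small truth function. This is only a sketch of the rules
as stated in this file, assuming that any nonzero bpf_jit_harden value
counts as "enabled"; kallsyms_export_active() is a hypothetical name and
not a kernel function.

```python
def kallsyms_export_active(jit_enable, jit_harden, jit_kallsyms):
    """Return True if JIT image addresses would be exported.

    Follows the text: export needs the JIT to be on, bpf_jit_kallsyms
    set to 1, and bpf_jit_harden left disabled.
    """
    if jit_enable == 0:
        return False      # nothing is JIT compiled, nothing to export
    if jit_harden != 0:
        return False      # assumption: any hardening level disables export
    return jit_kallsyms == 1


if __name__ == "__main__":
    for harden in (0, 1, 2):
        print(harden, kallsyms_export_active(1, harden, 1))
```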

bpf_jit_limit
-------------

This enforces a global limit for memory allocations to the BPF JIT
compiler in order to reject unprivileged JIT requests once it has
been surpassed. bpf_jit_limit contains the value of the global limit
in bytes.

dev_weight
----------

The maximum number of packets that the kernel can handle on a NAPI
interrupt. It is a per-CPU variable.
Default: 64

dev_weight_rx_bias
------------------

RPS (e.g. RFS, aRFS) processing competes with the registered NAPI poll function
of the driver for the per softirq cycle netdev_budget. This parameter influences
the proportion of the configured netdev_budget that is spent on RPS based packet
processing during RX softirq cycles. It is also meant to make the current
dev_weight adaptable to asymmetric CPU needs on the RX/TX sides of the network
stack (see dev_weight_tx_bias). It is effective on a per-CPU basis. The value is
derived from dev_weight by multiplication (dev_weight * dev_weight_rx_bias).
Default: 1

dev_weight_tx_bias
------------------

Scales the maximum number of packets that can be processed during a TX softirq cycle.
Effective on a per-CPU basis. Allows scaling of the current dev_weight for asymmetric
net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
The value is derived from dev_weight by multiplication (dev_weight * dev_weight_tx_bias).
Default: 1
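
The arithmetic behind the two bias values is a plain multiplication. The
sketch below only restates the formulas given above; effective_weights()
is a hypothetical helper, and its defaults are the documented defaults.

```python
def effective_weights(dev_weight=64, rx_bias=1, tx_bias=1):
    """Return the per-CPU RX and TX weights derived from dev_weight.

    rx = dev_weight * dev_weight_rx_bias
    tx = dev_weight * dev_weight_tx_bias
    """
    return {"rx": dev_weight * rx_bias, "tx": dev_weight * tx_bias}


if __name__ == "__main__":
    # Doubling only the RX bias leaves the TX side untouched.
    print(effective_weights(dev_weight=64, rx_bias=2, tx_bias=1))
```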

default_qdisc
-------------

The default queuing discipline to use for network devices. This allows
overriding the default of pfifo_fast with an alternative. Since the default
queuing discipline is created without additional parameters, it is best suited
to queuing disciplines that work well without configuration, like stochastic
fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin
which require setting up classes and bandwidths. Note that physical multiqueue
interfaces still use mq as root qdisc, which in turn uses this default for its
leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead
default to noqueue.
Default: pfifo_fast
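
As a rough illustration of the advice above, a script could refuse to write
a qdisc that needs per-class setup. This sketch is an assumption-laden
helper, not kernel behaviour: the two sets contain only the disciplines
named in this section (htb and drr being the usual tc names for
Hierarchical Token Bucket and Deficit Round Robin), and
classify_default_qdisc() is a hypothetical name.

```python
# Disciplines this section names as working without configuration.
NO_CONFIG_NEEDED = {"pfifo_fast", "sfq", "codel", "fq_codel"}
# Disciplines this section warns against (they need classes/bandwidths).
NEEDS_CLASSES = {"htb", "drr"}


def classify_default_qdisc(name):
    """Classify a candidate value for net.core.default_qdisc."""
    if name in NO_CONFIG_NEEDED:
        return "suitable"
    if name in NEEDS_CLASSES:
        return "unsuitable"
    return "unknown"      # not mentioned in this document


if __name__ == "__main__":
    for candidate in ("fq_codel", "htb", "cake"):
        print(candidate, classify_default_qdisc(candidate))
```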

busy_read
---------
Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in us to busy loop waiting for packets on the device queue.
This sets the default value of the SO_BUSY_POLL socket option.
It can be set or overridden per socket with the SO_BUSY_POLL socket option,
which is the preferred method of enabling. If you need to enable the feature
globally via sysctl, a value of 50 is recommended.
Will increase power usage.
Default: 0 (off)

busy_poll
---------
Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in us to busy loop waiting for events.
The recommended value depends on the number of sockets you poll on:
50 for several sockets, 100 for several hundred.
For more than that you probably want to use epoll.
Note that only sockets with SO_BUSY_POLL set will be busy polled,
so you want to either selectively set SO_BUSY_POLL on those sockets or set
net.core.busy_read globally.
Will increase power usage.
Default: 0 (off)
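
Setting SO_BUSY_POLL on a single socket, the preferred method named above,
might look like the following Python sketch. Two assumptions are made:
Python's socket module may not export SO_BUSY_POLL, so the fallback value
46 from the generic Linux socket headers is used (some architectures use a
different number), and try_set_busy_poll() is a hypothetical helper name.
The call can fail, for example when the kernel lacks
CONFIG_NET_RX_BUSY_POLL or when the caller lacks the privilege needed to
raise the value, so failure is reported rather than raised.

```python
import socket

# Assumption: 46 is SO_BUSY_POLL in the generic Linux headers.
SO_BUSY_POLL = getattr(socket, "SO_BUSY_POLL", 46)


def try_set_busy_poll(sock, usecs):
    """Ask for a per-socket busy poll timeout; return True on success."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_BUSY_POLL, usecs)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # 50 us is the value this document recommends for busy_read.
        print("busy poll set:", try_set_busy_poll(s, 50))
```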

rmem_default
------------

The default setting of the socket receive buffer in bytes.

rmem_max
--------

The maximum receive socket buffer size in bytes.

tstamp_allow_data
-----------------
Allow processes to receive tx timestamps looped together with the original
packet contents. If disabled, transmit timestamp requests from unprivileged
processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
Default: 1 (on)


wmem_default
------------

The default setting (in bytes) of the socket send buffer.

wmem_max
--------

The maximum send socket buffer size in bytes.
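
The effect of the buffer limits can be observed from user space by asking
for a buffer size and reading back what the kernel granted. This is a
hedged sketch: request_rcvbuf() is a hypothetical helper, and the value
read back need not equal the request, because Linux caps unprivileged
requests at rmem_max and reports a doubled figure to account for
bookkeeping overhead (see socket(7)).

```python
import socket


def request_rcvbuf(requested):
    """Request a receive buffer size and return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    for size in (4096, 1 << 20, 1 << 28):
        print("asked for", size, "got", request_rcvbuf(size))
```

The same pattern applies to SO_SNDBUF and wmem_max.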

message_burst and message_cost
------------------------------

These parameters are used to limit the warning messages written to the kernel
log from the networking code. They enforce a rate limit so that flooding the
log cannot be used as a denial-of-service attack. A higher message_cost factor
results in fewer messages being written. message_burst controls when messages
will be dropped. The default settings limit warning messages to one every five
seconds.
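
One way to picture how a cost and a burst value combine is a token bucket.
The sketch below is a simplified model of the behaviour described above and
not the kernel's implementation; the class name MessageLimiter is
hypothetical, and the defaults used (a cost of 5 seconds and a burst of 10)
are assumptions chosen to match "one every five seconds".

```python
class MessageLimiter:
    """Token bucket: each message costs `cost` seconds of credit."""

    def __init__(self, cost=5.0, burst=10):
        self.cost = cost
        self.capacity = cost * burst   # at most `burst` messages back to back
        self.tokens = self.capacity
        self.last = None

    def allow(self, now):
        """Return True if a message arriving at time `now` may be logged."""
        if self.last is not None:
            # Credit accumulates with elapsed time, up to the burst capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last))
        self.last = now
        if self.tokens >= self.cost:
            self.tokens -= self.cost
            return True
        return False


if __name__ == "__main__":
    limiter = MessageLimiter()
    print([limiter.allow(0.0) for _ in range(12)])
```

In this model a larger cost lowers the sustained rate, while a larger burst
only changes how many messages get through before dropping starts.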

warnings
--------

This sysctl is now unused.

This was used to control console messages from the networking stack that
occur because of problems on the network like duplicate address or bad
checksums.

These messages are now emitted at KERN_DEBUG and can generally be enabled
and controlled by the dynamic_debug facility.

netdev_budget
-------------

Maximum number of packets taken from all interfaces in one polling cycle (NAPI
poll). In one polling cycle, interfaces which are registered for polling are
probed in a round-robin manner. Also, a polling cycle may not exceed
netdev_budget_usecs microseconds, even if netdev_budget has not been
exhausted.

netdev_budget_usecs
-------------------

Maximum number of microseconds in one NAPI polling cycle. Polling
will exit when either netdev_budget_usecs have elapsed during the
poll cycle or the number of packets processed reaches netdev_budget.
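
The two exit conditions can be illustrated with a small simulation. This is
a toy model under stated assumptions, not the kernel's softirq code: time is
simulated with a fixed cost per packet, each interface is polled for at most
`weight` packets per turn, and poll_cycle() and all of its parameters are
hypothetical names. As in the description above, the cycle ends on whichever
limit is reached first.

```python
def poll_cycle(pending, budget, budget_usecs, weight=64, usecs_per_packet=1):
    """Simulate one polling cycle over interfaces in round-robin order.

    pending: packets waiting on each interface.
    Returns (packets processed per interface, reason the cycle ended).
    """
    remaining = list(pending)
    processed = [0] * len(pending)
    total = 0
    elapsed = 0
    while any(remaining):
        for i, waiting in enumerate(remaining):
            if waiting == 0:
                continue
            n = min(waiting, weight)          # one poll of this interface
            remaining[i] -= n
            processed[i] += n
            total += n
            elapsed += n * usecs_per_packet   # simulated clock
            if total >= budget:
                return processed, "budget"
            if elapsed >= budget_usecs:
                return processed, "time"
    return processed, "drained"


if __name__ == "__main__":
    print(poll_cycle([100, 100, 100, 100], budget=300, budget_usecs=2000))
    print(poll_cycle([100, 100, 100, 100], budget=300, budget_usecs=100))
```

Because the limits are checked only after a poll completes, the model can
slightly overshoot the budget; that is a simplification of this sketch.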

netdev_max_backlog
------------------

Maximum number of packets queued on the INPUT side when the interface
receives packets faster than the kernel can process them.

netdev_rss_key
--------------

RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
randomly generated.
Some user space might need to gather its content even if drivers do not
provide ethtool -x support yet.

myhost:~# cat /proc/sys/net/core/netdev_rss_key
84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)

The file contains NUL bytes if no driver has ever called the
netdev_rss_key_fill() function.
Note:
/proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
but most drivers only use 40 bytes of it.

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
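
A user-space tool that gathers the key has to turn the colon-separated hex
text into bytes. The following sketch shows one way to do that;
parse_rss_key() and key_is_unset() are hypothetical helper names, and the
sample string is the 40-byte key from the ethtool output above.

```python
def parse_rss_key(text):
    """Convert 'aa:bb:cc' style text into a bytes object."""
    return bytes(int(part, 16) for part in text.strip().split(":"))


def key_is_unset(key):
    """True if the key is all zero bytes, i.e. it was never filled."""
    return not any(key)


SAMPLE = ("84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:"
          "56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89")

if __name__ == "__main__":
    key = parse_rss_key(SAMPLE)
    print(len(key), "bytes, unset:", key_is_unset(key))
```

When reading the 52-byte value from /proc/sys/net/core/netdev_rss_key, the
first 40 bytes (key[:40]) are the part most drivers use, as noted above.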

netdev_tstamp_prequeue
----------------------

If set to 0, RX packet timestamps can be sampled after RPS processing, when
the target CPU processes packets. This may delay timestamps slightly, but
allows the load to be distributed across several CPUs.

If set to 1 (default), timestamps are sampled as soon as possible, before
queueing.

optmem_max
----------

Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
of struct cmsghdr structures with appended data.

2. /proc/sys/net/unix - Parameters for Unix domain sockets
-------------------------------------------------------

There is only one file in this directory.
unix_dgram_qlen limits the maximum number of datagrams queued in a Unix domain
socket's buffer. It will not take effect unless the PF_UNIX flag is specified.


3. /proc/sys/net/ipv4 - IPV4 settings
-------------------------------------------------------
Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
descriptions of these entries.


4. Appletalk
-------------------------------------------------------

The /proc/sys/net/appletalk directory holds the Appletalk configuration data
when Appletalk is loaded. The configurable parameters are:

aarp-expiry-time
----------------

The amount of time we keep an AARP entry before expiring it. Used to age out
old hosts.

aarp-resolve-time
-----------------

The amount of time we will spend trying to resolve an Appletalk address.

aarp-retransmit-limit
---------------------

The number of times we will retransmit a query before giving up.

aarp-tick-time
--------------

Controls the rate at which expiries are checked.

The directory /proc/net/appletalk holds the list of active Appletalk sockets
on a machine.

The fields indicate the DDP type, the local address (in network:node format),
the remote address, the size of the transmit pending queue, the size of the
receive queue (bytes waiting for applications to read), the state and the uid
owning the socket.

/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
shows the name of the interface, its Appletalk address, the network range on
that address (or network number for phase 1 networks), and the status of the
interface.

/proc/net/atalk_route lists each known network route. It lists the target
(network) that the route leads to, the router (may be directly connected), the
route flags, and the device the route is using.


5. IPX
-------------------------------------------------------

The IPX protocol has no tunable values in /proc/sys/net.

The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX
socket, giving the local and remote addresses in Novell format (that is
network:node:port). In accordance with the strange Novell tradition,
everything but the port is in hex. Not_Connected is displayed for sockets that
are not tied to a specific remote address. The Tx and Rx queue sizes indicate
the number of bytes pending for transmission and reception. The state
indicates the state the socket is in and the uid is the owning uid of the
socket.

The /proc/net/ipx_interface file lists all IPX interfaces. For each interface
it gives the network number, the node number, and indicates if the network is
the primary network. It also indicates which device it is bound to (or
Internal for internal networks) and the Frame Type if appropriate. Linux
supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) ethernet framing for
IPX.

The /proc/net/ipx_route table holds a list of IPX routes. For each route it
gives the destination network, the router node (or Directly) and the network
address of the router (or Connected) for internal networks.

6. TIPC
-------------------------------------------------------

tipc_rmem
---------

The TIPC protocol now has a tunable for the receive memory, similar to
tcp_rmem, i.e. a vector of 3 INTEGERs: (min, default, max)

    # cat /proc/sys/net/tipc/tipc_rmem
    4252725 34021800 68043600
    #

The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
are scaled (shifted) versions of that same value. Note that the min value
is not at this point in time used in any meaningful way, but the triplet is
preserved in order to be consistent with things like tcp_rmem.
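
The "scaled (shifted)" relationship can be checked against the sample output
above. The sketch below parses the triplet and tests two shift amounts; the
shifts of 1 and 4 bits are inferred from the sample numbers only and are an
assumption, not a documented constant, and parse_rmem() is a hypothetical
helper name.

```python
def parse_rmem(text):
    """Split a 'min default max' line into three integers."""
    minimum, default, maximum = (int(field) for field in text.split())
    return minimum, default, maximum


if __name__ == "__main__":
    # Sample taken from the cat output shown above.
    minimum, default, maximum = parse_rmem("4252725 34021800 68043600\n")
    print("default == max >> 1:", default == maximum >> 1)
    print("min     == max >> 4:", minimum == maximum >> 4)
```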

named_timeout
-------------

TIPC name table updates are distributed asynchronously in a cluster, without
any form of transaction handling. This means that different race scenarios are
possible. One such scenario is that a name withdrawal sent out by one node and
received by another node may arrive after a second, overlapping name
publication has already been accepted from a third node, although the
conflicting updates may originally have been issued in the correct sequential
order.
If named_timeout is nonzero, failed topology updates will be placed on a defer
queue until another event arrives that clears the error, or until the timeout
expires. Value is in milliseconds.