Blame - ap/app/iproute2/iproute2-3.4.0/man/man7/tc-hfsc.7 - T106_DC

blob: eff8a0f0f6675780abe10e59415fbdca1f75ae85 [file] [log] [blame]

lh	9ed821d	2023-04-07 01:36:19 -0700	[diff] [blame^]	1	.TH "TC\-HFSC" 7 "31 October 2011" iproute2 Linux
				2	.SH "NAME"
				3	tc-hfcs \- Hierarchical Fair Service Curve
				4	.
				5	.SH "HISTORY & INTRODUCTION"
				6	.
				7	HFSC \- \fBHierarchical Fair Service Curve\fR was first presented at
				8	SIGCOMM'97. Developed as a part of ALTQ (ALTernative Queuing) on NetBSD, found
				9	its way quickly to other BSD systems, and then a few years ago became part of
				10	the linux kernel. Still, it's not the most popular scheduling algorithm \-
				11	especially if compared to HTB \- and it's not well documented from enduser's
				12	perspective. This introduction aims to explain how HFSC works without
				13	going to deep into math side of things (although some if it will be
				14	inevitable).
				15
				16	In short HFSC aims to:
				17	.
				18	.RS 4
				19	.IP \fB1)\fR 4
				20	guarantee precise bandwidth and delay allocation for all leaf classes (realtime
				21	criterion)
				22	.IP \fB2)\fR
				23	allocate excess bandwidth fairly as specified by class hierarchy (linkshare &
				24	upperlimit criterion)
				25	.IP \fB3)\fR
				26	minimize any discrepancy between the service curve and the actual amount of
				27	service provided during linksharing
				28	.RE
				29	.PP
				30	.
				31	The main "selling" point of HFSC is feature \fB(1)\fR, which is achieved by
				32	using nonlinear service curves (more about what it actually is later). This is
				33	particularly useful in VoIP or games, where not only guarantee of consistent
				34	bandwidth is important, but initial delay of a data stream as well. Note that
				35	it matters only for leaf classes (where the actual queues are) \- thus class
				36	hierarchy is ignored in realtime case.
				37
				38	Feature \fB(2)\fR is well, obvious \- any algorithm featuring class hierarchy
				39	(such as HTB or CBQ) strives to achieve that. HFSC does that well, although
				40	you might end with unusual situations, if you define service curves carelessly
				41	\- see section CORNER CASES for examples.
				42
				43	Feature \fB(3)\fR is mentioned due to the nature of the problem. There may be
				44	situations where it's either not possible to guarantee service of all curves at
				45	the same time, and/or it's impossible to do so fairly. Both will be explained
				46	later. Note that this is mainly related to interior (aka aggregate) classes, as
				47	the leafs are already handled by \fB(1)\fR. Still \- it's perfectly possible to
				48	create a leaf class w/o realtime service, and in such case \- the caveats will
				49	naturally extend to leaf classes as well.
				50
				51	.SH ABBREVIATIONS
				52	For the remaining part of the document, we'll use following shortcuts:
				53	.nf
				54	.RS 4
				55
				56	RT \- realtime
				57	LS \- linkshare
				58	UL \- upperlimit
				59	SC \- service curve
				60	.fi
				61	.
				62	.SH "BASICS OF HFSC"
				63	.
				64	To understand how HFSC works, we must first introduce a service curve.
				65	Overall, it's a nondecreasing function of some time unit, returning amount of
				66	service (allowed or allocated amount of bandwidth) by some specific point in
				67	time. The purpose of it should be subconsciously obvious \- if a class was
				68	allowed to transfer not less than the amount specified by its service curve \-
				69	then service curve is not violated.
				70
				71	Still \- we need more elaborate criterion than just the above (although in
				72	most generic case it can be reduced to it). The criterion has to take two
				73	things into account:
				74	.
				75	.RS 4
				76	.IP \(bu 4
				77	idling periods
				78	.IP \(bu
				79	ability to "look back", so if during current active period service curve is violated, maybe it
				80	isn't if we count excess bandwidth received during earlier active period(s)
				81	.RE
				82	.PP
				83	Let's define the criterion as follows:
				84	.RS 4
				85	.nf
				86	.IP "\fB(1)\fR" 4
				87	For each t1, there must exist t0 in set B, so S(t1\-t0)\~<=\~w(t0,t1)
				88	.fi
				89	.RE
				90	.
				91	.PP
				92	Here 'w' denotes the amount of service received during some time period between t0
				93	and t1. B is a set of all times, where a session becomes active after idling
				94	period (further denoted as 'becoming backlogged'). For a clearer picture,
				95	imagine two situations:
				96	.
				97	.RS 4
				98	.IP \fBa)\fR 4
				99	our session was active during two periods, with a small time gap between them
				100	.IP \fBb)\fR
				101	as in (a), but with a larger gap
				102	.RE
				103	.
				104	.PP
				105	Consider \fB(a)\fR \- if the service received during both periods meets
				106	\fB(1)\fR, then all is good. But what if it doesn't do so during the 2nd
				107	period ? If the amount of service received during the 1st period is bigger
				108	than the service curve, then it might compensate for smaller service during
				109	the 2nd period \fIand\fR the gap \- if the gap is small enough.
				110
				111	If the gap is larger \fB(b)\fR \- then it's less likely to happen (unless the
				112	excess bandwidth allocated during the 1st part was really large). Still, the
				113	larger the gap \- the less interesting is what happened in the past (e.g. 10
				114	minutes ago) \- what matters is the current traffic that just started.
				115
				116	From HFSC's perspective, more interesting is answering the following question:
				117	when should we start transferring packets, so a service curve of a class is not
				118	violated. Or rephrasing it: How much X() amount of service should a session
				119	receive by time t, so the service curve is not violated. Function X() defined
				120	as below is the basic building block of HFSC, used in: eligible, deadline,
				121	virtual\-time and fit\-time curves. Of course, X() is based on equation
				122	\fB(1)\fR and is defined recursively:
				123
				124	.RS 4
				125	.IP \(bu 4
				126	At the 1st backlogged period beginning function X is initialized to generic
				127	service curve assigned to a class
				128	.IP \(bu
				129	At any subsequent backlogged period, X() is:
				130	.nf
				131	\fBmin(X() from previous period ; w(t0)+S(t\-t0) for t>=t0),\fR
				132	.fi
				133	\&... where t0 denotes the beginning of the current backlogged period.
				134	.RE
				135	.
				136	.PP
				137	HFSC uses either linear, or two\-piece linear service curves. In case of
				138	linear or two\-piece linear convex functions (first slope < second slope),
				139	min() in X's definition reduces to the 2nd argument. But in case of two\-piece
				140	concave functions, the 1st argument might quickly become lesser for some
				141	t>=t0. Note, that for some backlogged period, X() is defined only from that
				142	period's beginning. We also define X^(\-1)(w) as smallest t>=t0, for which
				143	X(t)\~=\~w. We have to define it this way, as X() is usually not an injection.
				144
				145	The above generic X() can be one of the following:
				146	.
				147	.RS 4
				148	.IP "E()" 4
				149	In realtime criterion, selects packets eligible for sending. If none are
				150	eligible, HFSC will use linkshare criterion. Eligible time \&'et' is calculated
				151	with reference to packets' heads ( et\~=\~E^(\-1)(w) ). It's based on RT
				152	service curve, \fIbut in case of a convex curve, uses its 2nd slope only.\fR
				153	.IP "D()"
				154	In realtime criterion, selects the most suitable packet from the ones chosen
				155	by E(). Deadline time \&'dt' corresponds to packets' tails
				156	(dt\~=\~D^(\-1)(w+l), where \&'l' is packet's length). Based on RT service
				157	curve.
				158	.IP "V()"
				159	In linkshare criterion, arbitrates which packet to send next. Note that V() is
				160	function of a virtual time \- see \fBLINKSHARE CRITERION\fR section for
				161	details. Virtual time \&'vt' corresponds to packets' heads
				162	(vt\~=\~V^(\-1)(w)). Based on LS service curve.
				163	.IP "F()"
				164	An extension to linkshare criterion, used to limit at which speed linkshare
				165	criterion is allowed to dequeue. Fit\-time 'ft' corresponds to packets' heads
				166	as well (ft\~=\~F^(\-1)(w)). Based on UL service curve.
				167	.RE
				168
				169	Be sure to make clean distinction between session's RT, LS and UL service
				170	curves and the above "utility" functions.
				171	.
				172	.SH "REALTIME CRITERION"
				173	.
				174	RT criterion \fIignores class hierarchy\fR and guarantees precise bandwidth and
				175	delay allocation. We say that packet is eligible for sending, when current real
				176	time is bigger than eligible time. From all packets eligible, the one most
				177	suited for sending, is the one with the smallest deadline time. Sounds simply,
				178	but consider following example:
				179
				180	Interface 10mbit, two classes, both with two\-piece linear service curves:
				181	.RS 4
				182	.IP \(bu 4
				183	1st class \- 2mbit for 100ms, then 7mbit (convex \- 1st slope < 2nd slope)
				184	.IP \(bu
				185	2nd class \- 7mbit for 100ms, then 2mbit (concave \- 1st slope > 2nd slope)
				186	.RE
				187	.PP
				188	Assume for a moment, that we only use D() for both finding eligible packets,
				189	and choosing the most fitting one, thus eligible time would be computed as
				190	D^(\-1)(w) and deadline time would be computed as D^(\-1)(w+l). If the 2nd
				191	class starts sending packets 1 second after the 1st class, it's of course
				192	impossible to guarantee 14mbit, as the interface capability is only 10mbit.
				193	The only workaround in this scenario is to allow the 1st class to send the
				194	packets earlier that would normally be allowed. That's where separate E() comes
				195	to help. Putting all the math aside (see HFSC paper for details), E() for RT
				196	concave service curve is just like D(), but for the RT convex service curve \-
				197	it's constructed using \fIonly\fR RT service curve's 2nd slope (in our example
				198	\- 7mbit).
				199
				200	The effect of such E() \- packets will be sent earlier, and at the same time
				201	D() \fIwill\fR be updated \- so current deadline time calculated from it will
				202	be bigger. Thus, when the 2nd class starts sending packets later, both the 1st
				203	and the 2nd class will be eligible, but the 2nd session's deadline time will be
				204	smaller and its packets will be sent first. When the 1st class becomes idle at
				205	some later point, the 2nd class will be able to "buffer" up again for later
				206	active period of the 1st class.
				207
				208	A short remark \- in a situation, where the total amount of bandwidth
				209	available on the interface is bigger than the allocated total realtime parts
				210	(imagine interface 10 mbit, but 1mbit/2mbit and 2mbit/1mbit classes), the sole
				211	speed of the interface could suffice to guarantee the times.
				212
				213	Important part of RT criterion is that apart from updating its D() and E(),
				214	also V() used by LS criterion is updated. Generally the RT criterion is
				215	secondary to LS one, and used \fIonly\fR if there's a risk of violating precise
				216	realtime requirements. Still, the "participation" in bandwidth distributed by
				217	LS criterion is there, so V() has to be updated along the way. LS criterion can
				218	than properly compensate for non\-ideal fair sharing situation, caused by RT
				219	scheduling. If you use UL service curve its F() will be updated as well (UL
				220	service curve is an extension to LS one \- see \fBUPPERLIMIT CRITERION\fR
				221	section).
				222
				223	Anyway \- careless specification of LS and RT service curves can lead to
				224	potentially undesired situations (see CORNER CASES for examples). This wasn't
				225	the case in HFSC paper where LS and RT service curves couldn't be specified
				226	separately.
				227
				228	.SH "LINKSHARING CRITERION"
				229	.
				230	LS criterion's task is to distribute bandwidth according to specified class
				231	hierarchy. Contrary to RT criterion, there're no comparisons between current
				232	real time and virtual time \- the decision is based solely on direct comparison
				233	of virtual times of all active subclasses \- the one with the smallest vt wins
				234	and gets scheduled. One immediate conclusion from this fact is that absolute
				235	values don't matter \- only ratios between them (so for example, two children
				236	classes with simple linear 1mbit service curves will get the same treatment
				237	from LS criterion's perspective, as if they were 5mbit). The other conclusion
				238	is, that in perfectly fluid system with linear curves, all virtual times across
				239	whole class hierarchy would be equal.
				240
				241	Why is VC defined in term of virtual time (and what is it) ?
				242
				243	Imagine an example: class A with two children \- A1 and A2, both with let's say
				244	10mbit SCs. If A2 is idle, A1 receives all the bandwidth of A (and update its
				245	V() in the process). When A2 becomes active, A1's virtual time is already
				246	\fIfar\fR bigger than A2's one. Considering the type of decision made by LS
				247	criterion, A1 would become idle for a lot of time. We can workaround this
				248	situation by adjusting virtual time of the class becoming active \- we do that
				249	by getting such time "up to date". HFSC uses a mean of the smallest and the
				250	biggest virtual time of currently active children fit for sending. As it's not
				251	real time anymore (excluding trivial case of situation where all classes become
				252	active at the same time, and never become idle), it's called virtual time.
				253
				254	Such approach has its price though. The problem is analogous to what was
				255	presented in previous section and is caused by non\-linearity of service
				256	curves:
				257	.IP 1) 4
				258	either it's impossible to guarantee service curves and satisfy fairness
				259	during certain time periods:
				260
				261	.RS 4
				262	Recall the example from RT section, slightly modified (with 3mbit slopes
				263	instead of 2mbit ones):
				264
				265	.IP \(bu 4
				266	1st class \- 3mbit for 100ms, then 7mbit (convex \- 1st slope < 2nd slope)
				267	.IP \(bu
				268	2nd class \- 7mbit for 100ms, then 3mbit (concave \- 1st slope > 2nd slope)
				269
				270	.PP
				271	They sum up nicely to 10mbit \- interface's capacity. But if we wanted to only
				272	use LS for guarantees and fairness \- it simply won't work. In LS context,
				273	only V() is used for making decision which class to schedule. If the 2nd class
				274	becomes active when the 1st one is in its second slope, the fairness will be
				275	preserved \- ratio will be 1:1 (7mbit:7mbit), but LS itself is of course
				276	unable to guarantee the absolute values themselves \- as it would have to go
				277	beyond of what the interface is capable of.
				278	.RE
				279
				280	.IP 2) 4
				281	and/or it's impossible to guarantee service curves of all classes at the same
				282	time [fairly or not]:
				283
				284	.RS 4
				285
				286	This is similar to the above case, but a bit more subtle. We will consider two
				287	subtrees, arbitrated by their common (root here) parent:
				288
				289	.nf
				290	R (root) -\ 10mbit
				291
				292	A \- 7mbit, then 3mbit
				293	A1 \- 5mbit, then 2mbit
				294	A2 \- 2mbit, then 1mbit
				295
				296	B \- 3mbit, then 7mbit
				297	.fi
				298
				299	R arbitrates between left subtree (A) and right (B). Assume that A2 and B are
				300	constantly backlogged, and at some later point A1 becomes backlogged (when all
				301	other classes are in their 2nd linear part).
				302
				303	What happens now ? B (choice made by R) will \fIalways\fR get 7 mbit as R is
				304	only (obviously) concerned with the ratio between its direct children. Thus A
				305	subtree gets 3mbit, but its children would want (at the point when A1 became
				306	backlogged) 5mbit + 1mbit. That's of course impossible, as they can only get
				307	3mbit due to interface limitation.
				308
				309	In the left subtree \- we have the same situation as previously (fair split
				310	between A1 and A2, but violated guarantees), but in the whole tree \- there's
				311	no fairness (B got 7mbit, but A1 and A2 have to fit together in 3mbit) and
				312	there's no guarantees for all classes (only B got what it wanted). Even if we
				313	violated fairness in the A subtree and set A2's service curve to 0, A1 would
				314	still not get the required bandwidth.
				315	.RE
				316	.
				317	.SH "UPPERLIMIT CRITERION"
				318	.
				319	UL criterion is an extensions to LS one, that permits sending packets only
				320	if current real time is bigger than fit\-time ('ft'). So the modified LS
				321	criterion becomes: choose the smallest virtual time from all active children,
				322	such that fit\-time < current real time also holds. Fit\-time is calculated
				323	from F(), which is based on UL service curve. As you can see, it's role is
				324	kinda similar to E() used in RT criterion. Also, for obvious reasons \- you
				325	can't specify UL service curve without LS one.
				326
				327	Main purpose of UL service curve is to limit HFSC to bandwidth available on the
				328	upstream router (think adsl home modem/router, and linux server as
				329	nat/firewall/etc. with 100mbit+ connection to mentioned modem/router).
				330	Typically, it's used to create a single class directly under root, setting
				331	linear UL service curve to available bandwidth \- and then creating your class
				332	structure from that class downwards. Of course, you're free to add UL service
				333	(linear or not) curve to any class with LS criterion.
				334
				335	Important part about UL service curve is, that whenever at some point in time
				336	a class doesn't qualify for linksharing due to its fit\-time, the next time it
				337	does qualify, it will update its virtual time to the smallest virtual time of
				338	all active children fit for linksharing. This way, one of the main things LS
				339	criterion tries to achieve \- equality of all virtual times across whole
				340	hierarchy \- is preserved (in perfectly fluid system with only linear curves,
				341	all virtual times would be equal).
				342
				343	Without that, 'vt' would lag behind other virtual times, and could cause
				344	problems. Consider interface with capacity 10mbit, and following leaf classes
				345	(just in case you're skipping this text quickly \- this example shows behavior
				346	that \f(BIdoesn't happen\fR):
				347
				348	.nf
				349	A \- ls 5.0mbit
				350	B \- ls 2.5mbit
				351	C \- ls 2.5mbit, ul 2.5mbit
				352	.fi
				353
				354	If B was idle, while A and C were constantly backlogged, they would normally
				355	(as far as LS criterion is concerned) divide bandwidth in 2:1 ratio. But due
				356	to UL service curve in place, C would get at most 2.5mbit, and A would get the
				357	remaining 7.5mbit. The longer the backlogged period, the more virtual times of
				358	A and C would drift apart. If B became backlogged at some later point in time,
				359	its virtual time would be set to (A's\~vt\~+\~C's\~vt)/2, thus blocking A from
				360	sending any traffic, until B's virtual time catches up with A.
				361	.
				362	.SH "SEPARATE LS / RT SCs"
				363	.
				364	Another difference from original HFSC paper, is that RT and LS SCs can be
				365	specified separately. Moreover \- leaf classes are allowed to have only either
				366	RT SC or LS SC. For interior classes, only LS SCs make sense \- Any RT SC will
				367	be ignored.
				368	.
				369	.SH "CORNER CASES"
				370	.
				371	Separate service curves for LS and RT criteria can lead to certain traps,
				372	that come from "fighting" between ideal linksharing and enforced realtime
				373	guarantees. Those situations didn't exist in original HFSC paper, where
				374	specifying separate LS / RT service curves was not discussed.
				375
				376	Consider interface with capacity 10mbit, with following leaf classes:
				377
				378	.nf
				379	A \- ls 5.0mbit, rt 8mbit
				380	B \- ls 2.5mbit
				381	C \- ls 2.5mbit
				382	.fi
				383
				384	Imagine A and C are constantly backlogged. As B is idle, A and C would divide
				385	bandwidth in 2:1 ratio, considering LS service curve (so in theory \- 6.66 and
				386	3.33). Alas RT criterion takes priority, so A will get 8mbit and LS will be
				387	able to compensate class C for only 2 mbit \- this will cause discrepancy
				388	between virtual times of A and C.
				389
				390	Assume this situation lasts for a lot of time with no idle periods, and
				391	suddenly B becomes active. B's virtual time will be updated to
				392	(A's\~vt\~+\~C's\~vt)/2, effectively landing in the middle between A's and C's
				393	virtual time. The effect \- B, having no RT guarantees, will be punished and
				394	will not be allowed to transfer until C's virtual time catches up.
				395
				396	If the interface had higher capacity \- for example 100mbit, this example
				397	would behave perfectly fine though.
				398
				399	Let's look a bit closer at the above example \- it "cleverly" invalidates one
				400	of the basic things LS criterion tries to achieve \- equality of all virtual
				401	times across class hierarchy. Leaf classes without RT service curves are
				402	literally left to their own fate (governed by messed up virtual times).
				403
				404	Also - it doesn't make much sense. Class A will always be guaranteed up to
				405	8mbit, and this is more than any absolute bandwidth that could happen from its
				406	LS criterion (excluding trivial case of only A being active). If the bandwidth
				407	taken by A is smaller than absolute value from LS criterion, the unused part
				408	will be automatically assigned to other active classes (as A has idling periods
				409	in such case). The only "advantage" is, that even in case of low bandwidth on
				410	average, bursts would be handled at the speed defined by RT criterion. Still,
				411	if extra speed is needed (e.g. due to latency), non linear service curves
				412	should be used in such case.
				413
				414	In the other words - LS criterion is meaningless in the above example.
				415
				416	You can quickly "workaround" it by making sure each leaf class has RT service
				417	curve assigned (thus guaranteeing all of them will get some bandwidth), but it
				418	doesn't make it any more valid.
				419
				420	Keep in mind - if you use nonlinear curves and irregularities explained above
				421	happen \fIonly\fR in the first segment, then there's little wrong with
				422	"overusing" RT curve a bit:
				423
				424	.nf
				425	A \- ls 5.0mbit, rt 9mbit/30ms, then 1mbit
				426	B \- ls 2.5mbit
				427	C \- ls 2.5mbit
				428	.fi
				429
				430	Here, the vt of A will "spike" in the initial period, but then A will never get more
				431	than 1mbit, until B & C catch up. Then everything will be back to normal.
				432	.
				433	.SH "LINUX AND TIMER RESOLUTION"
				434	.
				435	In certain situations, the scheduler can throttle itself and setup so
				436	called watchdog to wakeup dequeue function at some time later. In case of HFSC
				437	it happens when for example no packet is eligible for scheduling, and UL
				438	service curve is used to limit the speed at which LS criterion is allowed to
				439	dequeue packets. It's called throttling, and accuracy of it is dependent on
				440	how the kernel is compiled.
				441
				442	There're 3 important options in modern kernels, as far as timers' resolution
				443	goes: \&'tickless system', \&'high resolution timer support' and \&'timer
				444	frequency'.
				445
				446	If you have \&'tickless system' enabled, then the timer interrupt will trigger
				447	as slowly as possible, but each time a scheduler throttles itself (or any
				448	other part of the kernel needs better accuracy), the rate will be increased as
				449	needed / possible. The ceiling is either \&'timer frequency' if \&'high
				450	resolution timer support' is not available or not compiled in, or it's
				451	hardware dependent and can go \fIfar\fR beyond the highest \&'timer frequency'
				452	setting available.
				453
				454	If \&'tickless system' is not enabled, the timer will trigger at a fixed rate
				455	specified by \&'timer frequency' \- regardless if high resolution timers are
				456	or aren't available.
				457
				458	This is important to keep those settings in mind, as in scenario like: no
				459	tickless, no HR timers, frequency set to 100hz \- throttling accuracy would be
				460	at 10ms. It doesn't automatically mean you would be limited to ~0.8mbit/s
				461	(assuming packets at ~1KB) \- as long as your queues are prepared to cover for
				462	timer inaccuracy. Of course, in case of e.g. locally generated udp traffic \-
				463	appropriate socket size is needed as well. Short example to make it more
				464	understandable (assume hardcore anti\-schedule settings \- HZ=100, no HR
				465	timers, no tickless):
				466
				467	.nf
				468	tc qdisc add dev eth0 root handle 1:0 hfsc default 1
				469	tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 10mbit
				470	.fi
				471
				472	Assuming packet of ~1KB size and HZ=100, that averages to ~0.8mbit \- anything
				473	beyond it (e.g. the above example with specified rate over 10x bigger) will
				474	require appropriate queuing and cause bursts every ~10 ms. As you can
				475	imagine, any HFSC's RT guarantees will be seriously invalidated by that.
				476	Aforementioned example is mainly important if you deal with old hardware \- as
				477	it's particularly popular for home server chores. Even then, you can easily
				478	set HZ=1000 and have very accurate scheduling for typical adsl speeds.
				479
				480	Anything modern (apic or even hpet msi based timers + \&'tickless system')
				481	will provide enough accuracy for superb 1gbit scheduling. For example, on one
				482	of basically cheap dual core AMD boards I have with following settings:
				483
				484	.nf
				485	tc qdisc add dev eth0 parent root handle 1:0 hfsc default 1
				486	tc class add dev eth0 paretn 1:0 classid 1:1 hfsc rt m2 300mbit
				487	.fi
				488
				489	And simple:
				490
				491	.nf
				492	nc \-u dst.host.com 54321 </dev/zero
				493	nc \-l \-p 54321 >/dev/null
				494	.fi
				495
				496	\&...will yield following effects over period of ~10 seconds (taken from
				497	/proc/interrupts):
				498
				499	.nf
				500	319: 42124229 0 HPET_MSI\-edge hpet2 (before)
				501	319: 42436214 0 HPET_MSI\-edge hpet2 (after 10s.)
				502	.fi
				503
				504	That's roughly 31000/s. Now compare it with HZ=1000 setting. The obvious
				505	drawback of it is that cpu load can be rather extensive with servicing that
				506	many timer interrupts. Example with 300mbit RT service curve on 1gbit link is
				507	particularly ugly, as it requires a lot of throttling with minuscule delays.
				508
				509	Also note that it's just an example showing capability of current hardware.
				510	The above example (essentially 300mbit TBF emulator) is pointless on internal
				511	interface to begin with \- you will pretty much always want regular LS service
				512	curve there, and in such scenario HFSC simply doesn't throttle at all.
				513
				514	300mbit RT service curve (selected columns from mpstat \-P ALL 1):
				515
				516	.nf
				517	10:56:43 PM CPU %sys %irq %soft %idle
				518	10:56:44 PM all 20.10 6.53 34.67 37.19
				519	10:56:44 PM 0 35.00 0.00 63.00 0.00
				520	10:56:44 PM 1 4.95 12.87 6.93 73.27
				521	.fi
				522
				523	So, in rare case you need those speeds with only RT service curve, or with UL
				524	service curve \- remember about drawbacks.
				525	.
				526	.SH "CAVEAT: RANDOM ONLINE EXAMPLES"
				527	.
				528	For reasons unknown (though well guessed), many examples you can google love to
				529	overuse UL criterion and stuff it in every node possible. This makes no sense
				530	and works against what HFSC tries to do (and does pretty damn well). Use UL
				531	where it makes sense - on the uppermost node to match upstream router's uplink
				532	capacity. Or - in special cases, such as testing (limit certain subtree to some
				533	speed) or customers that must never get more than certain speed. In the last
				534	case you can usually achieve the same by just using RT criterion without LS+UL
				535	on leaf nodes.
				536
				537	As for router case - remember it's good to differentiate between "traffic to
				538	router" (remote console, web config, etc.) and "outgoing traffic", so for
				539	example:
				540
				541	.nf
				542	tc qdisc add dev eth0 root handle 1:0 hfsc default 0x8002
				543	tc class add dev eth0 parent 1:0 classid 1:999 hfsc rt m2 50mbit
				544	tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 2mbit ul m2 2mbit
				545	.fi
				546
				547	\&... so "internet" tree under 1:1 and "router itself" as 1:999
				548	.
				549	.SH "LAYER2 ADAPTATION"
				550	.
				551	Please refer to \fBtc\-stab\fR(8)
				552	.
				553	.SH "SEE ALSO"
				554	.
				555	\fBtc\fR(8), \fBtc\-hfsc\fR(8), \fBtc\-stab\fR(8)
				556
				557	Please direct bugreports and patches to: <net...@vger.kernel.org>
				558	.
				559	.SH "AUTHOR"
				560	.
				561	Manpage created by Michal Soltys (sol...@ziu.info)