博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
iproute2+tc notes
阅读量:6153 次
发布时间:2019-06-21

本文共 35352 字,大约阅读时间需要 117 分钟。

iproute2+tc notes

The iproute2+tc package allows access to the variety of neat new networking features in the 2.2 kernels. Policy routing, NAT, QoS, advanced tunnels, RSVP and Differentiated services, are just a few of the buzzword capabilities unleashed by the ip and tc programs.


Contents


Introduction

This is the second version of this document; the first began as a simple command reference for my own use and evolved into a slightly more complex document, including some usage notes and a bunch of links to documents. That version became dated rather rapidly, and so I've redone it from fresh sources. Hopefully the pace of development has slowed somewhat (at least on the user interface side of things), and this version will remain useful a bit longer.

Suggestions, additions, sources, comments, etc to . Specifically more detailed explanations of the commands listed in the  section, for inclusion, would be extremely welcome.

All of the information pulled from linux/ files comes from linux-2.2.14-pre9. The iproute2/ files come from the iproute2-2.2.4-now-ss991023.tar.gz distribution.


Distribution

From the Documentation/Configure.help file of the linux distribution; "" item:

To administer these schedulers, you'll need the user-level utilities  from the package iproute2+tc at

That directory is mirrored at:

  •  (STM1 to USA)
  • It's probably worth picking up the latest version; if your distribution includes these tools at all they're likely to be a revision or two behind.

    Building

    Unpack the distribution tarball. Edit the Makefile; you need to tell it where to find the kernel include files with the KERNEL_INCLUDES variable. You might also need to tell it about libraries or something; I didn't have to. A simple "make all" compiles the programs ip/ipip/rtmon, and tc/tc. I didn't get but a couple of warning messages during the compile. Copy those programs to /sbin, and you're ready to play.

    Keep the unpacked distribution, it's got several examples and utility scripts which may well be useful, especially if you're going to be using the tc program to do QoS stuff. In that case, you may also need to edit a kernel header file. In the tc/ directory of the distribution there's a file named "README.last", that explains the possible values and meanings of PSCHED_CLOCK_SOURCE in the file linux/include/net/pkt_sched.h. The default "PSCHED_JIFFIES" there is probably what you want anyway.

    Kernel Configuration

    Most of the capabilities of the kernel accessed by the ip and tc programs probably aren't included in your distribution's stock kernel configuration. If you can't get a feature to work, make sure that all of the required kernel configuration values are set properly. I've tried to note what kernel configuration options are required or related in the  section. I'm often guessing, so don't be suprised if I've missed something (please do tell me about it, though).


    Documentation

    If you can't find the answers to your questions in this document or in the documentation listed below, you might try searching through the  mailing list and  mailing list archives and/or asking the habitants of those lists.

    from the Distribution

    In the iproute2/ directory you'll find the  file, which contains build instructions; the  file, and the  file, which discusses the tc Traffic Control tools. The  file contains some points relevant to DECnet. There's a  file in the iproute2/tc/ directory that discusses the clock source for the QoS code; it's a bit dated but it explains what the options are.

    In the iproute2/doc/ directory are several documents; the  (, , and );  (, , and ); and  (, , and ).

    from the Kernel

    The kernel's Documentation/networking/ has a couple of files; the  file, and the  file. These are slim and outdated, but they introduce a couple of important concepts. I've got a  file, mostly networking options, that should be a little easier to read than via make config.

    from everywhere else

    Timur A. Bolokhov has started an  (, and ), which ain't bad at all. The date on those files from the original site () was Sep 29 1998; they might be a little dated, but not so much so as the kernel docs.

     

    Saravanan Radhakrishan has put together a fine collection of links and documentation at the , dealing with the implementation details of the QoS system and how to attack it from C via netlink. I haven't read it all yet but it should be helpful. Specifically see the  () and the ().

    The  file mentions "Terminology and advices about setting CBQ parameters may be found in Sally Floyd papers." Some of those can be found at . She 's also got pages listing resources for , and . Programmers looking to interface with the kernel side of the new networking capabilities may find some joy in Andi Kleen's .

    Juergen Jaeschke pointed me to the  Internet Draft, which contains some good discussion of the  traffic control tools.

    Werner Almesberger has posted the slides he used for his presentations at the Singapore Linux conference '99 and the Linux Expo '99; these are PostScript files. I've got copies of both,  slides from Linux Expo, and the  slides from the Singapore conference. He also has an excellent  paper on his site (which I've copied).

    I've collected some  that discuss the usage of these tools. One of the mail list messages responding to a request for information on the traffic control toys said "see Cisco Docs"; the  and most specifically the  are likely to be helpful.

    Open Resource has an , which includes a some discussion of the new networking.


    Command Syntax


    ip command

    From file iproute2/ip/ip.c

    Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }where  OBJECT := { link | addr | route | rule | neigh | tunnel |                   maddr | mroute | monitor }       OPTIONS := { -V[ersion] | -s[tatistics] | -r[esolve] |                    -f[amily] { inet | inet6 | dnet | link } | -o[neline] }

    See the .

    If it's called as "ipaddr", "iproute", "iprule", "ipneigh", "iplink", "iptunnel", or "ipmonitor" it runs the appropriate subcommand. So you can create a link to "iproute", and use:

    # iproute (whatever)
    instead of:
    # ip route (whatever)

    ip link

    From file iproute2/ip/iplink.c

    Usage: ip link set DEVICE { up | down | arp { on | off } |	                     dynamic { on | off } |	                     multicast { on | off } | txqueuelen PACKETS |	                     name NEWNAME |	                     address LLADDR | broadcast LLADDR |	                     mtu MTU }       ip link show [ DEVICE ]

    See the .


    ip address

    From file iproute2/ip/ipaddress.c

    Usage: ip addr {add|del} IFADDR dev STRING       ip addr {show|flush} [ dev STRING ] [ scope SCOPE-ID ]                            [ to PREFIX ] [ FLAG-LIST ] [ label PATTERN ]IFADDR := PREFIX | ADDR peer PREFIX          [ broadcast ADDR ] [ anycast ADDR ]          [ label STRING ] [ scope SCOPE-ID ]SCOPE-ID := [ host | link | global | NUMBER ]FLAG-LIST := [ FLAG-LIST ] FLAGFLAG  := [ permanent | dynamic | secondary | primary |           tentative | deprecated ]

    See the .


    ip neighbor

    From file iproute2/ip/ipneigh.c

    Usage: ip neigh { add | del | change | replace } { ADDR [ lladdr LLADDR ]          [ nud { permanent | noarp | stale | reachable } ]          | proxy ADDR } [ dev DEV ]       ip neigh {show|flush} [ to PREFIX ] [ dev DEV ] [ nud STATE ]

    See the .


    ip rule

    Kernel Config: , , , , 

    From file iproute2/ip/iprule.c

    Usage: ip rule [ list | add | del ] SELECTOR ACTIONSELECTOR := [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK ]            [ dev STRING ] [ pref NUMBER ]ACTION := [ table TABLE_ID ] [ nat ADDRESS ]          [ prohibit | reject | unreachable ]          [ realms [SRCREALM/]DSTREALM ]TABLE_ID := [ local | main | default | NUMBER ]

    See the .


    ip route

    Kernel Config: , , , , 

    From file iproute2/ip/iproute.c

    Usage: ip route { list | flush } SELECTOR       ip route get ADDRESS [ from ADDRESS iif STRING ]                            [ oif STRING ]  [ tos TOS ]       ip route { add | del | replace | change | append | replace | monitor } ROUTESELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact PREFIX ]            [ table TABLE_ID ] [ proto RTPROTO ]            [ type TYPE ] [ scope SCOPE ]ROUTE := NODE_SPEC [ INFO_SPEC ]NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]             [ table TABLE_ID ] [ proto RTPROTO ]             [ scope SCOPE ] [ metric METRIC ]INFO_SPEC := NH OPTIONS FLAGS [ nexthop NH ]...NH := [ via ADDRESS ] [ dev STRING ] [ weight NUMBER ] NHFLAGSOPTIONS := FLAGS [ mtu NUMBER ] [ advmss NUMBER ]           [ rtt NUMBER ] [ rttvar NUMBER ]           [ window NUMBER] [ cwnd NUMBER ] [ ssthresh REALM ]           [ realms REALM ]TYPE := [ unicast | local | broadcast | multicast | throw |          unreachable | prohibit | blackhole | nat ]TABLE_ID := [ local | main | default | all | NUMBER ]SCOPE := [ host | link | global | NUMBER ]FLAGS := [ equalize ]NHFLAGS := [ onlink | pervasive ]RTPROTO := [ kernel | boot | static | NUMBER ]

    See the .


    ip tunnel

    Kernel Config: , , 

    From file iproute2/ip/iptunnel.c

    Usage: ip tunnel { add | change | del | show } [ NAME ]          [ mode { ipip | gre | sit } ] [ remote ADDR ] [ local ADDR ]          [ [i|o]seq ] [ [i|o]key KEY ] [ [i|o]csum ]          [ ttl TTL ] [ tos TOS ] [ [no]pmtudisc ] [ dev PHYS_DEV ]Where: NAME := STRING       ADDR := { IP_ADDRESS | any }       TOS  := { NUMBER | inherit }       TTL  := { 1..255 | inherit }       KEY  := { DOTTED_QUAD | NUMBER }

    See the , also see  and my (old) .


    ip maddress

    From file iproute2/ip/ipmaddr.c

    Usage: ip maddr [ add | del ] MULTIADDR dev STRING       ip maddr show [ dev STRING ]

    See the .


    ip mroute

    From file iproute2/ip/ipmroute.c

    Usage: ip mroute show [ PREFIX ] [ from PREFIX ] [ iif DEVICE ]

    See the .


    ip monitor

    Kernel Config: 

    From file iproute2/ip/ipmonitor.c

    Usage: ip monitor [ all | LISTofOBJECTS ]

    See the .


    rtmon command

    Kernel Config: 

    From file iproute2/ip/rtmon.c

    Usage: rtmon file FILE [ all | LISTofOBJECTS]LISTofOBJECTS := [ link ] [ address ] [ route ]

    tc command

    Kernel Config: , 

    From file iproute2/tc/tc.c

    Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }where  OBJECT := { qdisc | class | filter }       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }

    Where it's expecting a number for BPS; it understands some suffixes: kbps (*1024), mbps (*1024*1024), kbit (*1024/8), and mbit (*1024*1024/8). If I'm reading the code correctly; "BPS" means Bytes Per Second; if you give a number without a suffix it assumes you want BITS per second (it divides the number you give it by 8). It also understands bps as a suffix.

    Where it's expecting a time value, it seems it understands suffixes of ssec, and secs for seconds, msmsec, and msecs for milliseconds, and ususec, and usecs for microseconds.

    Where it wants a size parameter, it assumes non-suffixed numbers to be specified in bytes. It also understands suffixes of k and kb to mean kilobytes (*1024), m and mb to mean megabytes (*1024*1024), kbit to mean kilobit (*1024/8), and mbit to mean megabits (*1024*1024/8).

    1Mbit == 128Kbps


    qdisc

    Kernel Config: , 

    From file iproute2/tc/tc_qdisc.c

    Usage: tc qdisc [ add | del | replace | change | get ] dev STRING       [ handle QHANDLE ] [ root | ingress | parent CLASSID ]       [ estimator INTERVAL TIME_CONSTANT ]       [ [ QDISC_KIND ] [ help | OPTIONS ] ]       tc qdisc show [ dev STRING ] [ingress]Where:QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }OPTIONS := ... try tc qdisc add 
    help

    From file linux/net/sched/sch_api.c

    Short review.   -------------   This file consists of two interrelated parts:   1. queueing disciplines manager frontend.   2. traffic classes manager frontend.   Generally, queueing discipline ("qdisc") is a black box,   which is able to enqueue packets and to dequeue them (when   device is ready to send something) in order and at times   determined by algorithm hidden in it.   qdisc's are divided to two categories:   - "queues", which have no internal structure visible from outside.   - "schedulers", which split all the packets to "traffic classes",     using "packet classifiers" (look at cls_api.c)   In turn, classes may have child qdiscs (as rule, queues)   attached to them etc. etc. etc.   The goal of the routines in this file is to translate   information supplied by user in the form of handles   to more intelligible for kernel form, to make some sanity   checks and part of work, which is common to all qdiscs   and to provide rtnetlink notifications.   All real intelligent work is done inside qdisc modules.   Every discipline has two major routines: enqueue and dequeue.   ---dequeue   dequeue usually returns a skb to send. It is allowed to return NULL,   but it does not mean that queue is empty, it just means that   discipline does not want to send anything this time.   Queue is really empty if q->q.qlen == 0.   For complicated disciplines with multiple queues q->q is not   real packet queue, but however q->q.qlen must be valid.   ---enqueue   enqueue returns number of enqueued packets i.e. this number is 1,   if packet was enqueued successfully and <1 if something (not   necessary THIS packet) was dropped.   Auxiliary routines:   ---requeue   requeues once dequeued packet. It is used for non-standard or   just buggy devices, which can defer output even if dev->tbusy=0.   ---reset   returns qdisc to initial state: purge all buffers, clear all   timers, counters (except for statistics) etc.   ---init   initializes newly created qdisc.   ---destroy   destroys resources allocated by init and during lifetime of qdisc.   ---change   changes qdisc parameters.

    qdisc [p|b]fifo

    From file iproute2/tc/q_fifo.c

    Usage: ... [p|b]fifo [ limit NUMBER ]

    From file linux/net/sched/sch_fifo.c

    /* 1 band FIFO pseudo-"scheduler" */

    qdisc tbf

    Kernel Config: , , , 

    From file iproute2/tc/

    Usage: ... tbf limit BYTES burst BYTES[/BYTES] rate KBPS [ mtu BYTES[/BYTES] ]               [ peakrate KBPS ] [ latency TIME ]

    From file linux/net/sched/sch_tbf.c

    /*	Simple Token Bucket Filter.	=======================================	SOURCE.	-------	None.	Description.	------------	A data flow obeys TBF with rate R and depth B, if for any	time interval t_i...t_f the number of transmitted bits	does not exceed B + R*(t_f-t_i).	Packetized version of this definition:	The sequence of packets of sizes s_i served at moments t_i	obeys TBF, if for any i<=k:	s_i+....+s_k <= B + R*(t_k - t_i)	Algorithm.	----------		Let N(t_i) be B/R initially and N(t) grow continuously with time as:	N(t+delta) = min{B/R, N(t) + delta}	If the first packet in queue has length S, it may be	transmited only at the time t_* when S/R <= N(t_*),	and in this case N(t) jumps:	N(t_* + 0) = N(t_* - 0) - S/R.	Actually, QoS requires two TBF to be applied to a data stream.	One of them controls steady state burst size, another	one with rate P (peak rate) and depth M (equal to link MTU)	limits bursts at a smaller time scale.	It is easy to see that P>R, and B>M. If P is infinity, this double	TBF is equivalent to a single one.	When TBF works in reshaping mode, latency is estimated as:	lat = max ((L-B)/R, (L-M)/P)	NOTES.	------	If TBF throttles, it starts a watchdog timer, which will wake it up	when it is ready to transmit.	Note that the minimal timer resolution is 1/HZ.	If no new packets arrive during this period,	or if the device is not awaken by EOI for some previous packet,	TBF can stop its activity for 1/HZ.	This means, that with depth B, the maximal rate is	R_crit = B*HZ	F.e. for 10Mbit ethernet and HZ=100 the minimal allowed B is ~10Kbytes.	Note that the peak rate TBF is much more tough: with MTU 1500	P_crit = 150Kbytes/sec. So, if you need greater peak	rates, use alpha with HZ=1000 :-)*/

    qdisc cbq

    Kernel Config: , , , 

    From file iproute2/tc/q_cbq.c

    Usage: ... cbq bandwidth BPS avpkt BYTES [ mpu BYTES ]               [ cell BYTES ] [ ewma LOG ]

    From file linux/net/sched/sch_cbq.c

    /*	Class-Based Queueing (CBQ) algorithm.	=======================================	Sources: [1] Sally Floyd and Van Jacobson, "Link-sharing and Resource	         Management Models for Packet Networks",		 IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995	         [2] Sally Floyd, "Notes on CBQ and Guaranted Service", 1995	         [3] Sally Floyd, "Notes on Class-Based Queueing: Setting		 Parameters", 1996		 [4] Sally Floyd and Michael Speer, "Experimental Results		 for Class-Based Queueing", 1998, not published.	-----------------------------------------------------------------------	Algorithm skeleton was taken from NS simulator cbq.cc.	If someone wants to check this code against the LBL version,	he should take into account that ONLY the skeleton was borrowed,	the implementation is different. Particularly:	--- The WRR algorithm is different. Our version looks more        reasonable (I hope) and works when quanta are allowed to be        less than MTU, which is always the case when real time classes        have small rates. Note, that the statement of [3] is        incomplete, delay may actually be estimated even if class        per-round allotment is less than MTU. Namely, if per-round        allotment is W*r_i, and r_1+...+r_k = r < 1	delay_i <= ([MTU/(W*r_i)]*W*r + W*r + k*MTU)/B	In the worst case we have IntServ estimate with D = W*r+k*MTU	and C = MTU*r. The proof (if correct at all) is trivial.	--- It seems that cbq-2.0 is not very accurate. At least, I cannot	interpret some places, which look like wrong translations	from NS. Anyone is advised to find these differences	and explain to me, why I am wrong 8).	--- Linux has no EOI event, so that we cannot estimate true class	idle time. Workaround is to consider the next dequeue event	as sign that previous packet is finished. This is wrong because of	internal device queueing, but on a permanently loaded link it is true.	Moreover, combined with clock integrator, this scheme looks	very close to an ideal solution.  */

    tristate 'CBQ packet scheduler' CONFIG_NET_SCH_CBQ


    qdisc red

    Kernel Config: , , 

    From file iproute2/tc/q_red.c

    Usage: ... red limit BYTES min BYTES max BYTES avpkt BYTES burst PACKETS               probability PROBABILITY bandwidth KBPS

    From file linux/net/sched/sch_red.c

    /*	Random Early Detection (RED) algorithm.	=======================================	Source: Sally Floyd and Van Jacobson, "Random Early Detection Gateways	for Congestion Avoidance", 1993, IEEE/ACM Transactions on Networking.	This file codes a "divisionless" version of RED algorithm	as written down in Fig.17 of the paper.Short description.------------------	When a new packet arrives we calculate the average queue length:	avg = (1-W)*avg + W*current_queue_len,	W is the filter time constant (choosen as 2^(-Wlog)), it controls	the inertia of the algorithm. To allow larger bursts, W should be	decreased.	if (avg > th_max) -> packet marked (dropped).	if (avg < th_min) -> packet passes.	if (th_min < avg < th_max) we calculate probability:	Pb = max_P * (avg - th_min)/(th_max-th_min)	and mark (drop) packet with this probability.	Pb changes from 0 (at avg==th_min) to max_P (avg==th_max).	max_P should be small (not 1), usually 0.01..0.02 is good value.	max_P is chosen as a number, so that max_P/(th_max-th_min)	is a negative power of two in order arithmetics to contain	only shifts.	Parameters, settable by user:	-----------------------------	limit		- bytes (must be > qth_max + burst)	Hard limit on queue length, should be chosen >qth_max	to allow packet bursts. This parameter does not	affect the algorithms behaviour and can be chosen	arbitrarily high (well, less than ram size)	Really, this limit will never be reached	if RED works correctly.	qth_min		- bytes (should be < qth_max/2)	qth_max		- bytes (should be at least 2*qth_min and less limit)	Wlog	       	- bits (<32) log(1/W).	Plog	       	- bits (<32)	Plog is related to max_P by formula:	max_P = (qth_max-qth_min)/2^Plog;	F.e. if qth_max=128K and qth_min=32K, then Plog=22	corresponds to max_P=0.02	Scell_log	Stab	Lookup table for log((1-W)^(t/t_ave).NOTES:Upper bound on W.-----------------	If you want to allow bursts of L packets of size S,	you should choose W:	L + 1 - th_min/S < (1-(1-W)^L)/W	th_min/S = 32         th_min/S = 4			                       	log(W)	L	-1	33	-2	35	-3	39	-4	46	-5	57	-6	75	-7	101	-8	135	-9	190	etc. */

    qdisc sfq

    From file iproute2/tc/q_sfq.c

    Usage: ... sfq [ perturb SECS ] [ quantum BYTES ]

    Kernel Config: , , 

    From file linux/net/sched/

    /*	Stochastic Fairness Queuing algorithm.	=======================================	Source:	Paul E. McKenney "Stochastic Fairness Queuing",	IEEE INFOCOMM'90 Proceedings, San Francisco, 1990.	Paul E. McKenney "Stochastic Fairness Queuing",	"Interworking: Research and Experience", v.2, 1991, p.113-131.	See also:	M. Shreedhar and George Varghese "Efficient Fair	Queuing using Deficit Round Robin", Proc. SIGCOMM 95.	This is not the thing that is usually called (W)FQ nowadays. 	It does not use any timestamp mechanism, but instead	processes queues in round-robin order.	ADVANTAGE:	- It is very cheap. Both CPU and memory requirements are minimal.	DRAWBACKS:	- "Stochastic" -> It is not 100% fair. 	When hash collisions occur, several flows are considered as one.	- "Round-robin" -> It introduces larger delays than virtual clock	based schemes, and should not be used for isolating interactive	traffic	from non-interactive. It means, that this scheduler	should be used as leaf of CBQ or P3, which put interactive traffic	to higher priority band.	We still need true WFQ for top level CSZ, but using WFQ	for the best effort traffic is absolutely pointless:	SFQ is superior for this purpose.	IMPLEMENTATION:	This implementation limits maximal queue length to 128;	maximal mtu to 2^15-1; number of hash buckets to 1024.	The only goal of this restrictions was that all data	fit into one 4K page :-). Struct sfq_sched_data is	organized in anti-cache manner: all the data for a bucket	are scattered over different locations. This is not good,	but it allowed me to put it into 4K.	It is easy to increase these values, but not in flight.  */

    qdisc prio

    Kernel Config: , , 

    From file iproute2/tc/q_prio.c

    Usage: ... prio bands NUMBER priomap P1 P2...

    From file linux/net/sched/sch_prio.c

    * net/sched/sch_prio.c	Simple 3-band priority "scheduler".

    qdisc csz

    Kernel Config: , , 

    From file iproute2/tc/q_csz.c

    NOT IMPLEMENTED

    From file linux/net/sched/sch_csz.c

    /*	Clark-Shenker-Zhang algorithm.	=======================================	SOURCE.	David D. Clark, Scott Shenker and Lixia Zhang	"Supporting Real-Time Applications in an Integrated Services Packet	Network: Architecture and Mechanism".	CBQ presents a flexible universal algorithm for packet scheduling,	but it has pretty poor delay characteristics.	Round-robin scheduling and link-sharing goals	apparently contradict minimization of network delay and jitter.	Moreover, correct handling of predictive flows seems to be	impossible in CBQ.	CSZ presents a more precise but less flexible and less efficient	approach. As I understand it, the main idea is to create	WFQ flows for each guaranteed service and to allocate	the rest of bandwith to dummy flow-0. Flow-0 comprises	the predictive services and the best effort traffic;	it is handled by a priority scheduler with the highest	priority band allocated	for predictive services, and the rest ---	to the best effort packets.	Note that in CSZ flows are NOT limited to their bandwidth.  It	is supposed that the flow passed admission control at the edge	of the QoS network and it doesn't need further shaping. Any	attempt to improve the flow or to shape it to a token bucket	at intermediate hops will introduce undesired delays and raise	jitter.	At the moment CSZ is the only scheduler that provides	true guaranteed service. Another schemes (including CBQ)	do not provide guaranteed delay and randomize jitter.	There is a proof (Sally Floyd), that delay	can be estimated by a IntServ compliant formula.	This result is true formally, but it is wrong in principle.	It takes into account only round-robin delays,	ignoring delays introduced by link sharing i.e. overlimiting.	Note that temporary overlimits are inevitable because	real links are not ideal, and the real algorithm must take this	into account.        ALGORITHM.	--- Notations.	$B$ is link bandwidth (bits/sec).	$I$ is set of all flows, including flow $0$.	Every flow $a \in I$ has associated bandwidth slice $r_a < 1$ and	$\sum_{a \in I} r_a = 1$.	--- Flow model.	Let $m_a$ is the number of backlogged bits in flow $a$.	The flow is {\em active}, if $m_a > 0$.	This number is a discontinuous function of time;	when a packet $i$ arrives:	\[	m_a(t_i+0) - m_a(t_i-0) = L^i,	\]	where $L^i$ is the length of the arrived packet.	The flow queue is drained continuously until $m_a == 0$:	\[	{d m_a \over dt} = - { B r_a \over \sum_{b \in A} r_b}.	\]	I.e. flow rates are their allocated rates proportionally	scaled to take all available link bandwidth. Apparently,	it is not the only possible policy. F.e. CBQ classes	without borrowing would be modelled by:	\[	{d m_a \over dt} = - B r_a .	\]	More complicated hierarchical bandwidth allocation	policies are possible, but unfortunately, the basic	flow equations have a simple solution only for proportional	scaling.	--- Departure times.	We calculate the time until the last bit of packet is sent:	\[	E_a^i(t) = { m_a(t_i) - \delta_a(t) \over r_a },	\]	where $\delta_a(t)$ is number of bits drained since $t_i$.	We have to evaluate $E_a^i$ for all queued packets,	then find the packet with minimal $E_a^i$ and send it.	This sounds good, but direct implementation of the algorithm	is absolutely infeasible. Luckily, if flow rates	are scaled proportionally, the equations have a simple solution.		The differential equation for $E_a^i$ is	\[	{d E_a^i (t) \over dt } = - { d \delta_a(t) \over dt} { 1 \over r_a} =	{ B \over \sum_{b \in A} r_b}	\]	with initial condition	\[	E_a^i (t_i) = { m_a(t_i) \over r_a } .	\]	Let's introduce an auxiliary function $R(t)$:	--- Round number.	Consider the following model: we rotate over active flows,	sending $r_a B$ bits from every flow, so that we send	$B \sum_{a \in A} r_a$ bits per round, that takes	$\sum_{a \in A} r_a$ seconds.		Hence, $R(t)$ (round number) is a monotonically increasing	linear function	of time when $A$ is not changed	\[	{ d R(t) \over dt } = { 1 \over \sum_{a \in A} r_a }	\]	and it is continuous when $A$ changes.	The central observation is that the quantity	$F_a^i = R(t) + E_a^i(t)/B$ does not depend on time at all!	$R(t)$ does not depend on flow, so that $F_a^i$ can be	calculated only once on packet arrival, and we need not	recalculate $E$ numbers and resorting queues.	The number $F_a^i$ is called finish number of the packet.	It is just the value of $R(t)$ when the last bit of packet	is sent out.	Maximal finish number on flow is called finish number of flow	and minimal one is "start number of flow".	Apparently, flow is active if and only if $F_a \leq R$.	When a packet of length $L_i$ bit arrives to flow $a$ at time $t_i$,	we calculate $F_a^i$ as:	If flow was inactive ($F_a < R$):	$F_a^i = R(t) + {L_i \over B r_a}$	otherwise	$F_a^i = F_a + {L_i \over B r_a}$	These equations complete the algorithm specification.	It looks pretty hairy, but there is a simple	procedure for solving these equations.	See procedure csz_update(), that is a generalization of	the algorithm from S. Keshav's thesis Chapter 3	"Efficient Implementation of Fair Queeing".	NOTES.	* We implement only the simplest variant of CSZ,	when flow-0 is a explicit 4band priority fifo.	This is bad, but we need a "peek" operation in addition	to "dequeue" to implement complete CSZ.	I do not want to do that, unless it is absolutely	necessary.		* A primitive support for token bucket filtering	presents itself too. It directly contradicts CSZ, but	even though the Internet is on the globe ... :-)	"the edges of the network" really exist.		BUGS.	* Fixed point arithmetic is overcomplicated, suboptimal and even	wrong. Check it later.  *//* This number is arbitrary */

    class

    Kernel Config: , , 

    From file iproute2/tc/tc_class.c

    Usage: tc class [ add | del | change | get ] dev STRING       [ classid CLASSID ] [ root | parent CLASSID ]       [ [ QDISC_KIND ] [ help | OPTIONS ] ]       tc class show [ dev STRING ] [ root | parent CLASSID ]Where:QDISC_KIND := { prio | cbq | etc. }OPTIONS := ... try tc class add 
    help

    class cbq

    From file iproute2/tc/q_cbq.c

    Usage: ... cbq bandwidth BPS rate BPS maxburst PKTS [ avpkt BYTES ]               [ minburst PKTS ] [ bounded ] [ isolated ]               [ allot BYTES ] [ mpu BYTES ] [ weight RATE ]               [ prio NUMBER ] [ cell BYTES ] [ ewma LOG ]               [ estimator INTERVAL TIME_CONSTANT ]               [ split CLASSID ] [ defmap MASK/CHANGE ]

    estimator

    Kernel Config: , , 

    From file iproute2/tc/m_estimator.c

    Usage: ... estimator INTERVAL TIME-CONST  INTERVAL is interval between measurements  TIME-CONST is averaging time constantExample: ... est 1sec 8sec\

    From file linux/net/sched/estimator.c

    This code is NOT intended to be used for statistics collection,   its purpose is to provide a base for statistical multiplexing   for controlled load service.   If you need only statistics, run a user level daemon which   periodically reads byte counters.   Unfortunately, rate estimation is not a very easy task.   F.e. I did not find a simple way to estimate the current peak rate   and even failed to formulate the problem 8)8)   So I preferred not to built an estimator into the scheduler,   but run this task separately.   Ideally, it should be kernel thread(s), but for now it runs   from timers, which puts apparent top bounds on the number of rated   flows, has minimal overhead on small, but is enough   to handle controlled load service, sets of aggregates.   We measure rate over A=(1<

    filter

    Kernel Config: , , 

    From file iproute2/tc/tc_filter.c

    Usage: tc filter [ add | del | change | get ] dev STRING       [ pref PRIO ] [ protocol PROTO ]       [ estimator INTERVAL TIME_CONSTANT ]       [ root | classid CLASSID ] [ handle FILTERID ]       [ [ FILTER_TYPE ] [ help | OPTIONS ] ]       tc filter show [ dev STRING ] [ root | parent CLASSID ]Where:FILTER_TYPE := { rsvp | u32 | fw | route | etc. }FILTERID := ... format depends on classifier, see thereOPTIONS := ... try tc filter add 
    help

    filter rsvp

    From file iproute2/tc/f_rsvp.c

    Usage: ... rsvp ipproto PROTOCOL session DST[/PORT | GPI ]                [ sender SRC[/PORT | GPI ]                [ classid CLASSID ] [ police POLICE_SPEC ]                [ tunnelid ID ] [ tunnel ID skip NUMBER ]Where: GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |                u{8|16|32} NUMBER mask MASK at OFFSET}       POLICE_SPEC := ... look at TBF       FILTERID := X:Y

    From file linux/net/sched/cls_rsvp.h

    /*   Comparing to general packet classification problem,   RSVP needs only sevaral relatively simple rules:   * (dst, protocol) are always specified,     so that we are able to hash them.   * src may be exact, or may be wildcard, so that     we can keep a hash table plus one wildcard entry.   * source port (or flow label) is important only if src is given.   IMPLEMENTATION.   We use a two level hash table: The top level is keyed by   destination address and protocol ID, every bucket contains a list   of "rsvp sessions", identified by destination address, protocol and   DPI(="Destination Port ID"): triple (key, mask, offset).   Every bucket has a smaller hash table keyed by source address   (cf. RSVP flowspec) and one wildcard entry for wildcard reservations.   Every bucket is again a list of "RSVP flows", selected by   source address and SPI(="Source Port ID" here rather than   "security parameter index"): triple (key, mask, offset).   NOTE 1. All the packets with IPv6 extension headers (but AH and ESP)   and all fragmented packets go to the best-effort traffic class.   NOTE 2. Two "port id"'s seems to be redundant, rfc2207 requires   only one "Generalized Port Identifier". So that for classic   ah, esp (and udp,tcp) both *pi should coincide or one of them   should be wildcard.   At first sight, this redundancy is just a waste of CPU   resources. But DPI and SPI add the possibility to assign different   priorities to GPIs. Look also at note 4 about tunnels below.   NOTE 3. One complication is the case of tunneled packets.   We implement it as following: if the first lookup   matches a special session with "tunnelhdr" value not zero,   flowid doesn't contain the true flow ID, but the tunnel ID (1...255).   In this case, we pull tunnelhdr bytes and restart lookup   with tunnel ID added to the list of keys. Simple and stupid 8)8)   It's enough for PIMREG and IPIP.   NOTE 4. Two GPIs make it possible to parse even GRE packets.   F.e. DPI can select ETH_P_IP (and necessary flags to make   tunnelhdr correct) in GRE protocol field and SPI matches   GRE key. Is it not nice? 8)8)   Well, as result, despite its simplicity, we get a pretty   powerful classification engine.  */

    filter u32

    From file iproute2/tc/f_u32.c

    Usage: ... u32 [ match SELECTOR ... ] [ link HTID ] [ classid CLASSID ]               [ police POLICE_SPEC ] [ offset OFFSET_SPEC ]               [ ht HTID ] [ hashkey HASHKEY_SPEC ]               [ sample SAMPLE ]or         u32 divisor DIVISORWhere: SELECTOR := SAMPLE SAMPLE ...       SAMPLE := { ip | ip6 | udp | tcp | icmp | u{32|16|8} } SAMPLE_ARGS       FILTERID := X:Y:Z

    From file linux/net/sched/cls_u32.c

    *	The filters are packed to hash tables of key nodes *	with a set of 32bit key/mask pairs at every node. *	Nodes reference next level hash tables etc. * *	This scheme is the best universal classifier I managed to *	invent; it is not super-fast, but it is not slow (provided you *	program it correctly), and general enough.  And its relative *	speed grows as the number of rules becomes larger. * *	It seems that it represents the best middle point between *	speed and manageability both by human and by machine. * *	It is especially useful for link sharing combined with QoS; *	pure RSVP doesn't need such a general approach and can use *	much simpler (and faster) schemes, sort of cls_rsvp.c.

    Faster than ipchains, according to Werner Almesburger (as reported in , May 22 1999).


    filter fw

    From file iproute2/tc/f_fw.c

    Usage: ... fw [ classid CLASSID ] [ police POLICE_SPEC ]       POLICE_SPEC := ... look at TBF       CLASSID := X:Y

    From file linux/net/sched/cls_fw.c

    * net/sched/cls_fw.c	Classifier mapping ipchains' fwmark to traffic class.

    filter route

    From file iproute2/tc/f_route.c

    Usage: ... route [ from REALM | fromif TAG ] [ to REALM ]                [ flowid CLASSID ] [ police POLICE_SPEC ]       POLICE_SPEC := ... look at TBF       CLASSID := X:Y\n

    From file linux/net/sched/cls_route.c

    1. For now we assume that route tags < 256.      It allows to use direct table lookups, instead of hash tables.   2. For now we assume that "from TAG" and "fromdev DEV" statements      are mutually  exclusive.   3. "to TAG from ANY" has higher priority, than "to ANY from XXX"

    police

    From file iproute2/tc/m_police.c

    Usage: ... police rate BPS burst BYTES[/BYTES] [ mtu BYTES[/BYTES] ]                [ peakrate BPS ] [ avrate BPS ]                [ ACTION ]Where: ACTION := reclassify | drop | continue

     


    Copyright © 1999 Mark Lamb. Redistribution via any media permitted as long as this notice remains intact. 

转载于:https://www.cnblogs.com/huangguanyuan/p/10498933.html

你可能感兴趣的文章
[转]Paul Adams:为社交设计
查看>>
showdialog弹出窗口刷新问题
查看>>
java
查看>>
Vue.js连接后台数据jsp页面  ̄▽ ̄
查看>>
关于程序的单元测试
查看>>
mysql内存优化
查看>>
都市求生日记第一篇
查看>>
Java集合---HashMap源码剖析
查看>>
SQL优化技巧
查看>>
thead 固定,tbody 超出滚动(附带改变滚动条样式)
查看>>
Dijkstra算法
查看>>
css 动画 和 响应式布局和兼容性
查看>>
csrf 跨站请求伪造相关以及django的中间件
查看>>
MySQL数据类型--与MySQL零距离接触2-11MySQL自动编号
查看>>
生日小助手源码运行的步骤
查看>>
Configuration python CGI in XAMPP in win-7
查看>>
bzoj 5006(洛谷 4547) [THUWC2017]Bipartite 随机二分图——期望DP
查看>>
CF 888E Maximum Subsequence——折半搜索
查看>>
欧几里德算法(辗转相除法)
查看>>
面试题1-----SVM和LR的异同
查看>>