Any interest in updating the routing engine?

Joshua Coombs jcoombs at
Tue Oct 7 19:18:22 PDT 2003

> I think it will be useful, for example, have possibility to set
> ip-precedence field by firewall rules.
> Joshua, how do you see Cisco CEF in *bsd?

CEF implements a two-layer routing table format in place of a basic
single-table, lookup-on-each-packet scheme like the one in the
existing *bsd routing engine.  The CEF table is built by route
lookups against the master table whenever a new flow hits the
router.  The route is then 'cached' in the CEF table and stays until
the flow stops, or the route is invalidated by external means, be it
link status or routing updates from an external/internal routing
protocol like OSPF.
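
A minimal sketch of the two-layer idea as described above, not of
Cisco's actual implementation.  The class name, the per-destination
cache key, and the invalidate-by-next-hop call are all assumptions
made for illustration; the master table is just a list searched by
longest prefix.

```python
from ipaddress import ip_address, ip_network


class CachedRouter:
    """Master table plus a flow cache filled on first lookup."""

    def __init__(self, master):
        self.master = master        # list of (network, next_hop) pairs
        self.cache = {}             # destination address -> next_hop
        self.master_lookups = 0     # counts slow-path lookups

    def lookup(self, dst):
        dst = ip_address(dst)
        if dst in self.cache:       # fast path: flow already seen
            return self.cache[dst]
        self.master_lookups += 1    # slow path: search the master table
        matches = [(net, hop) for net, hop in self.master if dst in net]
        if not matches:
            return None
        _, hop = max(matches, key=lambda m: m[0].prefixlen)
        self.cache[dst] = hop       # cache the result for later packets
        return hop

    def invalidate(self, next_hop):
        """Drop cached flows using next_hop (link down, routing update)."""
        self.cache = {d: h for d, h in self.cache.items() if h != next_hop}


router = CachedRouter([
    (ip_network("0.0.0.0/0"), "gw-a"),
    (ip_network("192.0.2.0/24"), "gw-b"),
])
```

Only the first packet of a flow pays for the master table search;
later packets hit the cache until something invalidates the entry.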

Essentially, Ciscos have a concept of more than one route for a
given destination.  I'd like to pull that logic into *bsd.

----- Be warned, ugly paste from notes --------------

The routing table should be expanded to include the following

A route table
* 1 entry per route consisting of a src & netmask, dst & netmask,
  next hop address, hops, metric, TTL, state flag
* Multiple entries for the same src & netmask/dst & netmask are
  allowed (IE you can have multiple 'default routes' defined)
* Interfaces generate route entries for their directly connected
  networks; alias entries on interfaces don't, however.
* When an interface is shut down, all route entries in the route
  cache and route table using it are removed.
* For route table entries with a TTL other than -1 (infinite), a
  table maintenance thread periodically decrements the TTL of entries
  with a dynamic state flag (a state flag of -1 indicates a static
  route, any other number indicates a dynamic route and the number is
  the max TTL for the route), and of any entries that the media has
  gone down on.  When the TTL expires, the route is removed, along
  with any route cache table entries that match.  If the state flag
  is set to permanent, the TTL is incremented each pass if the media
  is up.  This prevents flapping interfaces from thrashing the route
  table, as well as giving established routes a chance to continue
  should the interface stabilize.
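
A rough sketch of the route table entry and one pass of the
maintenance thread, to make the aging rules concrete.  The notes do
not say how an entry knows its interface or how far a recovering TTL
may climb, so the `iface` field and the `ttl_cap` ceiling (the TTL
the entry was created with) are assumptions, as are all the names.
Removing matching route cache entries is left out.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network

INFINITE = -1  # TTL value meaning the entry never expires
STATIC = -1    # state flag value meaning a static route


@dataclass
class Route:
    src: IPv4Network
    dst: IPv4Network
    next_hop: IPv4Address
    hops: int
    metric: int
    ttl: int
    state: int    # STATIC, or the max TTL of a dynamic route
    iface: str    # assumption: each entry remembers its interface
    ttl_cap: int = field(init=False)  # assumption: recovery ceiling

    def __post_init__(self):
        self.ttl_cap = self.ttl


def maintenance_pass(routes, media_up):
    """Run one pass of the maintenance thread and return the survivors."""
    kept = []
    for r in routes:
        if r.ttl != INFINITE:
            up = media_up.get(r.iface, False)
            if r.state != STATIC or not up:
                r.ttl -= 1   # dynamic routes, and anything on dead media, age
            elif r.ttl < r.ttl_cap:
                r.ttl += 1   # static route on live media recovers
            if r.ttl <= 0:
                continue     # expired: drop the route
        kept.append(r)
    return kept
```

With a static route created at TTL 3, a link that drops for one pass
and comes back costs the route one tick and then gives it back, so a
flapping interface only loses the route after staying down for a
while.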

A route cache table
* Each entry contains a src & netmask, dst & netmask, next hop
  address, TTL
* This table is built dynamically by the routing thread as it routes
  packets, similar to a state table.  This keeps traffic between two
  IPs from constantly shifting between interfaces.
* As the table fills, a separate route cache management thread
  decrements the TTL, eliminates entries when the TTL expires, and
  aggregates entries where the src & netmask entries are adjacent and
  share a dst & netmask.  (Adjacency is based on CIDR rules for
  netblocks.)
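
The aggregation step might look something like the sketch below.  It
leans on Python's `ipaddress.collapse_addresses`, which only merges
blocks that form a valid CIDR supernet.  Grouping on next hop as well
as destination, and keeping the largest TTL of the merged entries,
are my assumptions; the notes only say the entries must share a
dst & netmask.

```python
from collections import defaultdict
from ipaddress import collapse_addresses, ip_network


def aggregate(cache):
    """Merge cache entries whose source blocks are CIDR-adjacent.

    cache is a list of (src_net, dst_net, next_hop, ttl) tuples.
    """
    groups = defaultdict(list)
    for src, dst, hop, ttl in cache:
        # assumption: only entries with the same next hop may merge
        groups[(dst, hop)].append((src, ttl))

    merged = []
    for (dst, hop), members in groups.items():
        for net in collapse_addresses(src for src, _ in members):
            # assumption: a merged entry keeps the longest remaining TTL
            ttl = max(t for s, t in members if s.subnet_of(net))
            merged.append((net, dst, hop, ttl))
    return merged
```

Note that 10.0.0.0/32 and 10.0.0.1/32 collapse into 10.0.0.0/31, but
10.0.0.1/32 and 10.0.0.2/32 do not merge even though they are
numerically next to each other, because no single netblock covers
exactly those two addresses.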

Routing decisions are made through the following process

1) Examine the route cache, if the src & netmask/dst & netmask pair
is covered by one of those entries, route via the specified address,
and reset the TTL for the route cache entry.

2) Using the src & netmask/dst & netmask pair, a list of all
applicable routes is built.

3) If there is only one route applicable, jump to step 10

4) Perform a congestion check, eliminate any routes that fail, if
all routes fail, they are considered equal and left as candidates.

5) Compare scopes, the most specific route wins and you move on to
step 10

6) Compare hops, the lowest wins and you move on to step 10

7) Compare metrics, the lowest wins and you move on to step 10

8) Compare TTL, the one with the most life wins and you move on to
step 10

9) The first entry in the candidate route table wins.

10) Send the packet out the applicable route.

11) Add a route cache table entry for the src and destination, using
 for netmasks.
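
Steps 2 through 9 can be sketched as a cascade of tie-breakers.
This is only an illustration: the `Route` tuple, the `congested` set
of interface names, and measuring "most specific" as the destination
prefix length followed by the source prefix length are assumptions.
The route cache (steps 1 and 11) and the actual send (step 10) are
left out.

```python
from ipaddress import IPv4Network, ip_address, ip_network
from typing import NamedTuple


class Route(NamedTuple):
    src: IPv4Network
    dst: IPv4Network
    next_hop: str
    hops: int
    metric: int
    ttl: int      # -1 means infinite
    iface: str


# Steps 5-8, in order; each key is written so that larger is better.
TIE_BREAKERS = (
    lambda r: (r.dst.prefixlen, r.src.prefixlen),      # most specific
    lambda r: -r.hops,                                 # fewest hops
    lambda r: -r.metric,                               # lowest metric
    lambda r: float("inf") if r.ttl == -1 else r.ttl,  # most life left
)


def select_route(routes, src, dst, congested=frozenset()):
    src, dst = ip_address(src), ip_address(dst)
    # Step 2: every route covering this src/dst pair is a candidate.
    cands = [r for r in routes if src in r.src and dst in r.dst]
    if not cands:
        return None
    if len(cands) > 1:
        # Step 4: drop congested routes, unless that drops everything.
        ok = [r for r in cands if r.iface not in congested]
        cands = ok or cands
    for key in TIE_BREAKERS:
        if len(cands) == 1:   # steps 3 and 5-8: a single winner is final
            break
        best = max(key(r) for r in cands)
        cands = [r for r in cands if key(r) == best]
    return cands[0]           # step 9: first remaining candidate wins
```

One difference worth noting: each comparison here narrows the
candidate list and falls through on a tie, which is how I read
"the lowest wins" when more than two routes are in play.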

The congestion check is pretty straightforward.  If an interface has
a bandwidth and a threshold set, the current bandwidth over a 5 min
window is checked against the threshold value.  If it matches or
exceeds, the congestion check fails for that interface.  In a
multihomed environment, say cable modem on one enet adapter, dsl on
another, this will allow for two routes, one out each.  New sessions
will be established based on the rules, pushing all sessions out one
interface (without using bgp/etc, hops, metrics, and TTL will be
defaults unless specifically tweaked) until it fails the congestion
check.  New routes will then go over the second interface until it
too fails the congestion check, at which point normal rules apply.
Because of the route cache, even when an interface fails a
congestion check, routes already established over a given interface
persist, allowing, say, an ftp session to continue over a link that
is suddenly congested, as it never had to do a congestion check.
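
As a sketch of the check itself, assuming the threshold is given in
bits per second and that traffic is recorded as timestamped byte
counts (neither is spelled out in my notes; the class and method
names are made up):

```python
from collections import deque

WINDOW = 300  # seconds, the 5 min window


class InterfaceMeter:
    def __init__(self, threshold_bps=None):
        self.threshold_bps = threshold_bps  # None: no threshold set
        self.samples = deque()              # (timestamp, bytes) pairs

    def record(self, now, nbytes):
        self.samples.append((now, nbytes))

    def rate_bps(self, now):
        """Average bits per second over the last WINDOW seconds."""
        while self.samples and self.samples[0][0] <= now - WINDOW:
            self.samples.popleft()          # sample fell out of the window
        return sum(n for _, n in self.samples) * 8 / WINDOW

    def passes(self, now):
        if self.threshold_bps is None:
            return True                     # nothing configured to check
        # Matching or exceeding the threshold fails the check.
        return self.rate_bps(now) < self.threshold_bps
```

Only new flows would consult `passes()`; flows already in the route
cache keep their interface regardless of the result.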

The hops, metric, and TTL values in the route table can be set via
bgp, rip, etc daemons allowing routing to make more intelligent
decisions.  In an environment where traditional routing exchange
protocols are not an option, it would be feasible to have a userland
daemon watch the route cache table, do a ping or traceroute to the
destination, and add an updated route to the route table with a user
customized TTL, allowing one to get the best routes from two or more
interfaces, without using traditional routing exchange protocols
with each provider.

This system introduces A LOT of memory and cpu bloat for routing
into the system, but I think the gains outweigh the loss, especially
when considering the typical stats for current systems.

----- End Paste ------------------------

I think this can be implemented right now on top of an untouched
*bsd externally, heck I might be able to do this in perl if I can
figure out the kernel hooks.  I also know this is NOT the ideal
solution, but I think it's a working start.

Joshua Coombs
