DragonFly net will be down for a little while today to renumber
dillon at apollo.backplane.com
Tue Feb 12 09:39:18 PST 2013
: I am moving the DragonFly network to a new VPN (pointing at a new colo)
: to deal with the IP routing issues people are having.
: I will attempt to get the boxes up on both VPNs first before I tear
: down the old one (in a few days) but there may be network outages anyway.
Here's an update. I was forced to scrap the old VPN. The dual-homed
setup was just not stable enough, and I was having problems with loops
in the bridging code when I tried to integrate Avalon into the new
network (Avalon physically sits in the old colo). I was also having
a lot of trouble getting the default route to work properly, since
there are actually three default routes that have to be messed around
with (10/8, OldIPBlock, NewIPBlock).
The network has now been entirely switched over to the new IP block
(199.233.90/24) homed in the new colo. The old IP block (69.163.100/24)
has been disconnected, though the machines in the old colo still use
some of the old IPs.
Basically our current set-up is:
San Jose colo - Avalon
Fremont colo - Kronos
My Home - Crater, Leaf, Monster, etc.
(dual-homed to both Comcast and AT&T U-Verse)
Latency between the colos is around 1-2ms. Latency over Comcast
to Crater, Leaf, etc. from the Fremont colo is roughly 25-40ms.
The two colos and Home are bridged using OpenVPN. However, due to
numerous issues I couldn't dual-home both IP blocks and I couldn't
complete a circle, so the bridging is currently a star pattern centered
on the Fremont colo.
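To illustrate why the star layout sidesteps the problem: a bridged layer-2 network has to be loop-free unless spanning tree reliably blocks one of the links. The following is a minimal Python sketch, not anything taken from the actual setup. The site names come from the list above, but the link lists are only my reading of the description. It checks a set of links for a cycle using union-find.

```python
def has_loop(links):
    """Return True if the undirected link list contains a cycle."""
    parent = {}

    def find(node):
        # Locate the representative of the set containing node.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b are already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# Star centered on the Fremont colo (the current layout).
star = [("Fremont", "SanJose"), ("Fremont", "Home")]
# The "circle": one extra link between the two outer sites (assumed).
circle = star + [("SanJose", "Home")]

print(has_loop(star))    # False
print(has_loop(circle))  # True
```

With the star there is exactly one path between any two sites, so the bridge never sees the same frame arrive by two routes.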
A little later this year we will be buying a blade server to place in
the Fremont colo and all DragonFly infrastructure will be moved onto
* The if_bridge code couldn't handle the circle. There are bugs in
my STP code which caused issues. Basically I was trying to multi-home
two separate internet-routable IP networks on the same bridge in a
circular pattern and it got seriously confused.
* Generally speaking, multi-homed bridged machines can create serious
problems with bridges due to the fact that bridging is MAC-based and
not IP-based. So when one machine is multi-homed and has more
than one default route (going to different targets across the bridge),
and is handling packets for both targets simultaneously, the bridge can
get very seriously confused about how to bridge replies to those packets.
* Most DFly machines use a 10/8 network for their primary IP address
and internet/other IP addresses are aliases. They default to
NAT-out when not bound to their internet IPs. IPFW rules adjust
the default route when the packets are sourced from or targeted at the
machine's internet-routable IP address.
This means that there are actually several default routes depending
on the source IP. The 10/8 network has its own default route which
points to the NAT box, and the routable IP block(s) have their own
default routes which generally point to a host across the bridge.
* Chicken-and-egg issues with IPFW and PF packet filter ordering.
Neither supports reinjection of the packet (that's a feature
that PF sorely needs), so the order is fixed: IPFW comes first, then PF.
Basically the default route can be changed with IPFW prior to a
packet feeding into PF, but for some reason it either cannot be changed
safely within PF or the change occurs in the wrong order relative
to other rules.
* Discovered a bug when combining a NAT rule with a change in the
default route. DragonFly wanted to send an ICMP IP Redirect for
the pre-NAT packet in addition to translating the packet and
transmitting a post-NAT packet. That created a bit of a mess.
(fortunately it can be turned off with a sysctl, but it's still
So, three not-so-wonderful glitches which hopefully we'll be able to work
on this year. We need better handling of multiple default routes,
we need to track down and fix the improper ICMP IP Redirect, and we
need to fix if_bridge's STP handling in multi-homed configurations.
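
For reference, the source-dependent default routing described above can be sketched in a few lines of Python. This only illustrates the selection logic and is not the actual IPFW ruleset. The gateway addresses are made-up placeholders, and only the 10/8 and 199.233.90/24 prefixes come from this message.

```python
import ipaddress

# Source prefix -> default gateway. The gateway addresses are hypothetical.
DEFAULT_ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "10.0.0.1"),           # NAT box
    (ipaddress.ip_network("199.233.90.0/24"), "199.233.90.1"),  # host across the bridge
]


def default_gateway(src):
    """Pick the default gateway based on the packet's source address."""
    addr = ipaddress.ip_address(src)
    for prefix, gateway in DEFAULT_ROUTES:
        if addr in prefix:
            return gateway
    return None  # no policy for this source address


print(default_gateway("10.1.2.3"))       # 10.0.0.1
print(default_gateway("199.233.90.20"))  # 199.233.90.1
```

The point is that the next hop is chosen by where the packet claims to come from, not by its destination, which is why one machine ends up with several "default" routes at once.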
There may be some more down-time this afternoon as I clean up the
<dillon at backplane.com>