DragonFly net will be down for a little while today to renumber

Matthew Dillon dillon at apollo.backplane.com
Tue Feb 12 09:39:18 PST 2013


:    I am moving the DragonFly network to a new VPN (pointing at a new colo)
:    to deal with the IP routing issues people are having.
:
:    I will attempt to get the boxes up on both VPNs first before I tear
:    down the old one (in a few days) but there may be network outages anyway.

    Here's an update.  I was forced to scrap the old VPN.  The dual-homed
    setup was just not stable enough, and I was having problems with loops
    in the bridging code when I tried to integrate Avalon into the new
    network (Avalon physically sits in the old colo).  I was also having
    a lot of trouble getting the default route to work properly, since
    there are actually three default routes that have to be messed around
    with (10/8, OldIPBlock, NewIPBlock).

    The network has now been entirely switched over to the new IP block
    (199.233.90/24) homed in the new colo.  The old IP block (69.163.100/24)
    has been disconnected, though the machines in the old colo still use
    some of the old IPs.

    Basically our current set-up is:

	San Jose colo - Avalon
	Fremont colo  - Kronos
	My Home       - Crater, Leaf, Monster, etc.
			(dual-homed to both Comcast and AT&T U-Verse)

	Latency between the colos is around 1-2ms.  Latency over Comcast
	to Crater, Leaf, etc. from the Fremont colo is roughly 25-40ms
	unloaded.

    The two colos and Home are bridged using OpenVPN.  However, due to
    numerous issues I couldn't dual-home both IP blocks and I couldn't
    complete a circle, so the bridging is currently a star pattern centered
    on the Fremont colo.
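
    As a rough illustration of why the star avoids the loop problem, here
    is a minimal sketch that checks a set of bridge links for a cycle.
    The site labels and the has_loop() helper are hypothetical names used
    only for this example; this is not the if_bridge or STP logic.

```python
def has_loop(links):
    """Return True if the undirected links contain a cycle (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# Star centered on one colo: two links, no loop.
star = [("fremont", "san_jose"), ("fremont", "home")]
# Completing the circle adds a third link between the two spokes.
circle = star + [("san_jose", "home")]

print(has_loop(star), has_loop(circle))
```

    A bridged topology with a cycle relies on STP to block one of the
    links; the star has no cycle to begin with.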

    A little later this year we will be buying a blade server to place in
    the Fremont colo and all DragonFly infrastructure will be moved onto
    blades.

					----
				Problems Encountered

    * The if_bridge code couldn't handle the circle.  There are bugs in
      my STP code which caused issues.  Basically I was trying to multi-home
      two separate internet-routable IP networks on the same bridge in a
      circular pattern and it got seriously confused.

    * Generally speaking, multi-homed bridged machines can create serious
      problems with bridges due to the fact that bridging is MAC-based and
      not IP-based.  So when one machine is multi-homed and has more
      than one default route (going to different targets across the bridge),
      and is handling packets for both targets simultaneously, the bridge can
      get very seriously confused about how to bridge replies to those
      packets.

    * Most DFly machines use a 10/8 network for their primary IP address
      and internet/other IP addresses are aliases.  They default to
      NAT-out when not bound to their internet IPs.  IPFW rules adjust
      the default route when the packets are sourced from or targeted at
      the machine's internet-routable IP address.

      This means that there are actually several default routes depending
      on the source IP.  The 10/8 network has its own default route which
      points to the NAT box, and the routable IP block(s) have their own
      default routes which generally point to a host across the bridge.

    * Chicken-and-egg issues with IPFW and PF packet filter ordering.
      IPFW comes first, then PF, and neither supports reinjection of the
      packet (that's a feature that PF sorely needs).

      Basically the default route can be changed with IPFW prior to a
      packet feeding into PF, but for some reason it either cannot be
      changed safely within PF or the change occurs in the wrong order
      relative to other rules.

    * Discovered a bug when combining a NAT rule with a change in the
      default route.  DragonFly wanted to send an ICMP IP Redirect for
      the pre-NAT packet in addition to translating the packet and
      transmitting a post-NAT packet.  That created a bit of a mess.

      (fortunately it can be turned off with a sysctl, but it's still
      a bug).
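
    To illustrate the MAC-based point above, here is a toy model of a
    learning bridge.  It is a simplified sketch, not the if_bridge code:
    the LearningBridge class, the port names and the MAC address are all
    hypothetical, and treating the confusion as a MAC entry moving between
    ports is an assumption made for the example.

```python
class LearningBridge:
    """Toy learning bridge: remembers the last port each source MAC was seen on."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> port last seen on
        self.moves = 0       # times a known MAC showed up on a different port

    def receive(self, src_mac, in_port):
        known_port = self.mac_table.get(src_mac)
        if known_port is not None and known_port != in_port:
            self.moves += 1
        self.mac_table[src_mac] = in_port

    def out_port(self, dst_mac):
        """Port a frame for dst_mac would be sent on (None means flood)."""
        return self.mac_table.get(dst_mac)


bridge = LearningBridge()
host = "02:00:00:00:00:01"  # hypothetical MAC of a multi-homed host

# The same MAC alternately appears via two different bridge members.
for port in ["vpn_old", "vpn_new", "vpn_old", "vpn_new"]:
    bridge.receive(host, port)

print(bridge.moves, bridge.out_port(host))
```

    The bridge only knows the most recent port for a MAC, so replies go
    wherever the host was last seen, regardless of which IP block the
    reply belongs to.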

    So, three not-so-wonderful glitches which hopefully we'll be able to work
    on this year.  We need better handling of multiple default routes,
    we need to track down and fix the improper ICMP IP Redirect, and we
    need to fix if_bridge's STP handling in multi-homed configurations.
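
    For readers unfamiliar with the idea of a default route that depends
    on the source IP, here is a minimal sketch of the selection logic.
    The three networks are the ones named in this post; the gateway labels
    and the default_route_for() helper are placeholders invented for the
    example, and the real selection is done with IPFW rules, not Python.

```python
import ipaddress

# (source network, default route) pairs; gateway labels are placeholders.
DEFAULT_ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "nat-box"),
    (ipaddress.ip_network("199.233.90.0/24"), "new-block-gateway"),
    (ipaddress.ip_network("69.163.100.0/24"), "old-block-gateway"),
]


def default_route_for(source_ip):
    """Pick the default route based on the packet's source address."""
    addr = ipaddress.ip_address(source_ip)
    for network, gateway in DEFAULT_ROUTES:
        if addr in network:
            return gateway
    return None  # no per-source default route applies


print(default_route_for("10.1.2.3"))
print(default_route_for("199.233.90.10"))
```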

    --

    There may be some more down-time this afternoon as I clean up the
    DNS infrastructure.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>


