DragonFlyBSD bugtracker - Bug #368: 4) update
https://bugs.dragonflybsd.org/issues/368?journal_id=1529
2006-11-01T02:14:02Z - dacut
Ironically, at [big e-commerce company], where I work, we're moving away
from the dedicated load balancing hardware, though not for any of the
reasons you're discussing. Our traffic volume is high and heterogeneous
enough to trigger subtle bugs in the LB firmware. Getting debug
information is next to impossible when they just start dropping packets
on the floor. They're also really expensive, so the vendors' solution
-- "You need to buy more of our hardware" -- makes our finance guys cringe.

Rather than replace them with dedicated boxes, though, we'll probably
just remove that layer entirely -- clients will negotiate leases onto a
single host for a given period of time, with initial discovery through a
broadcast mechanism.
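For what it's worth, a minimal sketch of what such a lease-plus-broadcast-discovery client could look like; the port, the DISCOVER message, the JSON reply format, and the lease length are all assumptions, not details from the actual design:

    import json
    import socket
    import time

    DISCOVERY_PORT = 9999   # hypothetical discovery port
    LEASE_SECONDS = 300     # hypothetical default lease length

    def discover_host(timeout=2.0):
        """Broadcast a DISCOVER and take a lease on the first host that answers."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(b"DISCOVER", ("255.255.255.255", DISCOVERY_PORT))
        data, _ = sock.recvfrom(1024)   # e.g. b'{"host": "10.0.0.12", "lease": 300}'
        offer = json.loads(data)
        return offer["host"], time.time() + offer.get("lease", LEASE_SECONDS)

    # Clients would pin all traffic to the leased host until the lease
    # expires, then broadcast again:
    #   host, expires = discover_host()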

DragonFlyBSD bugtracker - Bug #368: 4) update
https://bugs.dragonflybsd.org/issues/368?journal_id=1532
2006-11-01T02:14:02Z - dillon
:Matthew Dillon wrote:
:> [If] I actually HAD a critical application that required that kind of
:> routing or bridging I would buy a dedicated piece of hardware to handle
:> it, not try to use a general purpose operating system running on
:> commodity hardware.
:
:Ironically, at [big e-commerce company], where I work, we're moving away
:from the dedicated load balancing hardware, though not for any of the
:reasons you're discussing. Our traffic volume is high and heterogeneous
:enough to trigger subtle bugs in the LB firmware. Getting debug
:information is next to impossible when they just start dropping packets
:on the floor. They're also really expensive, so the vendors' solution
:-- "You need to buy more of our hardware" -- makes our finance guys cringe.
:
:Rather than replace them with dedicated boxes, though, we'll probably
:just remove that layer entirely -- clients will negotiate leases onto a
:single host for a given period of time, with initial discovery through a
:broadcast mechanism.
I completely agree with that sentiment, especially for load balancing
hardware. I personally believe that it is actually better to have
a real machine on the front-end accepting connections and manipulating
the data, then reconnecting to the 'real' machine on the backend and
relaying the rest of the data stream back and forth between the two.
I implemented such a solution at BEST Internet to handle
www.best.com/~user web accesses. As people may know, we had something
like 25+ separate user machines, with about 2000 accounts on each.
But we wanted to have a common URL to frontend all of the personal web
pages. I don't quite remember, but I think we also used the same
scheme to shift dedicated WWW domains around.
The solution was to have a couple of cookie-cutter boxes front-ending
all the WWW connections, doing the first few protocol interactions
(e.g. processing the WWW command and the Host: header), then
looking the info up in a DBM and reconnecting to the actual machine.
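A rough sketch of that flow -- not the original BEST Internet code -- assuming plain HTTP, a dbm file mapping Host: values to backends, and made-up ports and paths:

    import dbm
    import socket
    import threading

    LISTEN_PORT = 8080                     # hypothetical front-end port
    ROUTE_DB = "routes.db"                 # hypothetical dbm: host -> "backend:port"
    DEFAULT_BACKEND = ("127.0.0.1", 8081)  # hypothetical fallback

    def read_headers(conn):
        """Read up to the blank line that ends the HTTP request headers."""
        buf = b""
        while b"\r\n\r\n" not in buf:
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
        return buf

    def pick_backend(request):
        """Look the Host: header up in the dbm; fall back to a default."""
        for line in request.split(b"\r\n")[1:]:
            if line.lower().startswith(b"host:"):
                host = line.split(b":", 1)[1].strip().split(b":")[0]
                with dbm.open(ROUTE_DB, "c") as db:
                    if host in db:
                        backend, port = db[host].decode().split(":")
                        return backend, int(port)
        return DEFAULT_BACKEND

    def pump(src, dst):
        """Copy bytes one way until EOF, then signal EOF to the other side."""
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

    def handle(client):
        request = read_headers(client)
        backend = socket.create_connection(pick_backend(request))
        backend.sendall(request)           # replay what we already consumed
        threading.Thread(target=pump, args=(client, backend), daemon=True).start()
        pump(backend, client)

    def main():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", LISTEN_PORT))
        srv.listen(128)
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    if __name__ == "__main__":
        main()

Because everything runs through ordinary userland sockets, it is easy to log, keep statistics on, or cut into the stream for debugging, which is what several of the points below rely on.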
This scheme had many, many advantages.
* As almost pure networking applications (with only a couple of DBM
  lookups occurring, all easily cacheable in memory)... these machines
  never crashed. And I mean never. We found a networking card that
  worked perfectly (fxp I think) and never had a single problem.

* It didn't matter which machine handled a connection. Each machine
  was an exact cookie-cutter copy. A DBM update was pushed out to
  the boxes once an hour via cron (see the push sketch after this list).

* I could add or remove machines at will. Bandwidth and loading was
  never an issue. We never needed more than 3 boxes, though.

* We could dip into the data stream at will to diagnose problems.
  All the data was running through userland. It wasn't using bridging
  or packet manipulation. I also had the programs keep statistics.

* The machines could serve as buffers against load spikes, and since
  they processed the first few protocol commands and headers they also
  had a tendency to be able to intercept DoS attacks made against our
  web servers.
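A hypothetical version of that hourly push; the master file format, the hostnames and paths, and the use of scp from cron are all assumptions rather than a record of the original setup:

    # Hypothetical crontab entry:  0 * * * * /usr/local/bin/push_routes.py
    import dbm
    import glob
    import subprocess

    FRONTENDS = ["fe1.example.com", "fe2.example.com", "fe3.example.com"]  # hypothetical
    MASTER = "routes.txt"    # lines of "hostname backend:port"
    ROUTE_DB = "routes.db"   # what the front-end boxes open with dbm.open()

    def build_db():
        """Rebuild the routing map from the master list."""
        with dbm.open(ROUTE_DB, "n") as db, open(MASTER) as src:
            for line in src:
                host, backend = line.split()
                db[host] = backend

    def push():
        """Copy the dbm file(s) to every front-end box; some dbm backends add a suffix."""
        files = glob.glob(ROUTE_DB + "*")
        for box in FRONTENDS:
            subprocess.run(["scp", *files, f"{box}:/var/db/"], check=True)

    if __name__ == "__main__":
        build_db()
        push()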
And so on and so forth. I did a similar protocol intercept for POP3
and used MX records and MX forwarding to buffer SMTP (mainly to
offload the crazy DNS load so the shell boxes wouldn't have to cache
tens of thousands of domain names in their local DNS servers).
What's really interesting is that these forwarding boxes added something
like 5ms of delay to the data streams, and nobody ever noticed or cared.
These days accessing any major web site can take multiples of seconds
due to the complexity of the site, all the separate DNS domains that
the client machine has to look up to process the page, and backend
latency (servlet startup, etc). If I bring up any major web site,
like www.sfgate.com or slashdot or the JPL or ANY major news site,
it takes no LESS than 5 seconds for the web page to load, and sometimes
upwards of 10 seconds. My bank's web site is just as bad. And it isn't
because of network bandwidth issues. I think one could introduce upwards
of 20ms of networking latency on the web server side and not notice
any difference.
What is crazy is that nobody bothers to benchmark this problem. All the
benchmarks you see published are basically measuring how many
milliseconds it takes to load a simple HTML page or a few graphics, and
they pat themselves on the back for being 2ms faster than the
competition. Nobody gives a $#@% about 2ms, or 10ms. That isn't the
problem any more.
Similarly, if running a GigE link into a server requires it to operate
at 100% capacity, the problem is with the design of the service, not
with the fact that the network driver eats up a lot of cpu when running
at 100% capacity. If I have a big web server and I am shoving out
500 MBits of data a second, then my main worry is going to be the cost
of transporting that data over the internet, relative to which the cost
of the server is pretty much zip. Except for very, very rare cases I
am not really going to give a rat's ass when a single tcp connection is
unable to saturate a GigE link.
-Matt

DragonFlyBSD bugtracker - Bug #368: 4) update
https://bugs.dragonflybsd.org/issues/368?journal_id=1603
2006-11-19T05:25:00Z - justin
Committed in
http://www.dragonflybsd.org/cvsweb/src/sys/dev/netif/re/