Confessions Of A 10 GbE Network Newbie – Part 5: SMB3

Last time, I wrote about how I believe every small to medium sized business should consider a NAS. In this installment, I’ll be focusing on SMB3.

SMB3

SMB (Server Message Block) is the core of Microsoft’s and, more recently, Apple’s networking. QNAP, like many NAS manufacturers, uses open source SAMBA code to provide SMB services on its NASes.

It is important to know that SAMBA’s implementation of SMB3 does not have all the bells and whistles found in Windows 8.1 and MS Server 2012’s SMB3 implementation. The single most significant difference is that Microsoft’s SMB3 provides for multi-channel communication. I’ll discuss this difference in more detail later in this series. For now, there are two key implications.

First, performance improves significantly from SMB1 to SMB2 to SMB3. In my tests, the difference is quite obvious over 10GbE. The implication is that Windows 8.1 or Server 2012 clients will see faster transfer speeds from NASes like QNAP’s TS-870 Pro and TS-470 Pro running QTS 4.1 (which supports SAMBA’s SMB3 implementation) than clients running previous versions of Windows. For now, OS X Mavericks appears to be using SMB2, so it will not benefit from SMB3 enhancements.

Second, Microsoft’s SMB3 multichannel is perhaps the most significant feature added to the SMB protocol in over 25 years. In simple terms, an older SMB2 connection in Windows 7 can be compared to a single-lane highway, with the data traveling on it loaded into a 1995 half-ton pick-up truck. A video file uploaded over 10GbE to a server is limited by this single-lane highway and the older truck.

SAMBA’s SMB3 gives us the 2013 turbo-charged truck that can travel faster, carry a larger load and use less fuel doing it. Microsoft’s SMB3 multichannel takes this up a notch by adding multiple lanes, so now we can have more of our turbo-charged trucks (threads) on the highway simultaneously. Our video file upload, carried by these trucks, moves much quicker as a result.

The number of channels increases with the computer’s number of CPU cores and network connections. If we upgrade from a four-core i5 processor to an i7 with 8 logical cores, we effectively increase from 4 to 8 lanes on our single 10GbE link. If we add a second network link, Microsoft’s SMB3 implementation detects this and automatically adds more lanes. You can read more about SMB3 multichannel here.

In an impressive example of Microsoft’s SMB3 multichannel, the screen grab below shows 1480 MB/s in a real world copy/paste between two Windows 8.1 workstations, each sharing an 8 GB RAM drive. How was this possible? In the hands-down easiest network configuration ever, I simply connected the workstations with two 10GbE cables using Intel X540-T2 dual-port 10GbE network interface cards. Zero configuration was required on either the workstations or the 10GbE switch.

Wowza! 1480 MB/s transfer

For anyone who has configured NIC teaming or LACP link aggregation, the result above should amaze you. For really inexpensive high speed networking, two eBay quad-port 1GbE cards (one per machine) plus four CAT5e LAN cables will get you to 480 MB/s with zero configuration! In my 20-plus years of networking, I’ve never seen something just work like this.
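
To put the two figures above in context, here is a small sketch of the line-rate arithmetic. It assumes decimal units (1 Gb/s = 125 MB/s) and ignores protocol overhead; the function names are mine, for illustration only.

```python
# Rough line-rate arithmetic for the transfer figures quoted above.
# Assumption: decimal units (1 Gb/s = 1000 Mb/s = 125 MB/s), no protocol overhead.


def line_rate_mb_per_s(link_gbps: float, links: int = 1) -> float:
    """Theoretical aggregate line rate in MB/s for `links` identical links."""
    return link_gbps * 1000 / 8 * links


def utilisation(observed_mb_per_s: float, link_gbps: float, links: int = 1) -> float:
    """Observed throughput as a fraction of the theoretical line rate."""
    return observed_mb_per_s / line_rate_mb_per_s(link_gbps, links)


if __name__ == "__main__":
    # Two 10GbE ports, 1480 MB/s observed in the copy/paste test.
    print(line_rate_mb_per_s(10, links=2))           # 2500.0
    print(round(utilisation(1480, 10, links=2), 3))  # 0.592
    # Four 1GbE ports, 480 MB/s quoted for the budget option.
    print(line_rate_mb_per_s(1, links=4))            # 500.0
    print(round(utilisation(480, 1, links=4), 2))    # 0.96
```

In other words, the four-port 1GbE figure is close to line rate, while the dual 10GbE copy uses a bit under 60% of what the two links could theoretically carry.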

Setting up Windows SMB3 10GbE

First of all, while testing, uninstall antivirus software. Don’t just disable it, uninstall it. Add it back in only once you are done with all other testing. Bitdefender, for example, in its default configuration will dramatically slow down 10GbE connections. Microsoft Security Essentials seems to have little effect, but it’s not my recommended AV either.

Jumbo frames and driver tuning should be on your list of things to do for best 10GbE performance. I typically left these settings alone in 1GbE environments; however, the images below make it very clear that jumbo frames and driver tuning are well worth the effort to configure. In test one, I used default network driver settings and standard frames.

Performance with no tweaks

The second result shows clearly the performance increase to be had by tweaking network driver settings, and enabling jumbo frames.

Performance with all tweaks

As far as driver tweaks on Intel 10GbE NICs in the Windows environment, you may want to look at RSS queues, which, according to Intel, should be set to match your computer’s logical core count. This may make a difference in SMB3 multichannel performance in Windows 8 and Server 2012. SMB3 has default values that can be tuned using PowerShell commands. By default each NIC will get four TCP/IP sessions, up to a maximum of 8 channels per client/server connection. RSS tuning in turn will affect how this workload is distributed over the available CPU cores. I would suggest that you set up a few RAM drive shares and experiment with file copies in your environment to see what works best. If you are just setting up a 10GbE NAS in a Mac environment, start with jumbo frames.
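
As a rough illustration of the defaults just described, here is a minimal sketch. The two constants come from the paragraph above; the `min()` rule and the function name are my own simplification and not Microsoft's actual selection logic.

```python
# Sketch of the default SMB3 multichannel limits quoted above.
# Assumption: connections are simply capped at the per-pair maximum; the real
# client decides how to spread them across interfaces.

CONNECTIONS_PER_RSS_NIC = 4    # default TCP/IP sessions per NIC (from the text)
MAX_CONNECTIONS_PER_PAIR = 8   # default cap per client/server connection (from the text)


def estimated_connections(rss_nics: int) -> int:
    """Estimate total SMB3 connections between one client and one server."""
    if rss_nics < 0:
        raise ValueError("rss_nics must be non-negative")
    return min(rss_nics * CONNECTIONS_PER_RSS_NIC, MAX_CONNECTIONS_PER_PAIR)


if __name__ == "__main__":
    for nics in (1, 2, 4):
        print(nics, "NIC(s) ->", estimated_connections(nics), "connections")
```

On Windows 8 and Server 2012, the live picture can be checked against an estimate like this with the Get-SmbMultichannelConnection and Get-SmbClientConfiguration PowerShell cmdlets.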

I also increased the transmit and receive buffers to their maximums, and enabled most performance options. These tweaks generally will use more RAM and CPU resources, but increase 10GbE transfer speeds. On a heavily loaded server, you may want to dial your settings back.

10GbE Tweaking Details

The following tweaks can increase 10GbE throughput by up to 200 MB/s. First, the adjustments on Intel’s X540-T1 NIC. If you have a -T2 NIC and are using both ports, be sure to make the changes in both sets of adapter properties.

First, set jumbo packets (frames) to the maximum 9014 Bytes.

Intel X540 NIC tweak - jumbo frame
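
To see why jumbo frames help, here is a back-of-the-envelope sketch comparing standard and jumbo frames. It assumes plain IPv4 and TCP headers with no options, a 10 Gb/s line rate, and that the 9014-byte driver setting corresponds to a 9000-byte MTU plus the 14-byte Ethernet header; those are my assumptions, not figures from the tests above.

```python
# Per-frame overhead on Ethernet, in bytes.
ETH_HEADER = 14         # destination MAC + source MAC + EtherType
ETH_FCS = 4             # frame check sequence
PREAMBLE_AND_GAP = 20   # 8-byte preamble/SFD + 12-byte inter-frame gap
IP_TCP_HEADERS = 40     # IPv4 (20) + TCP (20), assuming no options


def frame_stats(mtu: int, link_bps: float = 10e9) -> tuple:
    """Return (payload efficiency, frames per second at line rate) for an MTU."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_HEADER + ETH_FCS + PREAMBLE_AND_GAP
    efficiency = payload / on_wire
    frames_per_second = link_bps / (on_wire * 8)
    return efficiency, frames_per_second


if __name__ == "__main__":
    for mtu in (1500, 9000):
        eff, fps = frame_stats(mtu)
        print(f"MTU {mtu}: {eff:.1%} payload, {fps:,.0f} frames/s")
```

Under these assumptions, payload efficiency rises from roughly 95% to roughly 99%, but the bigger win is that a saturated link carries about one sixth as many frames per second, which means far fewer packets for the CPU and driver to process.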

Then set Receive Side Scaling (RSS) queues to match the CPU logical core count. On an i7 based computer with hyper-threading enabled (you may have to turn this on in the computer’s BIOS), you should see 8 logical cores.

Intel X540 NIC tweak - RSS queues
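
If you are unsure of the logical core count, a quick sketch like this reports it and picks a matching queue value. The list of selectable queue counts is hypothetical; check what your driver's dropdown actually offers.

```python
import os


def suggested_rss_queues(logical_cores: int, offered=(1, 2, 4, 8, 16)) -> int:
    """Largest offered queue count that does not exceed the logical core count.

    `offered` is a hypothetical list of values a driver dropdown might show.
    """
    candidates = [q for q in offered if q <= logical_cores]
    if not candidates:
        raise ValueError("no offered queue count fits the core count")
    return max(candidates)


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # logical cores, including hyper-threads
    print(f"{cores} logical cores -> try {suggested_rss_queues(cores)} RSS queues")
```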

And make sure RSS is enabled.

Intel X540 NIC tweak - RSS enable

Next, disable Virtual Machine Queues if you see this option (Server 2012 only).

Intel X540 NIC tweak - Virtualization

Then increase receive buffers to the maximum (4096).

Intel X540 NIC tweak - RCV buffers

And also increase transmit buffers. Their maximum is 16384.

Intel X540 NIC tweak - XMIT buffers

Once you have made these changes, you should see some impressive speeds. Remember to create and share RAM disks on both your target and host machines while testing, to eliminate disk speed limitations.
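
If you would rather log numbers than read them off the copy dialog, a minimal timing sketch along these lines will do. The script and its command-line arguments are my own illustration; point it at a file on your local RAM disk and a destination on the remote RAM disk share.

```python
import os
import shutil
import sys
import time


def measure_copy_mb_per_s(src: str, dst: str) -> float:
    """Copy src to dst and return the throughput in MB/s (decimal megabytes)."""
    size_bytes = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero reading on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bytes / 1_000_000 / elapsed


if __name__ == "__main__":
    if len(sys.argv) == 3:
        rate = measure_copy_mb_per_s(sys.argv[1], sys.argv[2])
        print(f"{rate:.0f} MB/s")
    else:
        print("usage: python copytest.py <source file> <destination file>")
```

Use a test file of several gigabytes and repeat the run a few times, since caching on either end can inflate a single result.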

Conclusion

As a teaser, your 10GbE workstation will also work quite nicely as a 4K player, just in case you have a 4K display handy. For many of you, just purchasing a 10GbE enabled NAS unit is all you’ll need for a server. Just remember to enable jumbo frames and check that SMB3 is enabled! On QNAP’s latest firmware, you’ll find those settings here:

QNAP Jumbo Frame setting

And here:

QNAP SMB3 enable

In the last installment of this series, I’ll show you how to affordably build a Windows 8.1 video editing workstation and a Windows Server 2012 shared storage server capable of maxing out a 10GbE connection.


Dennis Wood is Cinevate’s CEO, CTO, as well as Chief Cook and Bottle Washer. When not designing products, he’s likely napping quietly in the LAN closet.

6 Responses to Confessions Of A 10 GbE Network Newbie – Part 5: SMB3

  1. cinema-living says:

    Hi Dennis,

    This series has been so helpful. Do you have any suggestions on how to maybe implement this on a 100% mac network?

  2. Dennis says:

    The QNAP NAS units support all Apple file/network protocols, so there is really nothing different in that environment, other than a few checkboxes on the NAS to turn on AFP, Bonjour etc. The guys at small-tree.com ( http://www.small-tree.com/10GbE_Cards_s/4.htm ) sell 10GbE Intel cards like the ones we’re using, and provide drivers etc. You could also set up a fancy pants Windows 2012 server, but Macs won’t really take advantage of it. In other words, a 10GbE enabled NAS, and a 10GbE card for your Mac, and you’re set. If you have a local RAID drive array on your Mac already, you could just direct connect a few workstations by purchasing 10GbE cards for them. No 10GbE switch is required if it’s just two workstations. This would be a good way to, say, share an external Thunderbolt drive array between two or more workstations :-)

    Cheers,
    Dennis.

    • cinema-living says:

      Awesome, thanks.

      As well, just to confirm, when all of this is set up, there’s in effect two networks. One for the internet and one for the NAS. Is it feasible to run them both on the same network?

  3. philipp says:

    Hi Dennis,

    This is SUCH a great article, thank you! We currently have an iMac setup with a 10Gb/s small tree thunderbolt card and two Pegasus Thunderbolt RAIDS. The 10Gb/s Ethernet is connected to the same Dlink switch you mention in one of your earlier articles. The single 10gb/s thunderbolt chip of the iMac is a bottleneck in this setup, though, since it has to serve both the RAID and the Ethernet connection. We are also experiencing random server slowdowns that can only be solved by restarting the iMac. It’s just not very reliable.

    So I’m looking into replacing the iMac Server + Pegasus RAID + 10Gb/s Ethernet card with a QNAP 10Gb/s NAS. I would like to go with one of their rack mounted solutions, however. Do you have any recommendations? They seem to have a lot of different options and it’s kind of hard to tell the differences. I would like to be able to serve 3-5 workstations accessing 5k ProResHQ at the same time while allowing one workstation to access 5k RED raw.

    Last but not least I’d love to second cinema-living’s question. Specifically how to fine tune the parameters such as buffer settings and such in an OSX setup. Enabling jumbo frames in particular can be a bit challenging with 10gb/s in OSX.

    Would love to hear your insights.

    Thanks,

    Philipp

  4. Dennis says:

    Thanks for the kind words, gents. I’m pretty convinced, based on my travels and customer chats, that many photographers/videographers are really searching for a cost effective shared storage solution, just as we are. As far as the Mac network setup goes, there are a few options. You can just assign IP addresses where direct connections apply, so in effect, you are running two networks. Because QNAP has 2 x 1GbE ports and 2 x 10GbE ports on all of their 10GbE setups, you can connect one or both of the 1GbE ports to an older switch (trunked together if your switch supports this), and also connect the 10GbE ports to a new 10GbE switch. These can all be on the same network, same subnet, with 1GbE and 10GbE switches connected with one or more CAT5 network cables. That’s how we’re running the TS-870 right now :-) I’ll do a simple diagram in my next blog post for how this is set up.

    For a switch-less 10GbE setup for two workstations, one workstation would connect to the NAS on the 10GbE port 1, the other to 10GbE port 2. These same workstations would likely have their 1GbE ports connected just as they are now for web access etc. As long as the IP addresses do not conflict with your existing network, you would be fine.

    As far as jumbo frames etc. on the Mac side, unfortunately I’m not of much use. The folks over at Small-Tree would be the first place I would look, as I see Intel even refers customers to Small Tree from their web site. It looks like ATTO and Sonnet have also introduced 10GbE adapters for Mac. If you can’t tune the driver settings, don’t sweat it. They are basically focused on optimizing single connections in an SMB3 Windows environment, so in the Mac environment, the tuning suggestions would definitely bear some testing.

    For rack mount, I’d suggest starting with the TS-EC879U-RP or TS-879U-RP which have room for 2 optional PCIe cards. 1 card can be added for 10GbE (2 ports), and optionally another for SAS (like the REXP-1200U-RP ), just in case you need more storage. Having seen these questions, I will attempt a simple diagram to illustrate the networking stuff :-)

    Cheers,
    Dennis.

  5. Pingback: Confessions Of A 10 GbE Newbie – Part 6: Breaking the 10GB Data Barrier | Cinevate – Tools for Filmmakers and Photographers
