Confessions Of A 10 GbE Newbie – Part 6: Breaking the 10GB Data Barrier

Breaking the 750 MB/s barrier

In parts 1 through 5 of this series, we discussed 10GbE networking basics, built up a software toolkit, reviewed 10GbE NAS performance, discussed why you should consider a NAS, and took a close look at SMB3 networking advancements. In this penultimate installment, we’ll look over a plug and play 10GbE NAS solution and build up an Adobe CC editing workstation capable of exceeding 750 MB/s over 10GbE networks.

Before we get started, let’s look at how all of this can be integrated into your existing network. There have been several questions related to MacOS workstations and possible network configurations posed in the comments. Here are two possible network configurations. The first is very similar to what we are using at Cinevate.

Switched 10GbE network example


The direct connect setup below would not require purchasing a 10GbE switch in cases where only two 10GbE connected workstations were required, and your NAS or server had a dual port 10GbE NIC installed.

Unswitched 10GbE network example


Buy a NAS, or build a server?

If you do decide to purchase a 10GbE NAS to share your video projects, you’ll still need a fast 10GbE equipped editing workstation. But you certainly won’t need to build your own server. In Part 3 of this series, I reviewed QNAP’s TS-470 PRO, but recommended an 8-bay (or more) NAS to fully take advantage of 10GbE network speeds.

For those with an existing equipment rack system, I’d suggest starting with the QNAP TS-EC879U-RP or TS-879U-RP, which have room for 2 optional PCIe cards. One card can be added for 10GbE (2 ports) and optionally, another to connect additional enclosures like the REXP-1200U-P. These drive enclosures allow you to add much more storage without adding another NAS.

I had a chance to test out QNAP’s TS-870 Pro NAS fully populated with eight Hitachi 4 TB drives, which returned some very impressive performance numbers. This NAS performs extremely well and at a price that should make you stop and question the value in building your own server. If you are looking for warranty and support, QNAP has proven to be excellent in this department. Regardless of what brand you choose, make sure the NAS hardware is sufficient for your bandwidth needs.

QNAP TS-870 front


On the back view of the TS-870 Pro, you can see I’ve connected both 10GbE and 1GbE interfaces as per the wiring diagram above, integrating the NETGEAR 8 port 10GbE switch as well as our older 1GbE equipment. You can connect both ports to your switches and configure them for redundancy, or link aggregation in the NAS network configuration.

QNAP TS-870 rear


Multiple NAS network connections will not magically increase network performance for a single connection to the NAS, but can increase overall bandwidth to clients as network loads increase. This behavior potentially changes if the NAS is running Windows Storage Server 2012, which can aggregate network ports transparently to increase single connection speeds.

Part 5 (SMB3) of the series covers this behavior in some detail. For a NAS running Linux based software (almost all of them), a single SMB3 connection to Windows 8.1 was limited in my testing to ~ 740 MB/s. You would likely see a lower number with MacOS / SMB2, or Windows 7 or earlier.
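To put the ~740 MB/s figure in context, here is a rough sketch of the 10GbE throughput budget. It assumes a standard 1500-byte MTU, plain IPv4 and TCP headers with no options, and ignores SMB protocol overhead entirely, so treat the ceiling as approximate rather than as a measured limit.

```python
# Rough 10GbE throughput budget (illustrative only).
# Assumptions: 1500-byte MTU, IPv4 + TCP headers without options,
# and no allowance for SMB protocol overhead.
LINK_BITS_PER_SEC = 10_000_000_000  # 10GbE line rate

MTU = 1500              # bytes of IP packet per Ethernet frame
IP_TCP_HEADERS = 40     # 20-byte IPv4 header + 20-byte TCP header
ETHERNET_OVERHEAD = 38  # preamble 8 + header 14 + FCS 4 + inter-frame gap 12


def raw_mb_per_sec(bits_per_sec: int) -> float:
    """Line rate expressed in MB/s (1 MB = 1,000,000 bytes)."""
    return bits_per_sec / 8 / 1_000_000


def tcp_payload_mb_per_sec(bits_per_sec: int, mtu: int = MTU) -> float:
    """Approximate ceiling for TCP payload once framing overhead is removed."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return raw_mb_per_sec(bits_per_sec) * payload / on_wire


raw = raw_mb_per_sec(LINK_BITS_PER_SEC)
ceiling = tcp_payload_mb_per_sec(LINK_BITS_PER_SEC)
measured = 740  # single SMB3 connection to a Linux based NAS, from the tests above

print(f"Raw line rate:       {raw:.0f} MB/s")
print(f"TCP payload ceiling: {ceiling:.0f} MB/s")
print(f"Measured share:      {measured / ceiling:.0%} of the ceiling")
```

In other words, a single connection at ~740 MB/s still leaves a good part of the link unused, which is the gap SMB3 multichannel is meant to close.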

In the following tests, the TS-870 Pro NAS shows impressive ATTO results with 8 drives configured as a RAID 5 array and ATTO queue depth set to 4.

QNAP TS-870 Pro SMB3 vs. iSCSI performance


A typical Windows 8.1 large file copy/paste shows writes to the TS-870 PRO at ~ 550 MB/s and reads at ~ 740 MB/s fully loaded with hard drives and configured in RAID 5. Using Intel’s NASPT tool, the results of “real world” application traces are shown.
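For a feel of what these rates mean in a working day, the short sketch below converts them into copy times. The 100 GB project size and the ~110 MB/s figure for a 1 GbE link are assumptions chosen for illustration, not numbers from my tests.

```python
def transfer_minutes(size_gb: float, rate_mb_per_sec: float) -> float:
    """Minutes needed to move size_gb (decimal GB) at a sustained rate in MB/s."""
    return size_gb * 1000 / rate_mb_per_sec / 60


PROJECT_GB = 100  # hypothetical project size

RATES = {
    "10GbE read from TS-870 Pro": 740,   # measured above
    "10GbE write to TS-870 Pro": 550,    # measured above
    "1 GbE link (assumed ~110 MB/s)": 110,
}

for label, rate in RATES.items():
    print(f"{label}: {transfer_minutes(PROJECT_GB, rate):.1f} min")
```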

QNAP TS-870 Pro Intel NASPT results


If you are working exclusively on the MacOS platform and looking for a plug and play solution, a 10GbE NAS along with workstation 10GbE network cards from companies like Small Tree (which supports the Intel X540 cards) is all that would be required. If you’re looking to put your own Windows based 10GbE server or workstation together, read on.

What software should I use if I’m building my own workstation/server?

If you’ve read my previous 10GbE posts, you’ll know that for Adobe CC video/photo editing, 10GbE performance is best on either Windows 8.1 or Windows Server 2012 due to SMB3 multichannel features. In many tests between workstations, Windows 8.1 performed just as well as Server 2012 for large file transfers over the network. By default, Windows SMB3 multichannel will attempt to establish up to four connections per RSS-capable network interface (NIC) per session.

We also learned in previous posts that RSS (receive-side scaling, found in advanced NIC configuration) will try to spread this workload over available CPU cores. This is likely why a processor like the i7 4770K, with four physical cores, showed better performance in 10GbE network tests than the i5 processors (2 physical cores) I tested.
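Here is a minimal sketch of the idea behind RSS, to show why several connections help where a single one cannot. It is a simplification: real NICs hash the connection addresses and ports with a Toeplitz hash and look the result up in an indirection table, whereas this sketch substitutes CRC32 and a plain modulo. The function names and IP addresses are made up for illustration.

```python
import zlib
from collections import Counter


def rss_core(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             num_cores: int) -> int:
    """Pick a CPU core for a connection from a hash of its address/port tuple.

    CRC32 stands in for the Toeplitz hash a real NIC would use.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_cores


def spread(connections, num_cores: int) -> Counter:
    """Count how many connections land on each core."""
    return Counter(rss_core(*conn, num_cores) for conn in connections)


# One hypothetical multichannel session: four connections to the SMB port (445)
# that differ only in their source port.
conns = [("192.168.10.20", 50000 + i, "192.168.10.10", 445) for i in range(4)]

print("4 cores:", dict(spread(conns, 4)))
print("2 cores:", dict(spread(conns, 2)))
```

The point to take away is that any one connection always hashes to the same core, so a single connection can never use more than one core for receive processing. Opening several connections per session gives the hash something to spread.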

A Windows 8.1 based NAS, with an i7 processor and fast RAM will perform very well as a 10GbE server, or workstation. Just remember that Windows 8.1 is limited to 20 SMB inbound connections if you plan on using it as a “server”.

Here are some pretty impressive numbers generated between two Windows 8.1 workstations, both sharing RAM disks over 10GbE connections. RAM disk software creates a virtual hard disk in workstation RAM, removing the testing bottleneck created by local SSD or hard disks.

Two Windows 8.1 workstations - Intel NASPT


PCIe SSD drives are the only single drives right now capable of sustaining these speeds, with some exceeding 2000 MB/s.

Two Windows 8.1 workstations - Windows filecopy


So how did we manage server builds to allow 900MB/s data transfer? After many builds, OS tweaks and tests, here are my official workstation and server build recommendations for the tech savvy DIY builder looking to edit video via a shared storage 10GbE solution.

Workstation Build

The parts list for the workstation build is shown in Table 1. Note this does not include a Windows 8.1 license. This configuration could also be used as a less powerful server for small workgroups.

Table 1: 10GbE Workstation

Component | Description | Price
Case | Rosewill BlackHawk (an older Antec case is pictured below) | $79
Motherboard | Asus Z87-A or Z87-WS | $149/$349
CPU | Intel Core i7 4770K @ 3.4GHz | $359
CPU Cooler | Noctua NH-U12S | $77
Memory | GSKILL RipJaws X Series (4 x 8GB -> 32GB) | $520
Video Card | Nvidia GTX 760 or GTX 780 | $250/$520
SSD | 500 GB Samsung Evo | $330
10GbE NIC | Intel X540-T1 | $388
Power Supply | Corsair TX750 (750 W) | $89
Total | | $2241-$2711
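The total range in Table 1 comes from picking either the cheaper or the more expensive option for the motherboard and video card. A quick sketch to reproduce it, using the prices exactly as listed above (the dictionary layout is just for illustration):

```python
# (low, high) price in USD for each Table 1 component.
# Items with a single price use the same value twice.
WORKSTATION_PARTS = {
    "Case": (79, 79),
    "Motherboard": (149, 349),   # Asus Z87-A or Z87-WS
    "CPU": (359, 359),
    "CPU Cooler": (77, 77),
    "Memory": (520, 520),
    "Video Card": (250, 520),    # GTX 760 or GTX 780
    "SSD": (330, 330),
    "10GbE NIC": (388, 388),
    "Power Supply": (89, 89),
}

low_total = sum(low for low, _ in WORKSTATION_PARTS.values())
high_total = sum(high for _, high in WORKSTATION_PARTS.values())

print(f"Workstation build: ${low_total} to ${high_total}")
```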

I’ve been a fan of Asus for several decades now, and there are several of its motherboards I would suggest looking at as the foundation for a performance Adobe CC editing computer. The Asus Z87-A ($150) has sufficient PCIe slots for basic applications, and a previous-generation Z77-V board also performed quite well during 10GbE testing, both as a server and as a workstation board.

You can see the Z77-V here configured with 32 GB of RAM, a RocketRaid RAID 2720 SGL card (supports 8 hard drives) and the Intel X540-T1 10GbE NIC.

Asus Z77-V motherboard


My own editing workstation shown below is based on the Asus Z87-WS motherboard, using two Nvidia GTX 650 Ti Boost cards. They are connected with an SLI cable (just in case you’re a gamer too). However, when using Adobe CC, you need to disable SLI using Nvidia’s driver interface. Adobe Premiere CC will find and use all the CUDA cores it can for rendering etc. Combining two cards, Adobe Premiere CC will see 1536 CUDA cores and 4GB video memory.

You can see the Intel X540 NIC nestled between the video cards, nicely cooled by the Nvidia video card’s cooling fan right beside it. If you compare this case arrangement with the Corsair 550D used later in our server build, you can see a large improvement in terms of cable management, and therefore airflow using a newer case like the 550D.

The Noctua NH-U12S CPU cooler is one of the most efficient units out there, and supported overclocking in this system to 4.3 GHz in Turbo mode with no issues. The i7 4770K processor presents 8 logical cores to the operating system (four physical cores with Hyper-Threading), so even a small overclock can have a significant impact on video rendering times. As a video editing machine may operate for hours during a render at 100% CPU loading, an efficient CPU cooler is important.

You can see I’ve spec’d RAM at 32 GB. This is more an advantage if you’re using Adobe CC After Effects, where RAM can be assigned to CPU cores in order to speed up rendering times. If you’re not doing much in After Effects, 8 or 16 GB of RAM might serve your needs just fine. You would see very little difference in the Adobe Premiere CC Benchmark tests.

10GbE editing workstation, Asus Z87-WS, i7 4770K CPU, 32GB RAM, dual Nvidia GTX650 Ti Boost, X540-T1 10GbE


More Workstation

First, here’s the component diagram for the ASUS Z87-WS motherboard I use in my own editing workstation. It has two 1 GbE network ports and sufficient PCIe bandwidth for two SLI-capable video cards, while running a single-port 10GbE network card.

ASUS Z87-WS motherboard


At the upper end of the workstation motherboard range, the ASUS P9X79-WS workstation uses a socket 2011 configuration to provide both more lanes of PCIe bandwidth and significantly more memory capacity. If you are looking to build a serious editing workstation/server with two video cards, a RAID card, and dual port 10GbE card, all running simultaneously, this board would be required to ensure sufficient PCIe bandwidth.
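The PCIe bandwidth argument can be sketched as a simple lane budget. The numbers here are assumptions rather than measurements: 16 CPU lanes for a socket 1150 i7 and 40 for socket 2011 are the commonly quoted figures for these platforms, and the per-card widths assume each video card runs at x8 with x8 RAID and dual-port 10GbE cards. Check the specifications for your own parts.

```python
# PCIe lanes provided directly by the CPU (assumed platform figures).
CPU_LANES = {
    "Socket 1150 (i7 4770K)": 16,
    "Socket 2011 (P9X79-WS class)": 40,
}

# Lanes each card would ideally get (assumed widths).
CARDS = {
    "Video card 1": 8,
    "Video card 2": 8,
    "RAID card": 8,
    "Dual-port 10GbE NIC": 8,
}


def lanes_wanted(cards: dict) -> int:
    """Total lanes needed if every card runs at its assumed width."""
    return sum(cards.values())


def fits(cpu_lanes: int, cards: dict) -> bool:
    """True if the CPU can feed every card at full assumed width at once."""
    return lanes_wanted(cards) <= cpu_lanes


for platform, lanes in CPU_LANES.items():
    verdict = "fits" if fits(lanes, CARDS) else "oversubscribed"
    print(f"{platform}: {lanes_wanted(CARDS)} lanes wanted, "
          f"{lanes} available -> {verdict}")
```

Some boards add a PCIe switch chip so that more slots can share the CPU lanes. That adds slots, but it does not add bandwidth between the switch and the CPU, so the budget above still applies when every card is busy at once.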

ASUS P9X79-WS motherboard


Benchmarks

The Asus Z87-WS workstation I used for Adobe Premiere CC (current as of January 2014) network tests generated some very respectable numbers using the Adobe Premiere Pro Benchmark for CS5 (PPBM5) and Premiere Pro Benchmark for CS6 (PPBM6). Bill Gehrke and Harm Millaard maintain these benchmarks in an attempt to assist editors in building cost-effective editing workstations. You can download the older test series at http://ppbm5.com/Instructions.html or the newer (takes longer to run) version at http://ppbm7.com/index.php/homepage/instructions

The benchmark numbers for the older PPBM5.5 tests are generated during the render and output of three timelines, pictured below:

PPBM5.5 benchmark


I’ve updated the PPBM5.5 tests shown earlier in this series with more tests, including top 10 average results for the highest scoring online submissions. For these tests, the project, source files, cache files and preview files were all in the same directory on the local drive, NAS, or Server to reflect a completely shared network solution. The results, using various local drives and network shares, are summarized in Table 1.

Table 1: PPBM5.5 Test Results

Target Disk | Disk I/O Test (seconds) | MPEG2-DVD Encode (seconds) | H.264 Encode (seconds) | MPE (Mercury engine enabled, seconds)
Local 500GB SATA3 SSD Samsung Evo with RAPID enabled | 29 | 41 | 49 | 4
TS-470 Pro 10GbE, Intel 530 SSD x 4, 1 TB, RAID 0 | 35 | 42 | 52 | 4
Windows 2012 Server R2, 10GbE, RocketRaid 2720, 24TB, 6 x 4TB Hitachi 7200 disks, RAID 5 | 39 | 43 | 49 | 4
TS-470 Pro 10GbE, 16TB, 4 x 4TB Hitachi 7200, RAID 0 | 54 | 43 | 51 | 5
Local WD Black 2TB 7200 HD | 84 | 40 | 49 | 5
5 year old TS-509 Pro, 1GbE, 5TB, 5 x 1TB 5400, RAID5 | 263 (Yikes!) | 80 | 53 | 7
Average Top 10 of 1351 results posted: http://ppbm5.com/DB-PPBM5-2.php | 55 | 31 | 40 | 4.5

From these results, you can see that the 2012 Windows Server with a six disk RAID 5 array (third down) performed almost as well over 10GbE as did the locally-connected Samsung Evo SSD with RAPID RAM caching enabled (first entry). In other words, the 10GbE network drive was almost as fast as the very latest SSD technology connected directly to the motherboard, but with 40X the capacity.
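The 40X figure follows from RAID 5 arithmetic: one drive’s worth of space goes to parity, so six 4 TB drives (24 TB raw) give 20 TB usable. A minimal sketch, using decimal terabytes and ignoring filesystem overhead:

```python
def raid5_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable RAID 5 capacity: one drive's worth of space is used for parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_tb


SSD_TB = 0.5  # the local 500 GB Samsung Evo

usable = raid5_usable_tb(6, 4)  # six 4 TB Hitachi drives
print(f"Usable array capacity: {usable:.0f} TB")
print(f"Capacity vs. local SSD: {usable / SSD_TB:.0f}x")
```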

Expand the Server’s RAID array to eight or more drives and the 10GbE results could easily surpass a locally-connected SSD, which is limited by SATA3 to around 500 MB/s.

The importance of these benchmarks is that they use real video files in various timelines to offer real-world comparative results. The 1 GbE results (sixth down) should reveal why editing over older 1 GbE links is not recommended. If you are doing a lot of rendered finished output and time is critical, a more powerful Nvidia card like the GTX 780 would significantly reduce all of the rendering times I’ve posted.

Server Build

The parts list for the server build is shown in Table 2. Note this does not include Windows 2012 Server Essentials or Windows 8.1 x64 license.

Table 2: 10GbE Server parts

Component | Description | Price
Case | Corsair Obsidian 550D | $150
Motherboard | Supermicro X9SRH-7TF ATX (10GbE Intel X540 dual port onboard) | $520
CPU | Intel E5-2620 v2 @ 2.10GHz – 6 Core LGA2011 15MB Cache | $500
CPU Cooler | NH-U12DX i4 CPU cooler | $80
Memory | 8GB Module DDR3 1600MHz ECC CL11 RDIMM Server Premier (x4) | $290
RAID Controller | RocketRaid 2720 SGL | $154
RAID Cables | CABLE 3WARE – CBL-SFF8087OCF-05M (x6) | $24
Boot SSD | Intel 530 120 GB | $99
Hard Drives | 24 TB RAID 5 Array – Hitachi Deskstar NAS 4 TB (x6) | $1400
Power Supply | Corsair 750w Bronze (750 W) | $89
Total | | $3306

If you’re just sharing 10GbE storage for two workstations, you could simply purchase a dual-port 10GbE card for your server and directly connect the workstations. The rest of your network could access your server via existing 1 GbE infrastructure, connected to an inexpensive 1 GbE NIC (I’ve added an older dual port 1 GbE Intel card in the Cinevate server build). The Supermicro X9SRH-7TF board can handle two PCIe x 8 cards. So add another two dual-port 10GbE cards and up to six workstations could be directly-connected via 10GbE.

Alternatively, I produced some very high performance numbers substituting the cheaper ASUS Z87-A board in the “server build” using just Windows 8.1 as the operating system. A small shop with only two editing workstations might just build an Adobe CC workstation/server in the Obsidian 550D case, add six to eight hard drives, a single port 10GbE card and share the RAID array with the second workstation by directly connecting the 10GbE cards. In this case, one workstation would double as the server.

Consider that a NAS like the QNAP TS-870 Pro with the same six 4 TB drives (with room for 2 more) and 10GbE interface would total approximately $3500, with no server license costs. You can see there is a business case to just use a NAS, if all you need is file storage or if you already have a server on site.
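To make the comparison concrete, the sketch below totals the Table 2 parts and works out a rough cost per usable terabyte for both options. It assumes both boxes run the same six 4 TB drives in RAID 5 (20 TB usable), and it leaves out the server license, which the parts list also excludes.

```python
# Table 2 prices in USD, as listed.
SERVER_PARTS = {
    "Case": 150,
    "Motherboard": 520,
    "CPU": 500,
    "CPU Cooler": 80,
    "Memory": 290,
    "RAID Controller": 154,
    "RAID Cables": 24,
    "Boot SSD": 99,
    "Hard Drives": 1400,
    "Power Supply": 89,
}

NAS_TOTAL = 3500          # approximate TS-870 Pro figure quoted above
DRIVES, DRIVE_TB = 6, 4   # assumed identical drive set in both boxes

usable_tb = (DRIVES - 1) * DRIVE_TB       # RAID 5 keeps one drive for parity
server_total = sum(SERVER_PARTS.values())


def cost_per_tb(total: float) -> float:
    """Hardware cost per usable terabyte."""
    return total / usable_tb


print(f"DIY server: ${server_total} (${cost_per_tb(server_total):.0f}/TB, before license)")
print(f"NAS:        ${NAS_TOTAL} (${cost_per_tb(NAS_TOTAL):.0f}/TB)")
print(f"Difference: ${NAS_TOTAL - server_total}")
```

With a gap this small, the cost of a server license alone can tip the balance toward the NAS.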

The SuperMicro board I used is somewhat unique in the market right now for several reasons:

1. Supermicro has integrated dual Intel X540 10GbE ports. The price of the entire board is less than a 2 port Intel 10GbE card!

2. It is an ATX format server board, so it is easy to find cases to fit.

3. The X9SRH-7TF is a single CPU board, supporting 22nm Xeon processors, so power consumption is much less than a dual CPU server board.

4. A third network port provides out-of-band management (IPMI), meaning full access to the server (even during startup) and the ability to set thresholds and receive emails if, say, a chassis fan fails. This is all done remotely using a Java-enabled web browser. Very cool.

5. It supports a lot of RAM, so a MacOS-only shop needing an Adobe CC workstation could potentially run it via a virtual machine on the server, accessible to any network Mac. Virtual machines are used increasingly to run multiple servers and operating systems simultaneously, using just one box.

6. The board hosts 10 x SATA3 ports, as well as 4 x SATA2 ports onboard. For those looking at an Ubuntu server build, i.e. free server software and a ZFS RAID array, this Supermicro board is ideal.

The Supermicro board has video and 10GbE on board. So all you need to get started is RAM, CPU, CPU cooler and boot drive. Note that this board requires a narrow ILM bolt pattern heat sink, which differs from the typical square pattern used for Xeon processors. The narrow version requires less space on the motherboard. Noctua’s NH-U9DX i4 cooler is excellent and includes all the parts needed for various configurations.

You’ll see in the gallery pictures that these Xeon heatsinks attach using screws, instead of the typical spring-loaded plastic push pins. The Xeon 22nm CPUs differ from typical i5 or i7 processors in that they use the socket 2011 standard, so more “lanes” of bandwidth to the chip are provided. This means more PCIe x16 and/or x8 slots, as well as higher RAM capacity. Xeon CPUs don’t include Intel HD graphics, so they tend to be a bit less expensive as well.

Click through the gallery for a look at all the parts and the finished assembly.

Motherboard

Group shot of the boxes for the Supermicro motherboard, Intel Xeon CPU and ECC RAM

Motherboard 2

The above parts out of their boxes. Note the stock Intel cooler is not used

Cooler

Noctua NH-U9DX i4 CPU cooler

Cooler 2

Noctua cooler shown upside down, in the process of being converted from square, to narrow ILM bolt pattern. Noctua’s highly regarded heat sink compound is included in the package, as well as instructions for application. Proper application of heat sink paste is required to ensure the CPU transfers heat to the CPU cooler efficiently.

Cooler 3

You will want to orient the CPU cooler in the best position for your setup. In this case, the warm air from the CPU cooler is directed toward the rear exhaust fan.

Assembled

A very clean build with good airflow. The Intel X540 10GbE chips are hiding under the two aluminum motherboard heat sinks visible lower right. They run quite warm, as does the RocketRaid 2720 card. Good airflow over both is a good idea.

Assembled 2

Airflow is noticeably better with the 550D case front doors open, or removed. The two front fans are hiding behind the removable lower panel. A removable magnetic screen filters dust and is easily removed for cleaning.

Here’s a screen grab of the IPMI interface, accessed using a web browser via a dedicated 1 GbE port on the Supermicro board.

Supermicro motherboard IPMI interface


Performance

What kind of performance will this server provide? Here’s the ATTO Disk Benchmark. See the notes in the screenshot for test details.

10GbE Server ATTO benchmark


And here are the Intel NASPT results. Again, test details are in the screenshot comment box.

10GbE Server Intel NASPT benchmark


Here’s a Windows 8.1 copy and paste over 10GbE from the RAID 5 array (6 x 4TB Hitachi 7200 rpm):

10GbE Server Windows 8.1 filecopy


Closing Thoughts

So there you have it. I hope you’ve found this series as much fun to read as I had researching, testing and writing it! Cinevate’s 10GbE transition is well under way, as the 2012 Server and two 10GbE NASes replace their older counterparts. Our next product launch coming later this month (April 2014) will take advantage of the new high-speed collaborative workflow.


Dennis Wood is Cinevate’s CEO, CTO, as well as Chief Cook and Bottle Washer. When not designing products, he’s likely napping quietly in the LAN closet.


6 Responses to Confessions Of A 10 GbE Newbie – Part 6: Breaking the 10GB Data Barrier

  1. cinema-living says:

    Are you using any additional sharing software to make sure users don’t access the same files? I’ve been looking into iSANmp but would rather not add this extra cost if we don’t have to.

  2. philipp says:

    What are your thoughts on SMB3 integration into OSX 10.10 Yosemite? Any chance you’ll update this guide once 10.10 with SMB 3 is released?

  3. Marian-Mina Mihai says:

    Hi. We have a very similar setup, including Intel X540-T1 cards and a Netgear XS712T. For the moment, we have a bottleneck before getting to the NAS server, because we get a speed of 250 MB/s from Workstation1 to Workstation2 and 180 MB/s from Workstation2 to Workstation1 (which is really odd), using RAMDisk. One Workstation is a MacPro from 2010, dual Xeon, 64GB RAM (running Windows) and the other one is an Intel i7 with 32GB RAM. Both of them used as video workstations and doing a great job, so it’s not a question of Workstation performance. We did what you mentioned in the Intel cards setup, but is there anything else we should do on the Netgear side? Do you have any idea on what else could go wrong in this setup? Thank you very much!

  4. philipp says:

    I just read all of your articles again since I am in the market for a new server. Our iMac Thunderbolt server just doesn’t seem to work very well (Pegasus R6 for storage and Thunderbolt case with 10GB ethernet card connected to a Netgear 10GB switch).

    If it works it works great but every couple of days (sometimes hours) the server comes to a complete crawl and needs to be rebooted. I assume all the thunderbolt hacking may cause some sort of bottleneck but I really have no clue.

    So I am wondering if I should build the Windows 8.1 server you have outlined above and connect the two 10GB ports via link aggregation to my Netgear switch. All of our client computers are Mac’s, however. (Some running 1GB and others 10GB via thunderbolt ethernet)

    Can a Windows 8.1 server serve data to a OSX workstation? If our machines are running OS X 10.10 would we be able to take advantage of the SMB3 speeds you have outlined in this series?

    Would love to hear your thoughts on this topic.

    Thanks!

  5. philipp says:

    Also, would I be able to use a rack-mount case like this instead?
    http://www.amazon.com/Rackmount-Server-Hot-Swappable-RPC-2208-Connector/dp/B002AU4VMQ

    Or would this not be recommended with the hardware configuration above? I would prefer having the server be part of our rack mount.

  6. Dennis Wood says:

    Sorry I missed your questions folks. Here we go…

    Cinema-living, no extra software is being used. We’re a small shop, so no need for any further file locking.

    Phillip, I’ve done zero testing with OSX and SMB3, but I’m pretty sure it won’t support multichannel. This means you’re likely limited to around 750MB/s. That’s not so bad though :-)

    Marian, the Netgear switch offers little in terms of configuration, so likely your issue is either at the OS or antivirus end. You’re using Windows 8.1 correct? SMB3 multichannel will only work if Server 2012 or Windows 8 are used at all endpoints.

    Phillip, rack mount is completely ok. If you go this route, you may want to look at some of SuperMicro’s other offerings in the motherboard department, however the board I used in the appropriate rack case would work just fine. SMB3 in OSX will definitely improve speeds (but no multichannel), so you may want to set up link aggregation to the switch if you anticipate high loads. The windows server should serve up the files just fine. We are 100% a PC and Adobe CC shop, however we do have guests here from time to time who remain impressed with the LAN performance transferring media to Mac laptops. Although Windows can be a PITA, the performance to $$$ factor makes this platform my typical choice. Adobe CC seems to keep getting better, and worked great on our last project (Morpheus configuration videos) which involved a great deal of AE as well as Premiere work. Our 10G workflow has been working problem free pretty much since we commissioned the system and worked out initial setup issues. The 10G server, aside from serving files, is pretty busy running rsync backups, Server 2012 backups, Active Directory Roaming desktops/drive mapping, as well as Windows update services (WSUS) for all workstations.

    We’ve made good use of the new Virtual Machine features found on the QNAP TS-870 to host both a 2012 BDC, as well as a remote access Windows 8.1 system. This effectively turns the NAS into three machines, with considerable power savings to boot. File version (history) on the 2012 Server has been useful several times to do quick restores on deleted files, or file mods we wanted reversed. We’re also using iSCSI drive maps from the 2012 Server to the QNAP TS-870 NAS to host a very large Server 2012 backup drive for daily media backups. The combination of rsync as well as Server 2012 backup has been quite effective.

    After 8 months of 24/7 operation, the 10G equipment has been stable, and performing excellently.
