Breaking the 750 MB/s barrier
In parts 1 through 5 of this series, we discussed 10GbE networking basics, built up a software toolkit, reviewed 10GbE NAS performance, discussed why you should consider a NAS, and took a close look at SMB3 networking advancements. In this penultimate installment, we’ll look over a plug and play 10GbE NAS solution and build up an Adobe CC editing workstation capable of exceeding 750 MB/s over 10GbE networks.
Before we get started, let’s look at how all of this can be integrated into your existing network. There have been several questions related to MacOS workstations and possible network configurations posed in the comments. Here are two possible network configurations. The first is very similar to what we are using at Cinevate.
The direct connect setup below would not require purchasing a 10GbE switch in cases where only two 10GbE connected workstations were required, and your NAS or server had a dual port 10GbE NIC installed.
Buy a NAS, or build a server?
If you do decide to purchase a 10GbE NAS to share your video projects, you’ll still need a fast 10GbE equipped editing workstation. But you certainly won’t need to build your own server. In Part 3 of this series, I reviewed QNAP’s TS-470 PRO, but recommended an 8-bay (or more) NAS to fully take advantage of 10GbE network speeds.
For those with an existing equipment rack system, I’d suggest starting with the QNAP TS-EC879U-RP or TS-879U-RP, which have room for 2 optional PCIe cards. One card can be added for 10GbE (2 ports) and optionally, another to connect additional enclosures like the REXP-1200U-P. These drive enclosures allow you to add much more storage without adding another NAS.
I had a chance to test out QNAP’s TS-870 Pro NAS fully populated with eight Hitachi 4 TB drives, which returned some very impressive performance numbers. This NAS performs extremely well and at a price that should make you stop and question the value in building your own server. If you are looking for warranty and support, QNAP has proven to be excellent in this department. Regardless of what brand you choose, make sure the NAS hardware is sufficient for your bandwidth needs.
On the back view of the TS-870 Pro, you can see I’ve connected both 10GbE and 1GbE interfaces as per the wiring diagram above, integrating the NETGEAR 8 port 10GbE switch as well as our older 1GbE equipment. You can connect both ports to your switches and configure them for redundancy, or link aggregation in the NAS network configuration.
Multiple NAS network connections will not magically increase network performance for a single connection to the NAS, but can increase overall bandwidth to clients as network loads increase. This behavior potentially changes if the NAS is running Windows 2012 Storage Server, which can aggregate network ports transparently to increase single connection speeds.
Part 5 (SMB3) of the series covers this behavior in some detail. For a NAS running Linux-based software (almost all of them), a single SMB3 connection to Windows 8.1 was limited in my testing to ~740 MB/s. You would likely see a lower number with MacOS / SMB2, or Windows 7 or earlier.
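To put that ~740 MB/s figure in context, here is a rough back-of-envelope calculation. The framing overhead percentage is an assumption typical of TCP/IP over Ethernet with a 1500-byte MTU, not a measurement:

```python
# Back-of-envelope 10GbE throughput arithmetic. The ~6% framing overhead
# is an assumed figure for Ethernet/IP/TCP headers at a 1500-byte MTU.
link_gbps = 10
raw_mb_s = link_gbps * 1000 / 8      # 10 Gb/s == 1250 MB/s (decimal MB)
usable_mb_s = raw_mb_s * 0.94        # minus estimated framing overhead

print(f"raw line rate:      {raw_mb_s:.0f} MB/s")
print(f"usable (estimated): {usable_mb_s:.0f} MB/s")
# The ~740 MB/s single-connection SMB3 result sits well below this estimate,
# suggesting protocol and host limits, not the wire, are the bottleneck.
```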
In the following tests, the TS-870 Pro NAS shows impressive ATTO results with 8 drives configured as a RAID 5 array and ATTO queue depth set to 4.
A typical Windows 8.1 large file copy/paste shows writes to the TS-870 PRO at ~ 550 MB/s and reads at ~ 740 MB/s fully loaded with hard drives and configured in RAID 5. Using Intel’s NASPT tool, the results of “real world” application traces are shown.
If you are working exclusively on the MacOS platform and looking for a plug-and-play solution, a 10GbE NAS along with workstation 10GbE network cards from companies like Small Tree (which supports the Intel X540 cards) is all that would be required. If you’re looking to put your own Windows-based 10GbE server or workstation together, read on.
What software should I use if I’m building my own workstation/server?
If you’ve read my previous 10GbE posts, you’ll know that for Adobe CC video/photo editing, 10GbE performance is best on either Windows 8.1, or Windows 2012 Server due to SMB3 multichannel features. In many tests between workstations, Windows 8.1 performed just as well as Server 2012 as far as large file network file transfers were concerned. Windows SMB3 multichannel will attempt to establish up to four threads per network interface (NIC) per session by default.
We also learned in previous posts that RSS (found in the advanced NIC configuration) will try to spread this workload over available CPU cores. This is likely why a processor like the i7 4770K (four physical cores) showed better performance in my 10GbE network tests than the i5 processors (two physical cores) I tested.
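As a loose illustration of the multichannel/RSS idea, here is a hypothetical Python sketch that splits one file copy into chunks serviced by parallel workers. It mimics the concept only; real SMB3 multichannel and RSS happen inside the OS network stack, not in application code:

```python
# Illustrative only: split a single file copy into chunks handled by parallel
# workers, loosely analogous to how SMB3 multichannel spreads one session's
# traffic across multiple TCP connections (and RSS across CPU cores).
import os
from concurrent.futures import ThreadPoolExecutor

def copy_chunk(src, dst, offset, length):
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))

def parallel_copy(src, dst, workers=4):
    size = os.path.getsize(src)
    with open(dst, "wb") as f:   # pre-size so workers can seek independently
        f.truncate(size)
    if size == 0:
        return
    chunk = (size + workers - 1) // workers  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for off in range(0, size, chunk):
            pool.submit(copy_chunk, src, dst, off, min(chunk, size - off))
```

The default of four workers mirrors SMB3’s default of up to four connections per RSS-capable NIC.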
A Windows 8.1 based NAS, with an i7 processor and fast RAM will perform very well as a 10GbE server, or workstation. Just remember that Windows 8.1 is limited to 20 SMB inbound connections if you plan on using it as a “server”.
Here are some pretty impressive numbers generated between two Windows 8.1 workstations, both sharing RAM disks over 10GbE connections. RAM disk software creates a virtual hard disk in workstation RAM, removing the testing bottleneck created by local SSD or hard disks.
PCIe SSDs are currently the only single drives capable of sustaining these speeds, some in excess of 2,000 MB/s.
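If you want a quick sanity check of your own RAM disk or share, a few lines of Python will produce a rough number. This is a hypothetical quick-and-dirty sketch; tools like ATTO and Intel NASPT are far more rigorous:

```python
# Crude sequential write test: point `path` at a file on a RAM disk (or any
# mounted share) and get an approximate MB/s figure back. Illustrative only.
import os
import time

def write_speed(path, total_mb=256, block_kb=1024):
    block = b"\0" * (block_kb * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())   # make sure data actually left the OS cache
    elapsed = time.perf_counter() - start
    return total_mb / elapsed  # MB/s
```

For example, `write_speed(r"R:\test.bin")` against a RAM disk mounted at a hypothetical `R:` drive letter.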
So how did we manage server builds that allow 900 MB/s data transfers? After many builds, OS tweaks and tests, here are my official workstation and server build recommendations for the tech-savvy DIY builder looking to edit video via a shared storage 10GbE solution.
The parts list for the workstation build is shown in Table 1. Note this does not include a Windows 8.1 license. This configuration could also be used as a less powerful server for small workgroups.
Table 1: 10GbE Workstation
| Component | Part | Price |
| --- | --- | --- |
| Case | Rosewill BlackHawk (an older Antec case is pictured below) | $79 |
| Motherboard | Asus Z87-A or Z87-WS | $149 / $349 |
| CPU | Intel Core i7 4770K @ 3.4 GHz | $359 |
| CPU Cooler | Noctua NH-U12S | $77 |
| Memory | G.SKILL RipJaws X Series (4 x 8 GB = 32 GB) | $520 |
| Video Card | Nvidia GTX 760 or GTX 780 | $250 / $520 |
| SSD | 500 GB Samsung Evo | $330 |
| 10GbE NIC | Intel X540-T1 | $388 |
| Power Supply | Corsair TX750 (750 W) | $89 |
I’ve been a fan of Asus for several decades now, and there are several of their motherboards I would suggest as the foundation for a performance Adobe CC editing computer. The Asus Z87-A ($150) has sufficient PCIe slots for basic applications, and even a previous-generation Z77-V board performed quite well during 10GbE testing as both a server and workstation board.
You can see the Z77-V here configured with 32 GB of RAM, a RocketRAID 2720 SGL card (supports 8 hard drives) and the Intel X540-T1 10GbE NIC.
My own editing workstation, shown below, is based on the Asus Z87-WS motherboard with two Nvidia GTX 650 Ti Boost cards connected by an SLI cable (just in case you’re a gamer too). When using Adobe CC, however, you need to disable SLI in Nvidia’s driver interface. Adobe Premiere CC will find and use all the CUDA cores it can for rendering; combining the two cards, it will see 1536 CUDA cores and 4 GB of video memory.
You can see the Intel X540 NIC nestled between the video cards, nicely cooled by the Nvidia video card’s cooling fan right beside it. If you compare this case arrangement with the Corsair 550D used later in our server build, you can see a large improvement in terms of cable management, and therefore airflow using a newer case like the 550D.
The Noctua NH-U12S CPU cooler is one of the most efficient units out there, and supported overclocking this system to 4.3 GHz in Turbo mode with no issues. The i7 4770K presents 8 logical cores (4 physical, via Hyper-Threading) to the operating system, so even a small overclock can have a significant impact on video rendering times. Since a video editing machine may operate for hours at 100% CPU load during a render, an efficient CPU cooler is important.
You can see I’ve spec’d RAM at 32 GB. This is more an advantage if you’re using Adobe CC After Effects, where RAM can be assigned to CPU cores in order to speed up rendering times. If you’re not doing much in After Effects, 8 or 16 GB of RAM might serve your needs just fine. You would see very little difference in the Adobe Premiere CC Benchmark tests.
First, here’s the component diagram for the ASUS Z87-WS motherboard I use in my own editing workstation. It has two 1 GbE network ports and sufficient PCIe bandwidth for two SLI-capable video cards, while running a single-port 10GbE network card.
At the upper end of the workstation motherboard range, the ASUS P9X79-WS uses a socket 2011 configuration to provide both more lanes of PCIe bandwidth and significantly more memory capacity. If you are looking to build a serious editing workstation/server with two video cards, a RAID card, and a dual-port 10GbE card all running simultaneously, this board would be required to ensure sufficient PCIe bandwidth.
The Asus Z87-WS workstation I used for Adobe Premiere CC (current as of January 2014) network tests generated some very respectable numbers using the Adobe Premiere Pro Benchmark for CS5 (PPBM5) and Premiere Pro Benchmark for CS6 (PPBM6). Bill Gehrke and Harm Millaard maintain these benchmarks in an attempt to assist editors in building cost-effective editing workstations. You can download the older test series at http://ppbm5.com/Instructions.html or the newer (takes longer to run) version at http://ppbm7.com/index.php/homepage/instructions
The benchmark numbers for the older PPBM5.5 tests are generated during the render and output of three timelines, pictured below:
I’ve updated the PPBM5.5 tests shown earlier in this series with more tests, including the average of the top 10 highest-scoring online submissions. For these tests, the project, source files, cache files and preview files were all in the same directory on the local drive, NAS, or server to reflect a completely shared network solution. The results, using various local drives and network shares, are summarized in Table 2.
Table 2: PPBM5.5 Test Results
| Target Disk | Disk I/O Test (seconds) | MPEG2-DVD Encode (seconds) | H.264 Encode (seconds) | MPE (Mercury Engine enabled, seconds) |
| --- | --- | --- | --- | --- |
| Local 500 GB SATA3 SSD Samsung Evo with RAPID enabled | 29 | 41 | 49 | 4 |
| TS-470 Pro 10GbE, Intel 530 SSD x 4, 1 TB, RAID 0 | 35 | 42 | 52 | 4 |
| Windows 2012 Server R2, 10GbE, RocketRAID 2720, 24 TB, 6 x 4 TB Hitachi 7200 disks, RAID 5 | 39 | 43 | 49 | 4 |
| TS-470 Pro 10GbE, 16 TB, 4 x 4 TB Hitachi 7200, RAID 0 | 54 | 43 | 51 | 5 |
| Local WD Black 2 TB 7200 HD | 84 | 40 | 49 | 5 |
| 5 year old TS-509 Pro, 1GbE, 5 TB, 5 x 1 TB 5400, RAID 5 | 263 (Yikes!) | 80 | 53 | 7 |
| Average of top 10 of 1351 results posted at http://ppbm5.com/DB-PPBM5-2.php | 55 | 31 | 40 | 4.5 |
From these results, you can see that the 2012 Windows Server with a six disk RAID 5 array (third down) performed almost as well over 10GbE as did the locally-connected Samsung Evo SSD with RAPID RAM caching enabled (first entry). In other words, the 10GbE network drive was almost as fast as the very latest SSD technology connected directly to the motherboard, but with 40X the capacity.
Expand the Server’s RAID array to eight or more drives and the 10GbE results could easily surpass a locally-connected SSD, which is limited by SATA3 to around 500 MB/s.
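That claim can be sanity-checked with a common rule of thumb: a RAID 5 array delivers very roughly n-1 drives’ worth of sequential throughput. The 150 MB/s per-drive figure below is an assumption typical of 7200 rpm drives of this era, not a measured value:

```python
# Rule-of-thumb sequential read estimate for RAID 5: roughly (n - 1) drives'
# worth of throughput, since one drive's worth of capacity goes to parity.
# The 150 MB/s per-drive figure is an assumed value for 7200 rpm disks.
def raid5_seq_estimate(drives, per_drive_mb_s=150):
    return (drives - 1) * per_drive_mb_s

print(raid5_seq_estimate(6))  # 6-drive array: ~750 MB/s
print(raid5_seq_estimate(8))  # 8-drive array: ~1050 MB/s, past SATA3 SSD speed
```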
The importance of these benchmarks is that they use real video files in various timelines to offer real-world comparisons. The 1 GbE results (sixth row down) should reveal why editing over older 1 GbE links is not recommended. If you render a lot of finished output and time is critical, a more powerful Nvidia card like the GTX 780 would significantly reduce all of the rendering times I’ve posted.
The parts list for the server build is shown in Table 3. Note this does not include a Windows 2012 Server Essentials or Windows 8.1 x64 license.
Table 3: 10GbE Server parts
| Component | Part | Price |
| --- | --- | --- |
| Case | Corsair Obsidian 550D | $150 |
| Motherboard | Supermicro X9SRH-7TF ATX (dual onboard Intel X540 10GbE ports) | |
| CPU | Intel E5-2620 v2 @ 2.10 GHz, 6-core LGA2011, 15 MB cache | $500 |
| CPU Cooler | Noctua NH-U12DX i4 | $80 |
| Memory | 8 GB DDR3 1600 MHz ECC CL11 RDIMM Server Premier (x4) | $290 |
| RAID Controller | RocketRAID 2720 SGL | $154 |
| RAID Cables | 3ware CBL-SFF8087OCF-05M (x6) | $24 |
| Boot SSD | Intel 530 120 GB | $99 |
| Hard Drives | 24 TB RAID 5 array: Hitachi Deskstar NAS 4 TB (x6) | $1400 |
| Power Supply | Corsair 750 W Bronze | $89 |
If you’re just sharing 10GbE storage for two workstations, you could simply purchase a dual-port 10GbE card for your server and directly connect the workstations. The rest of your network could access the server via existing 1 GbE infrastructure connected to an inexpensive 1 GbE NIC (I’ve added an older dual-port 1 GbE Intel card in the Cinevate server build). The Supermicro X9SRH-7TF board can handle two PCIe x8 cards, so add another two dual-port 10GbE cards and up to six workstations could be directly connected via 10GbE.
Alternatively, I produced some very high performance numbers substituting the cheaper ASUS Z87-A board in the “server build” using just Windows 8.1 as the operating system. A small shop with only two editing workstations might just build an Adobe CC workstation/server in the Obsidian 550D case, add six to eight hard drives, a single port 10GbE card and share the RAID array with the second workstation by directly connecting the 10GbE cards. In this case, one workstation would double as the server.
Consider that a NAS like the QNAP TS-870 Pro with the same six 4 TB drives (with room for 2 more) and 10GbE interface would total approximately $3500, with no server license costs. You can see there is a business case to just use a NAS, if all you need is file storage or if you already have a server on site.
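One thing to keep in mind when comparing these builds: RAID 5 spends one drive’s worth of space on parity, so usable capacity is always less than the advertised raw total. A quick calculation:

```python
# RAID 5 usable capacity: one drive's worth of space is consumed by parity,
# so a "24 TB" six-drive array actually offers 20 TB of usable space.
def raid5_usable_tb(drives, drive_tb=4.0):
    return (drives - 1) * drive_tb

print(raid5_usable_tb(6))  # 6 x 4 TB = 24 TB raw -> 20.0 TB usable
print(raid5_usable_tb(8))  # 8 x 4 TB = 32 TB raw -> 28.0 TB usable
```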
The Supermicro board I used is unusual in the market right now for several reasons:
1. Supermicro has integrated dual Intel X540 10GbE ports. The price of the entire board is less than a 2 port Intel 10GbE card!
2. It is an ATX format server board, so it is easy to find cases to fit.
3. The X9SRH-7TF is a single CPU board, supporting 22nm Xeon processors, so power consumption is much less than a dual CPU server board.
4. A third network port provides out-of-band management (IPMI), meaning full access to the server (even during startup) and the ability to set thresholds and receive emails if, say, a chassis fan fails. This is all done using a remote Java-enabled web browser. Very cool.
5. It supports a lot of RAM, so a MacOS-only shop needing an Adobe CC workstation could potentially run it via a virtual machine on the server, accessible to any network Mac. Virtual machines are used increasingly to run multiple servers and operating systems simultaneously, using just one box.
6. The board hosts 10 x SATA3 ports, as well as 4 x SATA2 ports onboard. For those looking at an Ubuntu server build, i.e. free server software and a ZFS RAID array, this Supermicro board is ideal.
The Supermicro board has video and 10GbE on board. So all you need to get started is RAM, CPU, CPU cooler and boot drive. Note that this board requires a narrow ILM bolt pattern heat sink, which differs from the typical square pattern used for Xeon processors. The narrow version requires less space on the motherboard. Noctua’s NH-U9DX i4 cooler is excellent and includes all the parts needed for various configurations.
You’ll see in the gallery pictures that these Xeon heatsinks attach using screws, instead of the typical spring-loaded plastic push pins. The Xeon 22nm CPUs differ from typical i5 or i7 processors in that they use the socket 2011 standard, which provides more “lanes” of bandwidth to the chip. This means more PCIe x16 and/or x8 slots, as well as higher RAM capacity. Xeon CPUs don’t include Intel HD graphics, so they tend to be a bit less expensive as well.
Click through the gallery for a look at all the parts and the finished assembly.
Here’s a screen grab of the IPMI interface, accessed using a web browser via a dedicated 1 GbE port on the Supermicro board.
What kind of performance will this server provide? Here’s the ATTO Disk Benchmark. See the notes in the screenshot for test details.
And here are the Intel NASPT results. Again, test details are in the screenshot comment box.
Here’s a Windows 8.1 copy and paste over 10GbE from the RAID 5 array (6 x 4 TB Hitachi 7200 rpm drives).
So there you have it. I hope you’ve found this series as much fun to read as I had researching, testing and writing it! Cinevate’s 10GbE transition is well under way, as the 2012 Server and two 10GbE NASes replace their older counterparts. Our next product launch coming later this month (April 2014) will take advantage of the new high-speed collaborative workflow.
Dennis Wood is Cinevate’s CEO, CTO, as well as Chief Cook and Bottle Washer. When not designing products, he’s likely napping quietly in the LAN closet.