Hardware – Part II (Compute)

Compute hardware

Faithlife compute has gone through quite a few iterations in recent years. The transformation has been a critical piece of our success and ability to scale at costs that make sense to the business. Each iteration moves our deployment closer to being aligned with our overall philosophy.

Humble beginnings

Our first attempt at being more nimble and reducing the costs of our aging, expensive IBM physical server deployments was VMware vCenter on an IBM BladeCenter backed by an IBM DS3500 SAN. Yes, you read that correctly, and yes, we may not have thought that decision through entirely.


Plenty of flexibility was gained by virtualization, but the cost of the BladeCenter, blades, SAN, and VMware licensing meant that even the smallest incremental addition to the infrastructure represented a dollar amount that needed lots of discussion before approval. These factors led to projects being put on hold, developers not having the resources they needed, and Operations constantly battling an infrastructure running at or above capacity.

Commodity hardware, take one

Realizing that we were hamstrung by expensive hardware and licensing, we took to the basement and started a skunkworks project.

After a couple of weeks, and one hundred dollars, we emerged victorious from the basement. We had assembled twelve Dell Optiplex 960 workstations as OpenStack compute nodes, three APC “PDUs”, six Cisco desktop switches, and some really awesome 1Gb Ethernet, all on a Costco rack. Believe it or not, this setup actually replaced a few of our aging development servers for quite a few months. I think we took commodity hardware a bit too seriously, though; our datacenter wouldn’t allow us to deploy it in our cage.


Commodity hardware, take two

Having prototyped OpenStack, and shown that it had the potential to both run on commodity hardware and replace our current virtualization stack, we moved forward with a small production deployment to help deal with some of our capacity issues.

Our initial production OpenStack deployment consisted of three controllers, three compute nodes, and eight Ceph nodes. This was also the beginning of our servers becoming multi-purpose Lego bricks: we used Dell R610 1U servers in one of three configurations for everything. Additionally, we started keeping spare memory, disks, CPUs, and R610 chassis on hand. With spare parts and a single kind of server, we could easily fix or replace any piece of our hardware infrastructure.


The relatively low cost of the Dell R610 1U servers combined with free and open source virtualization meant we could finally remove the dam that was holding back additional gear. It took less than four months to go from the initial nine servers to one and a half racks full of gear.

During the build-out we realized that our initial SAS-based Ceph nodes didn’t perform well enough for database volumes and were too expensive for general-purpose OS volumes. The solution was to add two new types of servers: Dell R620s filled with SSDs, and Dell R510s filled with SATA drives.
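For the curious, here’s a rough sketch of how differently equipped Ceph nodes turn into differently performing volumes: CRUSH rules restrict a pool to a class of device, and each pool is created against the appropriate rule. This is an illustration rather than our configuration from that era (the device-class syntax below comes from much newer Ceph releases, and the rule and pool names are made up):

  # Replicated CRUSH rules that only place data on OSDs of the given device class
  ceph osd crush rule create-replicated fast-rule default host ssd
  ceph osd crush rule create-replicated bulk-rule default host hdd

  # Pin a database-volume pool to the SSD rule and a general-purpose pool to the spinning-disk rule
  ceph osd pool create db-volumes 128 128 replicated fast-rule
  ceph osd pool create os-volumes 256 256 replicated bulk-rule

Whatever sits in front of Ceph (OpenStack’s Cinder volume types, for example) can then map each kind of volume to the matching pool.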


When OpenStack and Ceph went into a death spiral and we transitioned to Joyent’s SmartDataCenter, we were able to reuse this same hardware for the emergency deployment with minor configuration changes and on-hand parts (just one more reason the Lego-brick approach to hardware is so important).

Commodity hardware, take three (Joyent SmartDataCenter / current day)

Shortly before we transitioned to Joyent SmartDataCenter, we acquired space in a brand new datacenter. This gave us a nice green field to apply the last few years’ worth of hard-earned lessons and to build specifically for SmartDataCenter. Lucky for us, the great people at Joyent open sourced their bill of materials, which gave us a higher degree of confidence that our new build would be successful (after all, Joyent had already proven these builds in private and public clouds).

We really liked the Tenderloin-A/256 build for its price, disk performance, and density. Unfortunately, the Tenderloin-A/256 build is based on SuperMicro parts, and we’re more comfortable with Dell servers; we also have a great relationship with Redapt, the Dell partner we purchase most of our hardware through. With that in mind, we worked with Redapt and Joyent to create a Dell build that is very close to Joyent’s Tenderloin-A/256.


Faithlife’s SmartDataCenter compute node bill of materials

  • 1 x Dell R720 Chassis
  • 2 x Intel Xeon E5-2650 v2
  • 1 x iDRAC7 Enterprise
  • 1 x Intel X520 DP 10Gb DA/SFP+ + I350 DP 1Gb Ethernet Network Daughter Card
  • 1 x Intel / Dell SR SFP+ Optical Transceiver
  • 16 x Dell 16GB RDIMM 1866MT/s (256GB total)
  • 2 x 750W Power Supply
  • 1 x 200GB Intel DC S3700 SSD
  • 1 x Kingston 16GB USB stick
  • 1 x SuperMicro AOC-S2308L-L8E SAS controller
  • 15 x HGST C10K900 2.5” 10K 600GB SAS

We’ve been running SmartDataCenter on this build with hundreds of VMs for a while now. The performance is outstanding; in fact, some of our VMs that previously needed dedicated SSD are just as happy on this SAS-based configuration thanks to SmartOS zones and ZFS.
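If you’re wondering how SAS keeps up, a big part of the answer is likely the ZFS ARC serving hot reads from memory. One quick, generic way to eyeball that on a SmartOS (illumos) compute node is to compare the ARC hit and miss counters with kstat; this is just an illustrative command, not output or tooling from our nodes:

  # ARC hit/miss counters since boot; a high hit-to-miss ratio means most reads never touch the SAS spindles
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses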

 
