The old gives way to new at Oak Ridge National Labs. The Titan supercomputer is being replaced by Frontier, and it's a super-sized task.
A supercomputer deployed in 2012 is going into retirement after seven years of hard work, but the task of decommissioning it is not trivial.
The Cray XK7 “Titan” supercomputer at the Department of Energy’s (DOE) Oak Ridge National Laboratory (ORNL) is scheduled to be decommissioned on August 1 and disassembled for recycling.
At 27 petaflops, or 27 quadrillion calculations per second, Titan was at one point the fastest supercomputer in the world at its debut in 2012 and remained in the top 10 worldwide until June 2019.
But time marches on. This beast is positively ancient by computing standards. It uses 16-core AMD Opteron CPUs and Nvidia Kepler generation processors. You can buy a gaming PC with better than that today.
“Titan has run its course,” Operations Manager Stephen McNally at ORNL said in an article published by ONRL. “The reality is, in electronic years, Titan is ancient. Think of what a cell phone was like seven years ago compared to the cell phones available today. Technology advances rapidly, including supercomputers.”
In its seven years, Titan generated than 26 billion core hours of computing time for hundreds of research teams around the world, not just the DOE. It was one of the first to use GPUs, a groundbreaking move at the time but now commonplace.
The Oak Ridge Leadership Computing Facility (OLCF) actually houses Titan in a 60,000-sq.-ft. facility, 20,000 of which is occupied by Titan, the Eos cluster that supports Titan and Atlas file system that holds 32 petabytes of data.
June 30 was the last day users could submit jobs to Titan or Eos, another supercomputer, which is also 7 years old.
Decommissioning a supercomputer is a super-sized task
Decommissioning a computer the size of Titan is more than turning off a switch. ONRL didn’t have a dollar estimate of the cost involved, but it did discuss the scale, which should give some idea of how costly this will be. The decommissioning of Titan will include about 41 people, including staff from ORNL, Cray, and external subcontractors. OLCF staff are supporting users who need to complete runs, save data, or transition their projects to other resources.
Electricians will safely shut down the 9 megawatt-capacity system, and Cray staff will disassemble and recycle Titan’s electronics and its metal components and cabinets. A separate crew will handle the cooling system. All told, 350 tons of equipment and 10,800 pounds of refrigerant are being removed from the site.
What becomes of the old gear is unclear. Even ONRL has no idea what Cray will do with it. McNally said there is no value in Titan’s parts: “It’s simply not worth the cost to a data center or university of powering and cooling even fragments of Titan. Titan’s value lies in the system as a whole.” The 20,000-sq.-ft. data center that is currently home to Titan will be gutted and expanded in preparation for Frontier, the an exascale system scheduled for delivery in 2021 running AMD Epyc processors and Nvidia GPUs
A power, cooling, and data center upgrade is already underway ahead of the Titan decommissioning to prepare for Frontier. The whole removal process will take about a month but has been in the works for several months to ensure a smooth transition for people still using the old machine.