Center for Information Technology Services
A Message from the CIO
Achieving Goal Number One in the Campus IT Plan: High Availability/High Reliability of Campus Infrastructure and Systems.
A solid IT infrastructure is a critical underpinning for effective University academic and administrative operations. The IT infrastructure is the key component for the operating success of applications and systems. Today, IT infrastructures need to be elevated to the highest level and, where possible, operate 24x7, 365 days a year. Students, faculty, and staff expect to be able to use applications and systems (e-mail, web, learning, research, clinical, management, etc.) at any time and anywhere.
The redundancy of critical applications and systems is needed because technology can and will fail. At approximately 10:00 p.m. on Sunday, October 5, the storage hardware that contains several campus computer systems suffered a failure. The critical component that failed was the data service processor (DSP), which manages connectivity to the data that reside on the storage hardware. This “crash” rendered the storage hardware unavailable, and the problem could only be corrected by Sun Microsystems, Inc. engineers.
When the incident occurred, the CITS Communications Plan was activated. The campus was informed of the problem through the campus Critical Systems Notification email list, the many individual application user email lists, and IT alerts on web pages. The Sun Microsystems, Inc. storage engineers immediately began working with CITS system administrators to resolve the problem, but its magnitude and complexity were such that it could not be resolved quickly. Both the Sun engineers and CITS system administrators worked continuously through the night and during the day on Monday until the storage hardware was fully restored and operational at about 8:00 p.m. Monday evening. While some software applications were unavailable during the day, other applications that already had redundant hardware and replicated database environments in two locations, e.g., the Blackboard learning and transaction system, were activated and made available for use on that Monday.
This incident further validated the importance of the number one priority in the Campus IT Plan, i.e., achieving high availability of campus infrastructure and systems. Since December 2007, major activities toward this goal have been completed. The most significant undertaking was re-architecting and implementing a new technology infrastructure for anti-spam, anti-virus, and the delivery of over 10 million email messages to the campus every day. This new infrastructure provides faster mail processing and greater infrastructure redundancy, as well as more sophisticated anti-spam technology. In addition, CITS acquired and placed another network switch between the Internet gateway and the campus network. This additional network switch, as well as additional network routers that will be installed this month, will provide greater redundancy and continued Internet service in the event that one of these devices is disabled for any reason. Also, the Coeus research application database, in addition to the Blackboard learning and transaction system database, is being continuously replicated, with a copy of these systems' databases available in both the CITS Computer Room and Howard Hall. In the event of a failure like the one we experienced on the evening of October 5, the Coeus and Blackboard applications can be back online in a matter of minutes. A redundant Coeus web server will also be installed this month and be available for use in the event of a web server failure in the CITS Computer Room.
During the next couple of months, we will be replicating and maintaining databases in two locations for the following additional applications: the student information management system (SIMS/SURFS); the content capture system (Mediasite); the online testing application (QuestionMark); and the Web content management system. Furthermore, the new Microsoft Exchange 2007 campus email system (which will be available in spring 2009) will have fully replicated databases, servers and storage, as well as faster processing, enhanced security functions, and additional features.
The longer-term solution, which is also being pursued now, would include hardware, software, and storage in another computer facility, as well as the software to perform real-time replication of both database and non-database files between the machines in the CITS Computer Room and the machines in the other facility. This solution will allow us to provide redundancy and fail-over capabilities for additional applications and further reduce risk by having critical campus systems and data in two computer room facilities. In this scenario, systems and services will continue to be available in the event of a disaster affecting hardware or software, the CITS Computer Room, or even the HS/HSL building where the CITS Computer Room is located.
Today, technology is central to business and academic operations. UMB operations require access to applications and systems that initiate electronic processes and provide needed data for the University community. Much work toward building highly available campus systems has been completed, and that work will continue. The benefit of having this level of redundancy is a seamless and uneventful transition to back-up systems and databases, with critical services remaining available when technology fails in the future. We will keep you posted as we continue to make progress with this very important initiative.
Peter J. Murray, Ph.D.
Vice President and CIO