Research Storage

Use the following links for obtaining and using research storage:

Best Practices for Backing Up Data

Under a program provided by CITS, each research faculty member is offered one terabyte of backup space in a modern, secure computer facility located at 300 W. Lexington Street in Baltimore, just off the UMB campus. Additional storage may be purchased at prices subsidized by CITS. This is part of a program that will eventually offer primary storage space and high-performance computing for research.

To make the best use of the facility and increase efficiency, CITS recommends following the practices below where possible.

Data Backups:

If a project team or laboratory backs up important data to the research storage facility, several items should be considered. Rather than connecting every researcher’s desktop to the new storage facility, it is more efficient to consolidate backup data for the entire group before sending it. Several tools can do this at no cost or at moderate cost. A backup tool, installed on one machine, can be used to select data for backup from every machine in a laboratory or office.

The initial data capture will be large and will take a long time, depending on the connection speed of the backup machine to the campus network. One terabyte of data, transmitted at 100 megabits per second, will take about a full day (more than twenty-four hours once protocol overhead is included). This should be scheduled on a weekend or during a known period of inactivity. After this initial transfer, a good backup application will extract only changed data during each subsequent backup cycle and will transmit incremental amounts of data to the 300 West Lexington Street facility.

To reduce the amount of data being stored, the best practice is to NOT back up the operating system and applications on a workstation. Personal files, such as music and pictures, should not be backed up to the university’s storage infrastructure.
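The transfer-time figure above can be checked with a short calculation. The sketch below is illustrative only: the function name and the `efficiency` parameter (the fraction of the nominal link rate actually achieved) are assumptions, not values measured on the campus network.

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Return the hours needed to send size_bytes over a link.

    efficiency is an assumed fraction (0 < efficiency <= 1) of the nominal
    link rate that is actually available for backup traffic.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


DECIMAL_TB = 10**12          # 1 TB as 10^12 bytes
BINARY_TB = 2**40            # 1 TiB as 2^40 bytes
LINK_100_MBPS = 100 * 10**6  # 100 megabits per second

print(round(transfer_hours(DECIMAL_TB, LINK_100_MBPS), 1))       # 22.2
print(round(transfer_hours(BINARY_TB, LINK_100_MBPS), 1))        # 24.4
print(round(transfer_hours(DECIMAL_TB, LINK_100_MBPS, 0.8), 1))  # 27.8
```

Even at the full line rate the transfer takes most of a day, and any realistic allowance for overhead or shared traffic pushes it past twenty-four hours, which is why a weekend slot is the safer choice.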

If the user is backing up from a laptop, or some other device that doesn't maintain permanent network connectivity, the user should consider how to do the initial backup. The initial backup may take longer than a standard business day, and if a laptop is carried between home and work, it will need to be left behind so the initial backup can complete. An alternative is to manually back up a few selected folders at a time, breaking up the periods of network activity into smaller chunks. A permanently connected desktop is better suited for performing lengthy backups.
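One way to plan the folder-at-a-time approach is to group folders so that each group can be sent within a single working session. The following is a minimal sketch under stated assumptions: the folder names, sizes, link speed, and four-hour window are hypothetical, and the grouping is a simple greedy pass, not a feature of any particular backup tool.

```python
from typing import Dict, List


def plan_batches(folder_sizes: Dict[str, int],
                 link_bits_per_second: float,
                 window_hours: float) -> List[List[str]]:
    """Greedily group folders so each batch fits in one transfer window."""
    # Bytes that can be sent in one window at the nominal link rate.
    budget_bytes = link_bits_per_second / 8 * window_hours * 3600
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    # Consider the largest folders first.
    ordered = sorted(folder_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > budget_bytes:
            # Too big for one window: give it its own batch; it will need
            # more than one session (or a desktop left connected).
            batches.append([name])
            continue
        if used + size > budget_bytes:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


GB = 10**9
example = {"raw_data": 150 * GB, "images": 120 * GB,
           "thesis": 40 * GB, "figures": 15 * GB}
# A 4-hour window at 100 Mbit/s carries at most 180 GB.
print(plan_batches(example, 100 * 10**6, 4))
# [['raw_data'], ['images', 'thesis', 'figures']]
```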

Be aware of very large files that change frequently, such as VM disk images and Microsoft email databases. With a file-level backup tool, even a small change to such a file can cause the whole file to be backed up again, wasting space and slowing down overall backup performance.
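Such files can be located before configuring a backup by scanning for large files with a recent modification time. This is a rough sketch using only the Python standard library; the function name, the 1 GiB threshold, and the 7-day window are arbitrary assumptions to adjust for a given laboratory.

```python
import os
import time
from typing import Iterator, Tuple


def large_recent_files(root: str, min_bytes: int,
                       max_age_days: float) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for files of at least min_bytes that were
    modified within the last max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            if info.st_size >= min_bytes and info.st_mtime >= cutoff:
                yield path, info.st_size


if __name__ == "__main__":
    # Example: files of 1 GiB or more changed in the last 7 days under
    # the current directory.
    for path, size in large_recent_files(".", 2**30, 7):
        print(f"{size / 2**30:.1f} GiB  {path}")
```

Files reported by a scan like this are candidates for exclusion from the regular backup set or for a separate, less frequent schedule.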

Backup tools should be selected by individual users or workgroups. In phase 1, CITS does not intend to back up data in the research storage facility and cannot support local backup activities. Some tools already in use on campus work reasonably well; they are listed below.

Retrospect:

This can be installed on a Windows machine and can back up any Windows, Macintosh, or Linux machine in a group. It can send incremental backup data to the research storage facility on a regular schedule or on demand.

Macintosh Time Machine:

Time Machine is a solid product for backing up Macintoshes, with one caveat: the device on which the data is stored must be a block device (that is, it must behave like a hard drive). This can only be accomplished through the use of iSCSI. CITS personnel will work closely with researchers wishing to employ this solution.

GoodSync:

This is a multi-platform backup and file synchronization product. It works with an SSL-protected WebDAV storage location, so it is attractive for our site.

CITS will continue to monitor and test new products on the market and share what we learn. Many times one of our researchers finds something that is useful to others. CITS will be the common point of contact for sharing those findings as well.

Why iSCSI

iSCSI is a clean fit for big departments. On this campus each school, lab, and department exercises a great deal of autonomy without much centralized supervision or organization. That gives freedom and flexibility to small groups or individuals, but makes managing a centralized IT program very difficult. There is no centralized user management system (the enterprise directory comes closest), so binding to an AD domain is out of the question (RX has one, SOM has several, SON has its own, etc.).

So, using iSCSI in a "spoke and hub" fashion eliminates many user-management problems: CITS runs a LUN to a designated machine (the "hub"), and the Principal Investigator then shares it out as desired to the department or research team (the "spokes"). This also eliminates the need for CITS to manage machine-level permissions for individual shares; local administrators will be more responsive and knowledgeable about department needs than CITS.

Another advantage of iSCSI is that it's the lowest common denominator platform-wise. All platforms (Windows, Mac, Linux) have support for it, in one form or another. The same goes for WebDAV (which also supports authenticating via the Enterprise Directory). NFS and CIFS are fairly platform-dependent, not to mention the networking issues that would be involved in getting those protocols across campus.