For starters, the idea of backing up data seems easy. You back up the data to a specified location and keep it around in case disaster strikes. When that happens you restore the data.
It sounds easy, but in practice there is a lot more involved, especially if you have to manage backups for around 500 servers – including VMware virtual machines (and that is a topic for another post). In a robust environment you have things like multiple media servers, multiple tape libraries (both virtual and physical), disk-based storage pools and SANs that add complexity to even the best-designed setups.
We have a well-designed environment. At least I like to believe that it is – I designed it almost two years ago and implemented it well over a year ago. Even with a well-designed environment, Thursday became a day to learn lessons for me and another team member.
We scheduled a maintenance window to reboot the servers in the environment. This was recommended to us by Symantec and by one of our consulting partners who assisted me in deploying the environment. I am still not sure whether this is a Symantec best practice for NetBackup or just a general Windows Server best practice, but it was highly recommended that we reboot all the servers at least once a quarter.
Rebooting a server in a simple, vanilla environment should be painless. Rebooting (1) NetBackup master server and (6) NetBackup media servers is not painless. Not at all.
Here are some things I learned:
- Multiple master servers might be a good thing. When you reboot a master server everything stops. No new jobs can run. You can’t edit policies. You can’t reset drive paths. You get the picture.
- It does matter how you assign tape drives. If you only allow certain servers to see certain drives and certain robots, things start to get confusing. Before we rebooted, I had (46) virtual tape drives assigned out to hosts separately to split them across multiple roles – (14) to the VTL’s media server for vaulting, (8) to the VMware VCB server and (24) to the (3) general purpose media servers. Now the VTL media server gets its (14) drives, but the rest of the servers share the remaining (32).
- The embedded media server on EMC’s EDL4106 only sees (24) tape drives. Not sure why this is, but it made me scratch my head and make a note. I wonder if it is a Linux thing. The dmesg output showed errors attempting to attach to the other (22) drives.
- Multipathing – Good for Storage, Bad for Backup? The way the EDL4106 assigns drives only allows you to attach a drive to one path on the server. Hmmm… and I have (2) HBAs in my media servers for what reason?
- Windows 2008 x64 Enterprise Edition has great driver management. Windows 2003 32-bit Enterprise Edition doesn’t. I really dislike having to reboot and reinstall drivers every time I reboot the environment.
- Persistent LUN bindings don’t appear to work for tape devices. This I will definitely have to research.
- Adding a NetBackup media server is painful when it needs to talk to over 400 servers. Very painful.
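To keep the drive numbers from the list above straight, here is a minimal Python sketch that tallies the two layouts. The drive counts come from this post; the role names, the `shared_pool` key and both helper functions are just labels I made up for illustration. This is plain arithmetic, not anything NetBackup itself exposes.

```python
# Drive counts are from the post; role names and helpers are hypothetical labels.
TOTAL_VIRTUAL_DRIVES = 46

# Before the reboot: every role had its own dedicated set of drives.
before = {
    "vtl_media_server_vaulting": 14,
    "vmware_vcb_server": 8,
    "general_purpose_media_servers": 24,  # spread over 3 media servers
}

# After the reboot: vaulting keeps its drives, everyone else shares one pool.
after = {
    "vtl_media_server_vaulting": 14,
    "shared_pool": 32,  # VCB server plus the 3 general purpose media servers
}


def check_allocation(layout, total=TOTAL_VIRTUAL_DRIVES):
    """Return True when the layout accounts for every drive exactly once."""
    return sum(layout.values()) == total


def max_drives_available(layout, role, shared_key="shared_pool"):
    """Upper bound on drives a role can reach: its dedicated count,
    or the size of the shared pool if it has no dedicated entry."""
    return layout.get(role, layout.get(shared_key, 0))


# Drives the EDL4106 embedded media server could not attach (it sees only 24).
edl_unseen = TOTAL_VIRTUAL_DRIVES - 24

if __name__ == "__main__":
    print("before adds up:", check_allocation(before))
    print("after adds up:", check_allocation(after))
    print("VCB ceiling before:", max_drives_available(before, "vmware_vcb_server"))
    print("VCB ceiling after:", max_drives_available(after, "vmware_vcb_server"))
    print("drives unseen by EDL media server:", edl_unseen)
```

The point of the comparison is that the VCB server's ceiling goes from its own (8) drives to whatever is free in the shared (32), which is more flexible but also means it now competes with the general purpose media servers for the same drives.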
There will be more on this as the problems get addressed. Thankfully Symantec will be performing a health check on the environment in the next month!