Sep 1st 2007

PostgreSQL Upgrade

Well, this week was fun. For some reason one of our main clusters, the one that runs client ASP software for general office, file, email, collaboration, etc., went crazy.

First I noticed that the usual night-time vacuums, which are needed to keep the PostgreSQL planner at its most efficient and the indexes clean, were running right into the daytime! They usually needed less than an hour for the 20GB database we currently have.

So after many failed attempts to get an online vacuum done, I stayed up really late, took the cluster down and did a FULL vacuum. Full vacuums are slow, and you can't run anything while they happen because they take full table locks, whereas online vacuums take quick tuple/row level locks.
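For anyone following along, this is roughly the shape of the two jobs (connection details omitted, and the exact flags on our nightly script may differ – this is just a sketch):

    # nightly maintenance: row-level locks only, clients can keep working
    vacuumdb --all --analyze --verbose

    # the emergency fix: exclusive table locks, so the cluster has to be idle
    vacuumdb --all --full --analyze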

Anyway, the database seemed speedier, but the system was still sluggish. I have all the data separated. Database files are on large RAID10 arrays with U320 SCSI drives spinning at 15K – split over TWO SCSI buses! Yeah, 'tis fast. Big disks are used because, relative to the size of the disk, more data sits on the outer edge of the platters, which passes under the heads faster than the centre of the platters does. I also keep PostgreSQL's transaction log on a separate RAID1 array of 73GB 15K U320 drives, behind a 256MB battery-backed cache.
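Splitting the transaction log out is just the usual symlink trick, by the way. The paths below are made up for illustration, not my real layout:

    # with the cluster stopped, move pg_xlog onto the WAL array and link it back
    pg_ctl stop -D /var/lib/pgsql/data
    mv /var/lib/pgsql/data/pg_xlog /mnt/wal_array/pg_xlog
    ln -s /mnt/wal_array/pg_xlog /var/lib/pgsql/data/pg_xlog
    pg_ctl start -D /var/lib/pgsql/data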

Now database-based operations were zippy again…. The system, while configured by me not to rate a random disk read at too high a cost (given the super-speedy disk IO that I have), should still get SOME data out of the cache and not run to the comparatively sloooow disks straight away. After sorting this out, all was cool again database-wise.
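The knobs I'm talking about live in postgresql.conf. The numbers below are only illustrative, not the values on this cluster:

    # tell the planner random disk reads aren't that expensive on this IO subsystem
    random_page_cost = 2            # default is 4
    # and give it an honest idea of how much the OS is caching (in 8KB pages)
    effective_cache_size = 100000   # roughly 800MB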

However, operations that needed the data store to be accessed were still piss slow, so email ingestion, file usage, etc. crawled. The data store, which complements the database data, weighs in at about 250GB now, and it sits on a single RAID1 array with 300GB U320 10K drives. I also remembered that when backing up over a 1000BaseT network to the robotic tape library, the data store array (RAID1) maxes out at about 750MB a minute, while the RAID10 with the database does 2700+MB a minute.

So I thought, “let’s add some more disks to the data store array”. I get more space, so relatively speaking more of the data sits on the outside edges of all the platters (another topic for what one can do with LVM), and the array gets double the heads and spindles too – plus some RAID0 goodness inside that RAID10 nested set.

Now Dell’s storage white paper from 2005 does state that a RAID1 to RAID10 migration/RAID-level reconstruction is supported – RAID10 being two RAID1 arrays striped together in RAID0 (with the mirror halves straddling two SCSI buses for speed and channel redundancy). However, when I went into Open Manage after putting the drives in the chassis, expecting to be able to “reconstruct/migrate levels” to RAID10, then pop into LVM, create some new data devices, add them to my volume group, and then expand my partition and finally my file system – all while still being online……..I was greeted with only the option to reconstruct/migrate to RAID5 (Get ^*^*&^) or RAID0 (Uh, yeah, OK..).
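The online-growth part I was expecting to do is bog-standard LVM, something like the following. The volume group, logical volume and sizes are invented names for the example, and I'm assuming an ext3 file system with online resize support:

    pvcreate /dev/sdc1                      # new RAID device becomes a physical volume
    vgextend datavg /dev/sdc1               # add it to the existing volume group
    lvextend -L +140G /dev/datavg/datalv    # grow the logical volume
    ext2online /dev/datavg/datalv           # grow ext3 while it stays mounted
                                            # (newer e2fsprogs: resize2fs /dev/datavg/datalv)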

So, feeling quite annoyed now, I went and made the two new drives into a new RAID1 set, thinking that I would then be able to add this to a final “nested” array with the current RAID1 set and make a RAID10 out of them. Well, I thought it was working, but it wasn’t. All I managed to do was end up with my existing array turned into a RAID0 (heart attack!!) and the new array sitting there untouched. Also, the Open Manage array management setup gave me no choice of stripe size. It defaulted to 64K. I prefer 128K, to give records/tuples the best chance of fitting into one whole stripe and so maximise concurrency of head operations on different data records – and also because the PERC4 family of RAID cards does not suffer any penalty if a stripe size is too big for the data used – so why not, eh?

So what am I doing now at 5:32 AM?

Rsyncing all the data again to the hot standby server. I will then power off the cluster, log in with the DRAC card, go into the RAID card’s BIOS and redo my friggin array from scratch as RAID10 with a 128K stripe, and then boot up and copy my damn data back.
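The sync itself is nothing fancy – hostname and paths here are placeholders:

    # push the data store to the hot standby before tearing the array down
    rsync -aHv --delete /datastore/ standby:/datastore/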

Not happy pappy. Not happy. I think it is SAN time… unless SANs can be this anal as well?

So what does this have to do with a PostgreSQL upgrade? Well, since the cluster is down anyway, I might as well go from 7.4 to 8.1, get some of that autovacuum goodness and lap up some of the apparent massive speed boosts from the couple of years of development between the two versions.
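The upgrade itself will be the usual dump-and-reload dance plus flipping autovacuum on. A rough sketch, with made-up ports (7.4 data files can’t just be started under 8.1):

    # dump the whole 7.4 cluster, ideally using the 8.1 pg_dumpall binary
    pg_dumpall -p 5432 > cluster_74.sql

    # load it into the freshly initdb'd 8.1 cluster
    psql -p 5433 -d postgres -f cluster_74.sql

    # postgresql.conf on 8.1: autovacuum needs the row-level stats collector
    #   stats_start_collector = on
    #   stats_row_level = on
    #   autovacuum = on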