At 8:30 this morning the site was in the same "stable" state it had been in: slow but usable. Within 30 minutes it was once again consuming all server resources, and we decided to switch back to the old site so that we could diagnose and fix the problem.
Since Sunday, we have been inspecting all queries (including, as you point out, the Doctor queries) and looking for other processes that may be affecting CPU usage. On the database side, we are working to identify the offending queries and simplify them.
Additionally, we have installed memcache on the server and plan to implement object caching. We were unable to do this effectively overnight while the server was crawling, but we can now access the server at normal speed.
The site is of course backed up; however, there is nothing to roll back to. The issues with the database and server were not caused by any change made during the last two days.
The preview site is the production site, not a mirror. It was probably working fine because it was not under the load we have now, and therefore was not running the same number of queries.
We have a number of specialists working on this, and we will continue to do so. We want the new site back up as much as you do! I will keep you informed.