0 Comments

Back in the day (pre Windows Vista, so we’re talking about Windows XP and Windows Server 2003), it was possible for a user and system services to share the same logical space. Depending on what your login settings were, the first user to login to a system was likely to be automatically assigned to session 0, which is where services and other system processes would run.

This was both a blessing and a curse.

It was a blessing because now the user would be able to see if a service decided to display a dialog or some other incredibly stupid mechanism to try and communicate.

It was a curse because now the user was running in the same space as system critical services, enabling a particularly dangerous attack vector for viruses and malicious code.

In Windows Vista this was changed, with the advent of Session 0 isolation. In short, services would now run in their own session (session 0) and whenever a user logged in they would automatically be assigned to sessions greater than 0.

The entire post should be considered with a caveat. I am certainly no expert in Windows session management, so while I’ve tried my best to understand the concepts at play, I cannot guarantee their technical correctness. I do know that the solution I outline later does allow for a workaround for the application I was working with, so I hope that will prove useful at least.

Vista was many years ago though, so you might be asking what relevance any of this has now?

Well, applications have a tendency to live long past when anyone expects them to, and old applications in particular have a tendency to accumulate cruft over the years.

I was working with one such application recently.

Denied

Most of the time, my preference is to work with virtual machines, especially when investigating or demonstrating software. Its just much easier to work in a virtual environment that can easily be reset to a known state.

I mostly use Virtual Box, but that’s just because its the virtualisation tool I am most familiar with. Virtual Box is all well and good, but it does make it very hard to collaborate with other people, especially considering the size of the virtual machines themselves (Windows is much worse than Linux). Its hard to pass the the virtual machine around, and its beyond most people to expose a virtual machine to the greater internet so someone in a different part of the world can access it.

As a result I’ve gravitated towards AWS for demonstration machines

AWS is not perfect (its hard to get older OS versions setup for example, which limits its usage for testing things that require a certain OS) but for centralizing demonstration machines its a godsend.

How does all of this relate to session 0 and old applications?

Well, I recently setup an EC2 instance in AWS to demonstrate to stakeholders some work that we’d been doing. In order to demonstrate some new functionality in our product, I needed to configure a third-party product in a particular way. I had done this a number of times before on local virtual machines, so imagine my surprise when I was confronted with an error message stating that I was not allowed to configure this particular setting when not logged in as a console user.

Console?

To most users, I would imagine that that error message is incredibly unhelpful.

Well, this is where everything ties back into session 0, because in the past, if you wanted to remote into a machine, and be sure that you were seeing the same thing each time you logged in, you would use the following command:

mstsc /console

This would put you into session 0, which is usually the same session as you would see when physically accessing the server, i.e. it was as if you were viewing the monitor/keyboard/mouse physically connected to the box. More importantly, it also let you interact with services that insisted on trying to communicate with the user through dialogs or SendMessage.

The consistent usage of the console switch could be used to prevent issues like Bob logging in and starting an application server, then Mary also logging in and doing the same. Without the /console switch, both would log into their own sessions, even if they were using the same user, and start duplicate copies of the application.

Being familiar with the concept (sometimes experience has its uses), I recognised the real meaning of the “you are not logged in as the console” message. It meant that it had detected that I was not in session 0 and it need to do something that requires communication to a service via outdated mechanisms. Disappointing, but the application has been around for a while, so I can’t be too mad.

Unfortunately, the console switch does not give access to session 0 anymore. At least not since the introduction of session 0 isolation in Vista. There is an /admin switch, but it has slightly different behaviour (its really only for getting access to the physical keyboard/screen, so not relevant in this situation).

Good Old Sysinternals

After scouring the internet for a while, I discovered a few things that were new to me.

The first was that when Microsoft introduced session 0 isolation they did not just screw over older applications. Microsoft is (mostly) good like this.

In the case of services that rely on interacting with the user through GUI components (dialogs, SendMessage, etc), you can enable the Interactive Services Detection Service (ui0detect). Once this service is enabled and running, whenever a service attempts to show a dialog or similar, a prompt will show up for the logged in user, allowing them to switch to the application.

The second was that you can actually run any old application you like in session 0, assuming you have administrator access to the machine.

This is where Sysinternals comes to the rescue yet again (seriously, these tools have saved me so many times, the authors may very well be literal angels for all I know).

Using psexec, you can start an application inside session 0.

psexec –s 0 –i {path to application}

You’ll need to be running as an Administrator (obviously), but assuming you have the Interactive Services Detection Service running, you should immediately receive a prompt that says something like “an application is trying to communicate with you”, which you can then use to switch to the GUI of the application running in session 0.

With this new power it was a fairly simple matter to start the application within session 0, fooling whatever check it had, which allowed me to change the setting and demonstrate our software as needed.

Conclusion

As I mentioned earlier, software has an unfortunate tendency to live for far longer than you think it should.

I doubt the person who wrote the console/session 0 check inside the application expected someone to be installing and running it inside a virtual machine hosted purely remotely in AWS. In fact, when the check was written, I doubt AWS was even a glimmer in Chris Pinkham’s eye. I’m sure the developer had a very good reason for the check (it prevented a bug or it allowed a solution that cost 1/4 as much to implement), and they couldn’t have possibly anticipated the way technology would change in the future.

Sometimes I worry that for all the thought we do put into software, and all the effort we put into making sure that it will do what it needs to do as long as it needs to, its all somewhat pointless. We cannot possibly anticipate shifts in technology or users, so really the only reasonable approach is to try and make sure we can change anything with confidence.

Honestly, I’m surprised most software works at all, let alone works mostly as expected decades later.

0 Comments

As I mentioned in a previous post, I recently started a new position at Onthehouse.

Onthehouse uses Amazon EC2 for their cloud based virtualisation, including that of the build environment (TeamCity). Its common for a build environment to be largely ignored as long as it is still working, until the day it breaks and then it all goes to hell.

Luckily that is not what happened.

Instead, the team identified that the build environment needed some maintenance, specifically around one of the application specific Build Agents.

Its an ongoing process, but the reason for there being an application specific Build Agent is because the application has a number of arcane, installed, licenced third-party components. Its VB6, so its hard to manage those dependencies in a way that is mobile. Something to work on in the future, but not a priority for right now.

My first task at Onthehouse, was to ensure that changes made to the running Instance of the Build Agent had been appropriately merged into the base Image. As someone who had never before used the Amazon virtualisation platform (Amazon EC2) I was somewhat confused.

This post follows my journey through that confusion and out the other side into understanding and I hope it will be of use to someone else out there.

As an aside, I think that getting new developers to start with build services is a great way to familiarise them with the most important part of an application, how to build it. Another fantastic first step is to get them to fix bugs.

Virtually Awesome

As I mentioned previously, I’ve never used AWS (Amazon Web Services) before, other than uploading some files to an Amazon S3 account, let alone the virtualization platform (Amazon EC2).

My main experience with virtualisation comes from using Virtual Box on my own PC. Sure, I’ve used Azure to spin up machines and websites, and I’ve occasionally interacted with VMWare and Hyper-V, but Virtual Box is what I use every single day to build, create, maintain and execute sandbox environments for testing, exploration and all sorts of other things.

I find Virtual Box straightforward.

You have a virtual machine (which has settings, like CPU Cores, Memory, Disks, etc) and each machine has a set of snapshots.

Snapshots are a record of the state of the virtual machine and its settings at a point in time chosen by the user. I take Snapshots all the time, and I use them to easily roll back to important moments, like a specific version of an application, or before I wrecked everything by doing something stupid and destructive (it happens more than you think).

Thinking about this now, I’m not sure how Virtual Box and Snapshots interact with multiple disks. Snapshots seems to be primarily machine based, not disk based, encapsulating all of the things about the machine. I suppose that probably includes the disks. I guess I don’t tend to use multiple disks in the machines I’m snapshotting all the time, only using them rarely for specific tasks.

Images and Instances

Amazon EC2 (Elastic Compute) does not work the same way as Virtual Box.

I can see why its different to Virtual Box as they have entirely different purposes. Virtual Box is intended to facilitate virtualisation for the single user. EC2 is about using virtualisation to leverage the power of the cloud. Single users are a completely foreign concept. Its all about concurrency and scale.

In Amazon EC2 the core concept is an Image(or Amazon Machine Image, AMI). Images describe everything about a virtual machine, kind of like a Virtual Machine in Virtual Box. However, in order to actually use an Image, you must spin up an Instance of that Image.

At the point in time you spin up an Instance of an Image, they have diverged. The Instance typically contains a link back to its Image, but its not a hard link. The Instance and Image are distinctly separate and you can delete the Image (which if you are using an Amazon supplied Image, will happen regularly) without negatively impacting on the running instance.

Instances generally have Volumes, which I think are essentially virtual disks. Snapshots come into play here as well, but I don’t understand Volumes and Snapshots all that well at this point in time, so I’m going to conveniently gloss over them. Snapshots definitely don’t work like VirtualBox snapshots though, I know that much.

Instances can generally be rebooted, stopped, started and terminated.

Reboot, stop and start do what you expect.

Terminating an instance kills it forever. It also kills the Volume attached to the instance if you have that option selected. If you don’t have the Image that the Instance was created from, you’re screwed, its gone for good. Even if you do, you will have lost any change made to that Image since the Instance began running.

Build It Up

Back to the Build environment.

The application specific Build Agent had an Image, and an active Instance, as normal.

This Instance had been tweaked, updated and changed in various ways since the Image was made, so much so that no-one could remember exactly what had been done. Typically this wouldn’t be a major issue, as Instances don’t just up and disappear.

Except this Instance could, and had in the past.

The reason for its apparently ephemeral nature was because Amazon offers a spot pricing option for Instances. Spot pricing allows you to create a spot request and set your own price for an hour of compute time. As long as the spot price is below that price, your Instance will run. If the spot price goes above your price, your Instance dies. You can setup your spot price request to be reoccurring, such that the Instance will restart when the price goes down again, but you will have lost all information not on the baseline Image (an event like that is equivalent to terminating the instance and starting another one).

Obviously we needed to ensure that the baseline Image was completely able to run a build of the application in question, requiring the minimal amount of configuration on first start.

Thus began a week long adventure to take the current base Image, create an Instance from it, and get a build working, so we could be sure that if our Instance was terminated it would come back and we wouldn’t have to spend a week getting the build working again.

I won’t go into detail about the whole process, but it mostly involved lots of manual steps to find out what was thing was wrong this time, fixing it in as nice a way as time permitted and then trying again. It mostly involved waiting. Waiting for instances, waiting for builds, waiting for images. Not very interesting.

A Better Approach

Knowing what I know now (and how long the whole process would take), my approach would be slightly different.

Take a snapshot of the currently running Instance, spin up an Instance of it, change all of the appropriate unique settings to be invalid (Build Agent name mostly) and then take another Image. That’s your baseline.

Don’t get me wrong, it was a much better learning experience the first way, but it wasn’t exactly an excellent return on investment from the point of view of the organisation.

Ah well, hindsight.

A Better Architecture

The better architecture is to have TeamCity managed the lifetime of its Build Agents, which it is quite happy to do via Amazon EC2. TeamCity can then manage the instances as it sees fit, spinning them down during idle periods, and even starting more during periods of particularly high load (I’m looking at you, end of the iteration crunch time).

I think this is definitely the approach we will take in the future, but that’s a task for another day.

Conclusion

Honestly, the primary obstacle in this particular task was learning how Amazon handles virtualization, and wrapping my head around the differences between that and Virtual Box (which is where my mental model was coming from). After I got my head around that I was in mostly familiar territory, diagnosing build issues and determining the best fix that would maximise mobility in the future, while not requiring a massive amount of time.

From the point of view of me, a new starter, this exercise was incredibly valuable. It taught me an immense amount about the application, its dependencies, the way its built and all sorts of other, half-forgotten tidbits.

From the point of view of the business, I should have definitely realized that there was a quicker path to the end goal (make sure we can recover from a lost Build Agent instance) and taken that into consideration, rather than try to work my way through the arcane dependencies of the application. There’s always the risk that I missed something subtle as well, which will rear its ugly head next time we lose the Build Agent instance.

Which could happen.

At any moment.

(Cue Ominous Music)