
I did a strange thing a few weeks ago. Something I honestly believed that I would never have to do.

I populated a Single Page Javascript Application with data extracted from a local database via a C# application installed using ClickOnce.

But why?

I maintain a small desktop application that acts as a companion to a service primarily offered through a website. Why do they need a companion application? Well, they participate in the Australian medical industry, and all the big Practice Management Software (PMS) Systems available are desktop applications, so if they want to integrate with anything, they need to have a presence on the local machine.

So that’s where I come in.

The pitch was that they needed to offer a new feature (a questionnaire) that interacted with the installed PMS, but they wanted to implement the feature using web technologies, rather than putting it together using any of the desktop frameworks available (like WPF). The reasoning behind this desire was:

  1. The majority of the existing team had web development experience (I’m pretty much the only one who works on the companion application), so building the whole thing in WPF or something similar wouldn’t be making the best use of available resources,
  2. They wanted to be able to reuse the implementation in other places, and not just have it available in the companion application, and
  3. They wanted to be able to reskin/improve the feature without having to deploy a new companion application.

Completely fair and understandable points in favour of the web technology approach.

Turtles All The Way Down

When it comes to running a website inside a C# WPF desktop application, there are a few options available. You can try to use the native WebBrowser control that comes with .NET 3.5 SP1, but that seems to rely on the version of IE installed on the computer and is fraught with all sorts of peril.

You’re much better off going with a dedicated browser control library, of which the Chromium Embedded Framework is probably your best bet. For .NET that means CEFSharp (and for us, that means its WPF browser control, CEFSharp.WPF).

Installing CEFSharp and getting it to work in a development environment is pretty straightforward: just install the Nuget packages that you need and it’s all probably going to start working.

While functional, it’s not the greatest experience though.

  • If you’re using the CEFSharp WPF control, don’t try to use the designer. It will crash, slowly and painfully. It’s best to do the entire thing in code-behind and leave the browser control out of the XAML entirely (see the sketch just after this list).
  • The underlying C++ library (libcef.dll) is monstrous, weighing in at around 48MB. For us, this was about 4 times the size of the entire application, so it’s a significant increase in the size of the deployment.
  • If you’re using ClickOnce to deploy your application, you’re in for a bit of an adventure.
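
As an aside, here is a minimal sketch of the code-behind-only approach from the first point. The window and panel names are assumptions of mine, not from the actual application; the important part is that the browser control never appears in the XAML, so the designer never tries to instantiate it.

using System.Windows;
using CefSharp.Wpf;

public partial class BrowserWindow : Window
{
    private readonly ChromiumWebBrowser _browser;

    public BrowserWindow()
    {
        InitializeComponent();

        // Create the control entirely in code so the XAML designer never
        // attempts to load it (and therefore never crashes).
        _browser = new ChromiumWebBrowser();

        // BrowserHost is assumed to be an empty Grid defined in the XAML.
        BrowserHost.Children.Add(_browser);

        _browser.Address = "https://example.com";
    }
}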

The last point is the most relevant to this post.

Because of the way ClickOnce works, a good chunk of the dependencies that are required by CEFSharp will not be recognised as required files, and thus will not be included in the deployment package when it’s built. If you weren’t using ClickOnce, everything would be copied correctly into the output folder by build events and MSBuild targets, and you could then package up your application in a sane way, but once you pick a deployment technology, you’re pretty much stuck with it.

You’ll find a lot of guidance on the internet about how to make ClickOnce work with CEFSharp, but I had the most luck with the following:

  1. Add Nuget references to the packages that you require. For me this was just CEFSharp.WPF, specifically version 49.0.0 because my application is still tied to .NET 4.0. This will add references to CEFSharp.Common and cef.redist.x86 (and x64).
  2. Edit the csproj file to remove any imports of msbuild targets added by CEFSharp. This is how CEFSharp would normally copy itself to the output folder, but it doesn’t work properly with ClickOnce. Also remove any props imports, because this is how CEFSharp does library references.
  3. Settle on a processor architecture (i.e. x86 or x64). CEFSharp only works with one or the other, so you might as well remove the one you don’t need.
  4. Manually add references to the .NET DLLs (CefSharp.WPF.dll, CefSharp.Core.dll and CefSharp.dll). You can find these DLLs inside the appropriate folder in your packages directory.
  5. This last step deals entirely with the dependencies required to run CEFSharp and making sure they are present in the ClickOnce deployment. Manually edit the csproj file to include the following snippet:

    <ItemGroup>    
        <Content Include="..\packages\cef.redist.x86.3.2623.1396\CEF\*.*">
          <Link>%(Filename)%(Extension)</Link>
          <CopyToOutputDirectory>Always</CopyToOutputDirectory>
          <Visible>false</Visible>
        </Content>
        <Content Include="..\packages\cef.redist.x86.3.2623.1396\CEF\x86\*.*">
          <Link>%(Filename)%(Extension)</Link>
          <CopyToOutputDirectory>Always</CopyToOutputDirectory>
          <Visible>false</Visible>
        </Content>
        <Content Include="..\packages\CefSharp.Common.49.0.0\CefSharp\x86\CefSharp.BrowserSubprocess.*">
          <Link>%(Filename)%(Extension)</Link>
          <CopyToOutputDirectory>Always</CopyToOutputDirectory>
          <Visible>false</Visible>
        </Content>
        <Content Include="lib\*.*">
          <Link>%(Filename)%(Extension)</Link>
          <CopyToOutputDirectory>Always</CopyToOutputDirectory>
          <Visible>false</Visible>
        </Content>
        <Content Include="$(SolutionDir)packages\cef.redist.x86.3.2623.1396\CEF\locales\*">
          <Link>locales\%(Filename)%(Extension)</Link>
          <CopyToOutputDirectory>Always</CopyToOutputDirectory>
          <Visible>false</Visible>
        </Content>
    </ItemGroup>
    This particularly brutal chunk of XML adds all of the dependencies required for CEFSharp to work properly as content files in the project, and then hides them, so you don’t have to see their ugly faces whenever you open your solution.

After taking those steps, I was able to successfully deploy the application with a working browser control, free from the delights of constant crashes.

Injected With A Poison

Of course, opening a browser to a particular webpage is nothing without the second part; giving that webpage access to some arbitrary data from the local context.

CEFSharp is pretty good in this respect to be honest. Before you navigate to the address you want to go to, you just have to call RegisterJsObject, supplying a name for the Javascript variable and a reference to your .NET object. The variable is then made available to the Javascript running on the page.
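
As a rough sketch (the class, property values and URL here are mine, not the real model), the registration looks something like this for the CEFSharp 49 line used in the post, done before setting the address:

using CefSharp.Wpf;

// Hypothetical object being exposed to the page. With the default registration
// settings the property names end up camelCased on the Javascript side
// (Name -> name, Company -> company), which lines up with the page shown below.
public class InjectedData
{
    public string Name { get; set; }
    public string Company { get; set; }
}

public static class BrowserSetup
{
    public static void Configure(ChromiumWebBrowser browser)
    {
        // Register the object before navigating; every page loaded afterwards
        // sees a global 'injectedData' variable.
        browser.RegisterJsObject("injectedData", new InjectedData
        {
            Name = "Jane Doe",
            Company = "Example Pty Ltd"
        });

        browser.Address = "https://example.com/questionnaire";
    }
}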

The proof of concept that I put together used a very simple object (a class with a few string properties), so I haven’t tested the limits of this approach, but I’m pretty sure you can do most of the basic things like arrays and properties that are objects themselves (i.e. Foo.Bar.Thing).

The interesting part is that the variable is made available to every page that is navigated to in the Browser from that point forward, so if there is a link on your page that goes somewhere else, it will get the same context as the previous page did.

In my case, my page was completely trivial, echoing back the injected variable if it existed.

<html>
<body>
<p id="content">
Script not run yet
</p>
<script>
if (typeof(injectedData) == 'undefined')
{
    document.getElementById("content").innerHTML = "no data specified";
}
else 
{
    document.getElementById("content").innerHTML = "name: " + injectedData.name + ", company: " + injectedData.company;
}
</script>
</body>
</html>

Summary

It is more than possible to have good integration between a website (well, one featuring Javascript at least) and an installed C# application, even if you happen to have the misfortune of using ClickOnce to deploy it. Really, all of the congratulations should go to the CEFSharp guys for creating an incredibly useful and flexible browser component using Chromium, even if it does feel monstrous and unholy.

Now, whether or not any of the above is a good idea is a different question. I had to make some annoying compromises to get the whole thing working, especially with ClickOnce, so it’s not exactly in the best place moving forward (for example, upgrading the version of CEFSharp is going to be pretty painful due to those hard references that were manually added into the csproj file). I made sure to document everything I did in the repository readme, so it’s not the end of the world, but it’s definitely going to cause pain in the future to some poor bastard.

This is probably how Dr Frankenstein felt, if he were a real person.

Disgusted, but a little bit proud as well.


A few months back I made a quick post about some automation that we put into place when running TeamCity Build Agents on spot-price instances in AWS. Long story short, we used EC2 userdata to automatically configure and register the Build Agent whenever a new spot instance was spun up, primarily as a way to deal with the instability in the spot price which was constantly nuking our machines.

The kicker in that particular post was that when we were editing the TeamCity Build Agent configuration, Powershell was changing the encoding of the file such that it looked perfectly normal at a glance, but the build agent was completely unable to read it. This led to some really confusing errors about things not being set when they were clearly set, and so on.

All in all, it was one of those problems that just make you hate software.

What does all of this have to do with this week’s post?

Well, history has a weird habit of repeating itself in slightly different ways.

More Robots

As I said above, we’ve put in some effort to make sure that our TeamCity Build Agent AMIs can mostly take care of themselves on boot if you’ve configured the appropriate userdata in EC2.

Unfortunately, each time we wanted a brand new instance (i.e. to add one to the pool or to recreate existing ones because we’d updated the underlying AMI) we still had to go into the AWS Management Dashboard and set it all up manually, which meant that we needed to remember to set the userdata from a template, making sure to replace the appropriate tokens.

Prone to failure.

Being that I had recently made some changes to the underlying AMI (to add Powershell 5, MSBuild 2015 and .NET Framework 4.5.1) I was going to have to do the manual work.

That’s no fun. Time to automate.

A little while later I had a relatively simple Powershell script scraped together that would spin up an EC2 instance (spot or on-demand) using our AMI, with all of our requirements in place (tagging, names, etc).

[CmdletBinding()]
param
(
    [Parameter(Mandatory=$true)]
    [ValidateNotNullOrEmpty()]
    [string]$awsKey,
    [Parameter(Mandatory=$true)]
    [ValidateNotNullOrEmpty()]
    [string]$awsSecret,
    [string]$awsRegion="ap-southeast-2",
    [switch]$spot=$false,
    [int]$number
)

$here = Split-Path $script:MyInvocation.MyCommand.Path
. "$here\_Find-RootDirectory.ps1"

$rootDirectory = Find-RootDirectory $here
$rootDirectoryPath = $rootDirectory.FullName

. "$rootDirectoryPath\scripts\common\Functions-Aws.ps1"

Ensure-AwsPowershellFunctionsAvailable

$name = "[{team}]-[dev]-[teamcity]-[buildagent]-[$number]"

$token = [Guid]::NewGuid().ToString();

$userData = [System.IO.File]::ReadAllText("$rootDirectoryPath\scripts\buildagent\ec2-userdata-template.txt");
$userData = $userData.Replace("@@AUTH_TOKEN@@", $token);
$userData = $userData.Replace("@@NAME@@", $name);

$amiId = "{ami}";
$instanceProfile = "{instance-profile}"
$instanceType = "c3.large";
$subnetId = "{subnet}";
$securityGroupId = "{security-group}";
$keyPair = "{key-pair}";

if ($spot)
{
    $groupIdentifier = new-object Amazon.EC2.Model.GroupIdentifier;
    $groupIdentifier.GroupId = $securityGroupId;
    $name = "$name-[spot]"
    $params = @{
        "InstanceCount"=1;
        "AccessKey"=$awsKey;
        "SecretKey"=$awsSecret;
        "Region"=$awsRegion;
        "IamInstanceProfile_Arn"=$instanceProfile;
        "LaunchSpecification_InstanceType"=$instanceType;
        "LaunchSpecification_ImageId"=$amiId;
        "LaunchSpecification_KeyName"=$keyPair;
        "LaunchSpecification_AllSecurityGroup"=@($groupIdentifier);
        "LaunchSpecification_SubnetId"=$subnetId;
        "LaunchSpecification_UserData"=[System.Convert]::ToBase64String([System.Text.Encoding]::Unicode.GetBytes($userData));
        "SpotPrice"="0.238";
        "Type"="persistent";
    }
    $request = Request-EC2SpotInstance @params;
}
else
{
    . "$rootDirectoryPath\scripts\common\Functions-Aws-Ec2.ps1"

    $params = @{
        ImageId = $amiId;
        MinCount = "1";
        MaxCount = "1";
        KeyName = $keyPair;
        SecurityGroupId = $securityGroupId;
        InstanceType = $instanceType;
        SubnetId = $subnetId;
        InstanceProfile_Arn=$instanceProfile;
        UserData=$userData;
        EncodeUserData=$true;
    }

    $instance = New-AwsEc2Instance -awsKey $awsKey -awsSecret $awsSecret -awsRegion $awsRegion -InstanceParameters $params -IsTemporary:$false -InstancePurpose "DEV"
    Tag-Ec2Instance -InstanceId $instance.InstanceId -Tags @{"Name"=$name;"auto:start"="0 8 ALL ALL 1-5";"auto:stop"="0 20 ALL ALL 1-5";} -awsKey $awsKey -awsSecret $awsSecret -awsRegion $awsRegion
}

Nothing special here. The script leverages some of our common scripts (partially available here) to do some of the work, like creating the EC2 instance itself and tagging it, but it’s pretty much just a branch on the -Spot switch and a bunch of parameter configuration.

On-Demand instances worked fine, spinning up the new Build Agent and registering it with TeamCity as expected, but for some reason instances created with the -Spot switch didn’t.

The spot request would be created, and the instance would be spawned as expected, but it would never configure itself as a Build Agent.

Thank God For Remote Desktop

As far as I could tell, instances created via either path were identical. Same AMI, same Security Groups, same VPC/Subnet, and so on.

Remoting onto the bad spot instances I could see that the Powershell script supplied as part of the instance userdata was not executing. In fact, it wasn’t even there at all. Typically, any Powershell script specified in userdata with the <powershell></powershell> tags is automatically downloaded by the EC2 Config Service on startup and placed inside C:/Program Files (x86)/Amazon/EC2ConfigService/Scripts/UserData.ps1, so it was really unusual for there to be nothing there even though I had clearly specified it.

I have run into this sort of thing before though, and the most common root cause is that someone (probably me) forgot to enable the re-execution of userdata when updating the AMI, but that couldn’t be the case this time, because the on-demand instances were working perfectly and they were all using the same image.

Checking the userdata from the instance itself (both via the AWS Management Dashboard and the local metadata service at http://169.254.169.254/latest/user-data) I could clearly see my Powershell script.

So why wasn’t it running?

It turns out that the primary difference between a spot request and an on-demand request is that you have to base64 encode the data yourself for the spot request (whereas the library takes care of it for the on-demand request). I knew this (as you can see in the code above), but what I didn’t know was that the EC2 Config Service is very particular about the character encoding of the underlying userdata. For the base64 conversion, I had elected to interpret the string as Unicode bytes, which meant that while everything looked fine after the round trip, the EC2 Config Service had no idea what was going on. Interpreting the string as UTF8 bytes before encoding it made everything work just fine.
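
Purely to illustrate the difference, here is a small C# sketch of the two conversions (the userdata content is made up); these are the same .NET calls the Powershell above is making.

using System;
using System.Text;

class UserDataEncodingExample
{
    static void Main()
    {
        var userData = "<powershell>Write-Output 'configure the build agent here'</powershell>";

        // What the original spot request was doing: UTF-16 ("Unicode") bytes.
        // The result is perfectly valid base64, so nothing complains, but the
        // EC2 Config Service can't make sense of the decoded content.
        var utf16 = Convert.ToBase64String(Encoding.Unicode.GetBytes(userData));

        // The fix: UTF-8 bytes, which the EC2 Config Service happily parses.
        var utf8 = Convert.ToBase64String(Encoding.UTF8.GetBytes(userData));

        Console.WriteLine(utf16);
        Console.WriteLine(utf8);
    }
}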

Summary

This is another one of those cases that you run into in software development where it looks like something has made an assumption about its inputs, but hasn’t put the effort in to test that assumption before failing miserably. Just like with the TeamCity configuration file, the software required that the content be encoded as UTF8, but didn’t tell me when it wasn’t.

Or maybe it did? I couldn’t find anything in the normal places (the EC2 Config Service log files), but those files can get pretty big, so I might have overlooked it. AWS is a pretty well put together set of functionality, so it’s unlikely that something as common as an encoding issue is completely unknown to them.

Regardless, this whole thing cost me a few hours that I could have spent doing something else.

Like shaving a different yak.


A number of people much smarter than me have stated that there are only two hard problems in Computer Science: naming things, cache invalidation and off-by-one errors. Based on my own experience, I agree completely with this sentiment, and it’s one of the reasons why I always hesitate to incorporate caching into a system until it’s absolutely necessary.

Unfortunately, absolutely necessary always comes much sooner than I would like.

Over the last few weeks we’ve been slowly deploying our data freeing functionality to our customers. The deployment occurs automatically without user involvement (we have a neat deployment pipeline) so that has been pretty painless, but we’ve had some teething issues on the server side as our traffic ramped up. We’re obviously hosting everything in AWS, so it’s not like we can’t scale as necessary (and we have), but just throwing more power at something is a road we’ve been down before, and it never ends well. Plus, we’re software engineers, and when it looks like our code is slow or inefficient, it makes us sad.

The first bottleneck we ran into was our authentication service, and mitigating that particular problem is the focus of this post.

For background, we use a relatively simple token based approach to authentication. Consumers of our services are required to supply some credentials directly to our auth service to get one of these tokens, which can then be redeemed to authenticate against other services and to gain access to the resources the consumer is interested in. Each service knows that the auth service is the ultimate arbitrator for resolving those tokens, so they connect to the service as necessary for token validation and make authorization decisions based on the information returned.

A relatively naive approach to authentication, but it works for us.

It’s pretty spammy though, and therein lies the root cause of the bottleneck.

I Already Know The Answer

One of the biggest factors at play here is that every single authenticated request (which is most of them) coming into any of our services needs to hit another service to validate the token before it can do anything else. It can’t even keep doing things in the background while the token is being validated, because if the auth fails, that would be a waste of effort (and a potential security risk if an error occurs and something leaks out).

On the upside, once a service has resolved a token, it’s unlikely that the answer will change in the near future. We don’t even do token invalidation at this stage (no need), so the answer is actually going to be valid for the entire lifetime of the token.

You can probably already see where I’m going with this, but an easy optimization is to simply remember the resolution for each token in order to bypass calls to the auth service when that token is seen again. Why ask a question when you already know the answer? This works particularly well for us in the data synchronization case because a single token will be used for a flurry of calls.

Of course, now we’re in the caching space, and historically, implementing caching can increase the overall complexity of a system by a large amount. To alleviate some of the load on the auth service, we just want a simple in-memory cache. If a particular instance of the service has seen the token, use the cache, else validate the token and then cache the result. To keep it simple, and to deal with our specific usage pattern (the aforementioned flurry), we’re not even going to cache the token resolution for the entire lifetime of the token, just for the next hour.

We could write the cache ourselves, but that would be stupid. Caching IS a hard problem, especially once you get into invalidation. Being that caching has been a known problem for a while, there are a lot of different components available for you to use in your language of choice; it’s just a matter of picking one.

But why pick just one?

There Has To Be A Better Way

As we are using C#, CacheManager seems like a pretty good bet for avoiding the whole “pick a cache and stick with it” problem.

It’s a nice abstraction over the top of many different caching providers (like the in-memory System.Runtime.Caching and the distributed Redis), so we can easily implement our cache using CacheManager and then change our caching to something more complicated later, without having to worry about breaking everything in between. It also simplifies the interfaces to those caches, and does a bunch of really smart work behind the scenes.

The entire cache is implemented in the following class:

public class InMemoryNancyAuthenticationContextCache : INancyAuthenticationContextCache
{
    private readonly ICacheManager<AuthenticationContext> _manager;

    public InMemoryNancyAuthenticationContextCache()
        : this(TimeSpan.FromHours(1))
    {

    }

    public InMemoryNancyAuthenticationContextCache(TimeSpan cachedItemsValidFor)
    {
        _manager = CacheFactory.Build<AuthenticationContext>(a => a.WithSystemRuntimeCacheHandle().WithExpiration(ExpirationMode.Absolute, cachedItemsValidFor));
    }

    public AuthenticationContext Get(string token)
    {
        return _manager.Get(token);
    }

    public void Insert(string token, AuthenticationContext authContext)
    {
        _manager.Put(token, authContext);
    }
}

Our common authentication library now checks whatever cache was injected into it before going off to the real auth service to validate tokens. I made sure to write a few tests to validate the caching behaviour (like expiration and eviction, validating a decrease in the number of calls to the auth provider when flooded with the same token and so on), and everything seems to be good.
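
The usage inside the authentication library boils down to something like the following sketch. The names are assumptions (IAuthenticationProvider in particular is a stand-in for whatever actually calls the auth service), not the real code.

public class CachingTokenValidator
{
    private readonly INancyAuthenticationContextCache _cache;

    // Hypothetical interface representing the call to the real auth service.
    private readonly IAuthenticationProvider _provider;

    public CachingTokenValidator(INancyAuthenticationContextCache cache, IAuthenticationProvider provider)
    {
        _cache = cache;
        _provider = provider;
    }

    public AuthenticationContext Validate(string token)
    {
        // Cache hit: skip the auth service entirely.
        var cached = _cache.Get(token);
        if (cached != null) return cached;

        // Cache miss: resolve the token the slow way, then remember it for next time.
        var context = _provider.Validate(token);
        _cache.Insert(token, context);
        return context;
    }
}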

The one downside of using CacheManager (and the System.Runtime.Caching libraries) is that I’m putting a lot of trust in everything to “just work”. Performance testing will prove out whether or not there are any issues, but I can guarantee that if there are, they will be a massive pain to diagnose, just because our code is so many levels removed from the actual caching.

I sure do hope it does the right thing under stress.

Summary

Like I said at the start, I always hold off on implementing a cache for as long as I possibly can. Caching is a powerful tool to improve performance, but it can lead to some frustratingly hard to debug situations because of the transient nature of the information. No longer can you assume that you will be getting the most accurate information, which can make it harder to reason about execution paths and return values after something has actually happened. Of course, any decent system (databases, browsers, even disk access) usually has caching implemented to some degree, so I suppose you’re always dealing with the problem regardless of what your code actually does.

Good caching is completely invisible, feeding consumers appropriate information and mitigating performance problems without ever really showing its ugly side.

When it comes to implementing caching, it doesn’t make sense to write it yourself (which is true for a lot of things). Accept the fact that many people a lot smarter than you have probably already solved the problem, and just use their work instead. The existence of CacheManager was an unexpected bonus to loosely coupling our code to a specific cache implementation, which was nice.

Now if only someone could solve the whole naming things problem.


Accidentally breaking backwards compatibility is an annoying class of problem that can be particularly painful to deal with once it occurs, especially if it gets out into production before you realise.

From my experience, the main issue with this class of problem is that it’s unlikely to be tested for, increasing the risk that when it does occur, you don’t notice until it’s dangerously late.

Most of the time, you’ll write tests as you add features to a piece of software, like an API. Up until the point where you release, you don’t really need to think too hard about breaking changes; you’ll just do what is necessary to make sure that your feature is working right now and then move on. You’ll probably implement tests for it, but they will be aimed at making sure it’s functioning.

The moment you release though, whether it be internally or externally, you’ve agreed to maintain whatever contract you’ve set in place, and changing this contract can have disastrous consequences.

Even if you didn’t mean to.

Looking Backwards To Go Forwards

In my case, I was working with that service that uses RavenDB, upgrading it to have a few new endpoints to cleanup orphaned and abandoned data.

During development I noticed that someone had duplicated model classes from our models library inside the API itself, apparently so they could decorate them with attributes describing how to customise their serialisation into RavenDB.

As is normally the case with this sort of thing, those classes had already started to diverge, and I wanted to nip that problem in the bud before it got any worse.

Using attributes on classes as the means to define behaviour is one of those things that sounds like a good idea at the time, but quickly gets troublesome as soon as you want to apply them to classes that are in another library or that you don’t control. EF is particularly bad in this case, featuring attributes that define information about how the resulting data schema will be constructed, sometimes only relevant to one particular database provider. I much prefer an approach where classes stand on their own and there is something else that provides the meta information normally provided by attributes. The downside of this of course is that you can no longer see everything to do with the class in one place, which is a nicety that the attribute approach provides. I feel like the approach that allows for a solid separation of concerns is much more effective though, even taking into account the increased cognitive load.

I consolidated some of the duplicate classes (can’t fix everything all at once) and ensured that the custom serialisation logic was maintained without using attributes, via this snippet of code that initializes the RavenDB document store.

public class RavenDbInitializer
{
    public void Initialize(IDocumentStore store)
    {
        var typesWithCustomSerialization = new[]
        {
            typeof(CustomStruct),
            typeof(CustomStruct?),
            typeof(CustomStringEnumerable), 
            typeof(OtherCustomStringEnumerable)
        };

        store.Conventions.CustomizeJsonSerializer = a => a.Converters.Add(new RavenObjectAsJsonPropertyValue(typesWithCustomSerialization));
        store.ExecuteIndex(new Custom_Index());
    }
}

The RavenObjectAsJsonPropertyValue class allows for the custom serialization of a type as a pure string, assuming that the type supplies a Parse method and a ToString that accurately represents the class. This reduces the complexity of the resulting JSON document and avoids mindlessly writing out all of the type’s fields and properties (which we may not need if there is a more efficient representation).
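
To illustrate the kind of type the converter expects, here is a hypothetical implementation of one of those types (the real CustomStruct obviously looks different): the string produced by ToString is what gets stored in the document, and the static Parse method turns it back into the type when the document is read.

public struct CustomStruct
{
    public int Major { get; }
    public int Minor { get; }

    public CustomStruct(int major, int minor)
    {
        Major = major;
        Minor = minor;
    }

    // The string form is what ends up in the RavenDB document.
    public override string ToString() => $"{Major}.{Minor}";

    // The converter uses this (presumably via reflection) when deserializing.
    public static CustomStruct Parse(string value)
    {
        var parts = value.Split('.');
        return new CustomStruct(int.Parse(parts[0]), int.Parse(parts[1]));
    }
}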

After making all of the appropriate changes I ran our suite of unit, integration and functional tests. They all passed with flying colours, so I thought that everything had gone well.

I finished up the changes relating to the new cleanup endpoint (adding appropriate tests), had the code reviewed and then merged it into master, which pushed it through our CI process and deployed it to our Staging environment, ready for a final test in a production-like environment.

Then the failures began.

Failure Is Not An Option

Both the new endpoint and pre-existing endpoints were failing to execute correctly.

Closer inspection revealed that the failures were occurring whenever a particular object was being read from the underlying RavenDB database and that the errors occurring were related to being unable to deserialize the JSON representation of the object.

As I still remembered making the changes to the custom serialization, it seemed pretty likely that that was the root cause.

It turns out that when I consolidated the duplicate classes, I had forgotten to actually delete the now useless duplicates. As a result, a few references to the deprecated objects had sneakily remained, and because I had manually specified which classes should have custom serialization (the non-duplicate model classes), the leftover classes were just being serialized in the default way (which just dumps all the properties of the object).

But all the tests passed?

The tests passed because I had not actually broken anything from a self-contained round-trip point of view. Any data generated by the API was able to be read by the same API.

Data generated by previous versions of the API was no longer valid, which was unintentional.

Once I realised my mistake, the fix was easy enough to conceive. Delete the deprecated classes, fix the resulting compile errors and everything would be fine.

Whenever I come across a problem like this though I prefer to put measures in place such that the exact same thing never happens again.

In this case, I needed a test to verify that the currently accepted JSON representation of the entity continued to be able to be deserialized from a RavenDB document store.

With a failing test I could be safe in the knowledge that when it passed I had successfully fixed the problem, and I can have some assurance that the problem will likely not reoccur in the future if someone breaks something in a similar way (and honestly, it’s probably going to be me).

Testing Is An Option Though

Being that the failure was occurring at the RavenDB level, when previously stored data was being retrieved, I had to find a way to make sure that the old data was accessible during the test. I didn’t want to have to set up and maintain a complete RavenDB instance that just had old data in it, but I couldn’t just use the current models to add data to the database in the normal way, because that would not actually detect the problem that I experienced.

RavenDB is pretty good as far as testing goes though. It’s trivially easy to create an embedded data store, and while the default way of communicating with the store is to use classes directly (i.e. Store<T>), it gives access to all of the raw database commands as well.

The solution? Create a temporary data store, use the raw database commands to insert a JSON document of the old format into a specific key then use the normal RavenDB session to query the entity that was just inserted via the generic methods.

[Test]
[Category("integration")]
public void WhenAnEntityFromAPreviousVersionOfTheApplicationIsInsertedIntoTheRavenDatabaseDirectly_ThatEntityCanBeReadSuccessfully()
{
    using (var resolverFactory = new IntegrationTestResolverFactory())
    {
        // Sample documents captured from an older version of the application,
        // stored alongside the test assembly.
        var assemblyLocation = Assembly.GetAssembly(this.GetType()).Location;
        var assemblyFolder = Path.GetDirectoryName(assemblyLocation);

        var entitySamplePath = Path.Combine(assemblyFolder, @"Data\Entity.json");
        var entityMetadataPath = Path.Combine(assemblyFolder, @"Data\Entity-Metadata.json");

        var entityJson = File.ReadAllText(entitySamplePath);
        var entityMetadataJson = File.ReadAllText(entityMetadataPath);

        var key = "Entity/1";

        // Insert the old-format document using the raw database commands,
        // bypassing the current model classes entirely.
        var store = resolverFactory.Application().Get<IDocumentStore>();
        store.DatabaseCommands.Put(key, null, RavenJObject.Parse(entityJson), RavenJObject.Parse(entityMetadataJson));

        // Read it back through a normal session to prove that the current
        // serialization settings can still understand it.
        using (var session = store.OpenSession())
        {
            var entities = session.Query<Entity>().ToArray();
            entities.Length.Should().Be(1);
        }
    }
}

The test failed, I fixed the code, the test passed and everyone was happy. Mostly me.

Conclusion

I try to make sure that whenever I’m making changes to something I take backwards compatibility into account. This is somewhat challenging when you’re working at the API level and thinking about models, but is downright difficult when breaking changes can occur during things like serialization at the database level. We had a full suite of tests, but they all worked on the current state of the code and didn’t take into account the chronological nature of persisted data.

In this particular case, we caught the problem before it became a real issue thanks to our staging environment accumulating data over time and our excellent tester, but that’s pretty late in the game to be catching these sorts of things.

In the future it may be useful to create a class of tests specifically for backwards compatibility shortly after we do a release, where we take a copy of any persisted data and ensure that the application continues to work with that structure moving into the future.

Honestly though, because of the disconnected nature of tests like that, it will be very hard to make sure that they are done consistently.

Which worries me.


Like any normal, relatively competent group of software developers, we write libraries to enable reuse, mostly so we don’t have to write the same code a thousand times. Typically each library is small and relatively self contained, offering only a few pieces of highly cohesive functionality that can be easily imported into something with little to no effort. Of course, we don’t always get this right, and there are definitely a few “Utility” or “Common” packages hanging around, but we do our best.

Once you start building and working with libraries in .NET, the obvious choice for package management is Nuget. Our packages don’t tend to be released to the greater public, so we don’t publish to Nuget.org directly, we just use the built-in Nuget feed supplied by TeamCity.

Well we did, up until recently, when we discovered that the version of TeamCity we were using did not support the latest Nuget API, and Visual Studio 2015 didn’t like the old version of the Nuget feed API. This breaking change was opportune, because while it was functional, the TeamCity Nuget feed had been bugging us for a while. It’s difficult to add packages that weren’t created by a Build Configuration, and packages are intrinsically tied to the Artefacts of a Build Configuration, so they can be accidentally deleted when cleaning up. So we decided to move to an actual Nuget server, and we chose Myget because we didn’t feel like rolling our own.

That preface should set the stage for the main content of this post, which is about dealing with the difficulties in debugging when a good chunk of your code is in a library that was prepared earlier.

Step In

To me, debugging is a core part of programming.

Sure you can write tests to verify functionality and you can view your resulting artefact (webpage, application, whatever), but at some point it’s incredibly helpful to be able to step through your code, line by line, in order to get to the bottom of something (usually a bug).

I’m highly suspicious of languages and development environments that don’t offer an easy way to debug, and of developers that rely on interpreting log statements to understand their code. I imagine this is probably because of a history of using Visual Studio, which has had all of these features forever, but trying to do anything but the most basic actions without a good IDE is hellish. It’s one of the many reasons why I hate trying to debug anything in a purely command line driven environment (like some flavours of Linux or headless Windows).

I spend most of my time dealing with two languages, C# and Powershell, which are both relatively easy to get a good debugging experience. Powershell ISE is functional (though not perfect) and Visual Studio is ridiculously feature rich when it comes to debugging.

To tie all of this back in with the introduction though, debugging into C# libraries can be somewhat challenging.

In order to debug C# code you need debugging information in the form of PDB files. These files either need to be generated at compile time, or they need to be generated from the decompiled DLL before debugging. PDB files give the IDE all of the information it needs in order to identify the constituent components of the code, including some information about the source files the code was compiled from. What these files don’t give, however, is the actual source code itself. So while you might have the knowledge that you are in X function, you have no idea what the code inside that function is, nor what any of the variables in play are.

Packaging the PDB files along with the compiled binaries is easy enough using Nuget, assuming you have the option enabled to generate them for your selected Build Configuration.

Packaging source files is also pretty easy.

The Nuget executable provides a -symbols parameter that will automatically create two packages: the normal one, and another one containing all of the necessary debug information plus all of your source files as of the time of compile.

What do you do with it though?

Source Of All Evil

Typically, if you were uploading your package to Nuget.org, the Nuget executable would automatically upload the symbols package to SymbolSource. SymbolSource is a service that can be configured as a symbol server inside Visual Studio, allowing the application to automatically download debugging information as necessary (including source files).

In our case, we were just including the package as an artefact inside TeamCity, which automatically includes it in the TeamCity Nuget feed. No symbol server was involved.

In fact, I have a vague memory of trying to include both symbol and non-symbol packages and having it not work at all.

At the time, the hacky solution was put in place to only include the symbols package in the artefacts for the Build Configuration in TeamCity, but to rename it to not include the word symbols in the filename. This kept TeamCity happy enough, and it allowed us to debug into our libraries when we needed to (the PDB files would be automatically loaded and it was a simple matter to select the source file inside the appropriate packages directory whenever the IDE asked for them).

This worked up until the point where we switched to Myget.

Packages restored from Myget no longer had the source files inside them, leaving us in a position where our previous approach no longer worked. Honestly, our previous approach was a little hacky, so it’s probably for the best that it stopped working.

Sourcery

It turns out that Myget is a bit smarter than TeamCity when it comes to being a Nuget feed.

It expects you to upload both the normal package and the symbols package, and then it deals with all the fun stuff on the server. For example, it will automatically interpret and edit the PDB files to say that the source can be obtained from the Myget server, rather than from wherever the files were when it was compiled and packaged.

Our hacky solution to get Symbols and Source working when using TeamCity was getting in the way of a real product.

Luckily, this was an easy enough problem to solve while also still maintaining backwards compatibility.

Undo the hack that enforced the generation of a single output from Nuget (the renamed symbols package), upload both files correctly to Myget and then upload just a renamed symbols package to TeamCity separately.

Now anything that is still using TeamCity will keep working in the same way as it always has (manual location of source files as necessary), and all of the projects that have been upgraded to use our new Myget feed will automatically download both symbols and source files whenever it’s appropriate (assuming you have the correct symbol server setup in Visual Studio).

Conclusion

Being able to debug into the source files of any of your packages automatically is extremely useful, but more importantly, it provides a seamless experience to the developer, preventing the waste of time and focus that occurs when flow gets interrupted in order to deal with something somewhat orthogonal to the original task.

The other lesson learned (which I should honestly have engraved somewhere highly visible by now) is that hacked solutions always have side effects that you can’t foresee. It’s one of those unfortunate facts of being a software developer.

From a best practices point of view, it’s almost always better to fully understand the situation before applying any sort of fix, but you have to know how to trade that off against speed. We could have chosen to implement a proper Nuget server when we first ran across the issue with debugging our libraries, rather than hacking together a solution to make TeamCity work approximately like we wanted, but we would have paid some sort of opportunity cost in order to do that.

At the least, what we should have done was treat the symbols package rename for TeamCity in an isolated fashion, so the build still output the correct packages.

This would have made it obvious that something special was happening for TeamCity and for TeamCity only, making it easier to work with a completely compliant system at a later date.

I suppose the lesson is, if you’re going to put a hack into place, make sure it’s a well constructed hack.

Which seems like a bit of an oxymoron.