Comma, Comma, Comma Chameleon

April 24. 2018 0 Comments

In previous posts, I’ve briefly mentioned how one of the most valuable outcomes of the data synchronization algorithm was the capability to gather meaningful business intelligence about our users. This includes information like how often they use the software, what parts they are using and how many of its entities they are creating (and deleting) during their normal daily process.

When used responsibly, this sort of intelligence is invaluable in helping to create a better, more valuable experience for everyone involved. Literally every part of your standard organization (support, sales, development, marketing, finance) can make use of the data in some way.

Of course, when its all sequestered inside a PostgreSQL database, the barrier to entry can be quite high. Generally this just means that the people who are best positioned to make use of the data either don’t get access to it (a waste) or they end up constantly bugging the members of the small group of people with the capability to do the analysis.

When I’m a member of that small group, I get suddenly motivated to find an alternate way to give them what they need.

Re-education Campaign

There are really two kinds of analysis that tend to happen for business intelligence purposes.

The first is answering ad-hoc questions. Typically these sorts of questions have not been asked before, or they could be a slight variation on the same theme as a previous question.

Regardless, these generally require someone who actually knows the data and has the skills to wrangle it in order to get to the meat of the issue and provide the answer. This can be taught (with some effort), but not everyone has the time to dedicate to that sort of thing, so new analysis will probably always have to be performed by trained professionals.

The second type of analysis is re-answering a previously answered question, now that time has passed.

This tends to occur quite frequently, especially when it comes to metrics (like “how many customers have X, how many Y does each customer have, etc), and it is here that there is an opportunity to optimise.

In the long term, there should be an easy platform for people to get at the information they want, without having to understand how to run SQL scripts (or Powershell). This is likely going to come in the form of Metabase, but unfortunately this is probably still a little ways off in the future, and people are asking for data now.

So in the short term, some way to easily run pre-canned queries and get the results as a CSV is desirable.

Copy That

The good news is that PostgreSQL makes it pretty easy to run a query and get CSV output using the COPY command:

COPY ({query}) TO STDOUT WITH DELIMITER ',' CSV HEADER

Executing that command will result in a stream containing the data specified in the query (along with headers), in CSV format.

Of course, its kind of useless without a mechanism to easily run it, so if you’re using .NET, you can leverage Npgsql to actually execute the command using the BeginTextExport function on an NpgsqlConnection, like in the following class:

using System.IO;

namespace Solavirum.Exporting
{
    public class PostgresTableStreamer
    {
        private readonly INpgsqlConnectionFactory _npgsqlConnectionFactory;

        public PostgresTableStreamer(INpgsqlConnectionFactory npgsqlConnectionFactory)
        {
            _npgsqlConnectionFactory = npgsqlConnectionFactory;
        }

        public long Stream(string copySql, StreamWriter output)
        {
            var numberOfLinesInFile = 0;
            using (var conn = _npgsqlConnectionFactory.Create())
            {
                conn.Open();
                using (var reader = conn.BeginTextExport(copySql))
                {
                    var i = reader.Read();
                    var insideQuotedField = false;

                    while (i != -1)
                    {
                        var c = (char) i;
                        if (c == '"') insideQuotedField = !insideQuotedField;
                        switch (c)
                        {
                            case '\r':
                                i = reader.Read();
                                continue;
                            case '\n':
                                if (insideQuotedField) output.Write("\\n");
                                else
                                {
                                    output.Write(c);
                                    numberOfLinesInFile++;
                                }
                                break;
                            default:
                                output.Write(c);
                                break;
                        }
                        i = reader.Read();
                    }
                    output.Flush();
                }
                conn.Close();
            }
            return numberOfLinesInFile;
        }
    }
}

The only tricksy thing is escaping new line characters so that they don’t explode the formatting of the resulting file.

A Model View

Of course, a C# class is just as useless to a non-technical person as an SQL query, so now we have to think about delivery.

The easiest way to give people access to some piece of functionality is typically a website, so that seems like a good approach.

My skills are somewhat limited when it comes to putting together a website in a hurry. Our previous websites have been React.JS, but they need a bit of a pipeline to get up and running and I didn’t want to invest that much in this prototype.

Eventually, I settled on an ASP.NET CORE MVC website, deployed manually to a single Linux instance in AWS.

Nothing too complicated, just a single controller that lists the queries that are available (which are deployed with the code, no custom queries here), and then a way to run a single query and get the result as a file.

Getting ASP.NET CORE MVC up and running on a Linux box is pretty straightforward, assuming you use a local Nginx instance as a reverse proxy (which is easy).

As always, because we’re not stupid, a little touch of security is also necessary, so:

The box is behind a load balancer, to allow for easy HTTPS.
The load balancer is only accessible from the static IP at our main office. If you’re out of the office you can still use the service, but you need to be on the VPN, which is required for a bunch of our other things anyway.

If the website lives for long enough, I’ll probably augment it with Active Directory logins, so people can just use their domain credentials, but short term, its secure enough.

Conclusion

That’s basically it.

Start to finish, it all took about 4 hours, and while I wouldn’t say it was my best work, Its good enough to solve the immediate problem. At least until we get Metabase up and running and we throw the whole thing out anyway.

I’ll probably clean it up a bit (the manual deployment rankled me more than I wanted to admit) and then wash my hands of the whole thing.

More importantly, the amount of people asking me to “please run query X and email me the results” has dropped dramatically, so I’ll consider this a success.

Now I have more time for the really important things.

Like participating in the Mario Kart Tournament.

AWS Lambda and .NET Core, Two Great Tastes That Taste Great Together, Part 4

September 19. 2017 0 Comments

And thus we come to the end of this particular saga, of rewriting the our AWS Lambda ELB Logs Processor in C# using .NET Core.

To summarise the posts in this series:

In conjunction with those posts, I’ve also uploaded a copy of the entire source code (build and deployment scripts included) to Solavirum.Logging.ELB.Processor.Lambda, mostly as a master reference point. Its a copy of our internal repository, quickly sanitized to remove anything specific to our organization, so I can’t guarantee that it works perfectly, but there should be enough stuff there to demonstrate the entire process working end to end.

As with all the rest of our stuff, the build script is meant to run from inside TeamCity, and the deployment script is meant to run from inside Octopus Deploy, so if you don’t have either of those things, you will have to use your imagination.

Of course, everything I’ve gone through is purely academic unless its running successfully in Production…

Seize The Means Of Production

If you’ve read any of my previous posts about the Javascript/Node.js ELB Logs Processor, you’re probably aware of the fact that it tended to look like it was working fine right up to the point where I published it into production and the sheer weight of traffic caused it to have a meltdown.

Unsurprisingly, the difference between a sample file with a few hundred lines in it and a real file with hundreds of thousands of lines is significant, and that’s not even really taking into account the difference between my local development environment and AWS Lambda.

In an unprecedented result though, the .NET Core implementation worked just fine when I shipped it to our production environment.

Unfortunately it was around twenty times slower.

The old Javascript/Node.js implementation could process a file in around 9 seconds, which is pretty decent.

The .NET Core implementation processes a file in around 200 seconds, which is a hell of a lot slower.

Now, I can explain some of that away with the fact that the .NET Core implementation uses HTTP instead of TCP to communicate with Logstash (which was the primary reason I rewrote it in the first place), but it was still pretty jarring to see.

Interestingly enough, because executing Lambda functions is so cheap, the total cost increase was like $US8 a month, even though it was twenty times slower. so who really cares.

In the end, while the increase in processing time was undesirable, I’d rather have slower C# and HTTP over faster Javascript and TCP.

Summary

Overall, I’m pretty happy with how this whole .NET Core rewrite went, for a number of reasons:

I got to explore and experiment with the newest .NET and C# framework, while delivering actual business value, which was great.
In the process of developing this particular solution, I managed to put together some more reusable scripts and components for building other .NET Core components
I finally switched the ELB Logs Processor to HTTP, which means that the overall feed of events into our ELK stack is more reliable (TCP was prone to failure)
I managed to get a handle on fully automated testing of components that require AWS Credentials, without accidentally checking the credentials into source control

In the end though, I’m mostly just happy because its not Javascript.

AWS Lambda and .NET Core, Two Great Tastes That Taste Great Together, Part 3

September 12. 2017 0 Comments

Last week I wrote a high level overview of the build, package, test and deploy process that we created in order to get .NET Core C# into AWS Lambda.

As part of that post, I briefly touched on the validation part of the proces, which basically amounted to “we wrote some tests in C# using XUnit, then used the dotnet command line interface to run them as part of the build”.

Considering that the testing took a decent amount of the total time, that amount of coverage isn’t really appropriate, so this post will elaborate further, and explain some of the challenges we encountered and how we overcame them.

I’m A Little Testy

At a high level, testing a .NET Core AWS Lambda function written in C# is not really any more difficult than testing other C# code. Like the stuff I’ve been writing for years.

Unit test classes in isolation, integration test classes together, have a few end to end tests (if possible) and you’re probably in a good place.

When you write a lambda function, all you really need is a single class with a method on it (Handler::Handler or equivalent). You could, if you wanted, put literally all of your logic into this class/method combination and have a fully functional lambda function.

That would probably make it harder to test though.

A better idea is to break it down into its constituent parts, ideally taking the form of small classes with a single responsibility, working together to accomplish a greater goal.

For the ELB Logs Processor, there are three clear responsibilities:

Read a file from S3, and expose the information therein as lines of text
Take a line of text and transform it into some sort of structure, potentially breaking it down and adding meta information
Take a structure and output it to a Logstash endpoint

Technically there is also a fourth responsibility which is to tie the three things above together (i.e. orchestration), but we’ll get to that in a minute.

Anyway, if the responsibilities are clearly separated, we’re in unit testing territory, which is familiar enough that I don’t really want to talk about it too much. Grab your testing framework of choice (XUnit is a good pick for .NET Core, NUnit didn’t seem to work too well in Visual Studio + Resharper), mock out your dependencies and test away.

You can apply the same sort of testing to the orchestrator (the Handler) by giving it its dependencies through its constructor. Make sure you give it a parameterless constructor with default values though, or you’ll have issues running it in AWS.

This is basically just a poor mans dependency injection, but my favour dependency injection library, Ninject, does not exist in .NET Core and I didn’t want to invest the time to learn one of the other ones (like AutoFac) for such a small project.

Of course, while unit tests are valuabl, they don’t really tell us whether or not everything will work when you put it all together.

That’s where integration tests come in.

Integration Functions

In my experience, I’ve consistently found there to be two distinct flavours of integration test.

The first is the test that validates whether or not everything works when all of your things are put together. This tends to end up being a test of your dependency injection configuration (if you’re using it), where you create some top level object using your container of choice, then execute a series of methods on it (and its children) echoing what would normally happen in your program execution. You might mock out the very edges of the system (like databases, API dependencies, disk-based configuration, etc), but in general you want to validate that your tens (hundreds? thousands?) of classes play nice together and actually accomplish their overarching goal.

The second type of test is the one that validates whether or not a class that relies on an external system works as expected. These are very close to unit tests in appearance (single class, some mocked out dependencies), but are much slower and more fragile (due to the reliance on something outside of the current code base). They do have a lot of value though, because beyond application failures, they inform you when your integration with the external dependency breaks for whatever reason.

The ELB Logs Processor features both types of integration test, which is nice, because it makes for a comprehensive blog post.

The simpler integration test is the one for the component that integrates with S3 to download a file and expose it as a series of lines. All you need to do is put a file in a real S3 bucket, then run the class with an appropriate set of parameters to find that file, download it and expose it in the format that is desired. You’re free to simulate missing files (i.e. go get this file that doesn’t exist) and other error cases as desired as well, which is good.

The other integration test involves the Handler, using its default configuration and a realistic input. The idea would be to run the Handler in a way that is as close to how it will be run in reality as possible, and validate that it does what it advertises. This means we need to put some sort of sample file in S3, fire an event at the Handler describing that file (i.e. the same event that it would receive during a normal trigger in AWS), validate that the Handler ran as expected (perhaps measuring log output or something) and then validate that the events from the sample file were available in the destination ELK stack. A complicated test for sure, but one with lots of value.

As is always the case, while both of those tests are conceptually fairly simple and easy to spell out, they both have to surmount the same challenge.

AWS Authentication and Authorization.

401, Who Are You?

The various libraries supplied by AWS (Javascript, Powershell, .NET, Python, etc) all handle credential management in a similar way.

They use a sort of priority system for determining the source of their credentials, and it looks something like this:

If credentials have been supplied directly in the code (i.e. key/secret), use those
Else, if a profile has been supplied, look for that and use it
Else, if no profile has been supplied, use the default profile
Else, if environment variables have been set, use those
Else, use information extracted from the local machine (i.e. IAM profiles or EC2 meta information)

The specific flow differs from language to language (.NET specific rules can be found here for example), but the general flow is pretty consistent.

What does that mean for our tests?

Well, the actual Lambda function code doesn’t need credentials at all, because the easiest way to run it is with an IAM profile applied to the Lambda function itself in AWS. Assuming no other credentials are available, this kicks in at option 5 in the list above, and is by far the least painful (and most secure) option to use.

When running the tests during development, or running them on the build server, there is no ambient IAM profile available, so we need to find some other way to get credentials to the code.

Luckily, the .NET AWS library allows you to override the logic that I described above, which is a pretty great (even though it does it with static state, which makes me sad).

With a little work we can nominate an AWS profile with a specific name to be our current “context”, and then use those credentials to both setup a temporary S3 bucket with a file in it, and to run the code being tested, like in the following Handler test.

/// <summary>
/// This test is reliant on an AWS profile being setup with the name "elb-processor-s3-test" that has permissions
/// to create and delete S3 buckets with prefixes oth.console.elb.processor.tests. and read items in those buckets
/// </summary>
[Fact]
public void WhenHandlingAValidS3Event_ConnectsToS3ToDownloadTheFile_AndOutputsEventsToLogstash()
{
    var inMemoryLogger = new InMemoryLogger();
    var xunitLogger = new XUnitLogger(_output);
    var logger = new CompositeLogger(inMemoryLogger, xunitLogger);

    var application = Guid.NewGuid().ToString();
    var config = new Dictionary<string, string>
    {
        { Configuration.Keys.Event_Environment, Guid.NewGuid().ToString() },
        { Configuration.Keys.Event_Application, application},
        { Configuration.Keys.Event_Component, Guid.NewGuid().ToString() },
        { Configuration.Keys.Logstash_Url, "{logstash-url}" }
    };
    using (AWSProfileContext.New("elb-processor-s3-test"))
    {
        logger.Log("Creating test bucket and uploading test file");
        var s3 = new AmazonS3Client(new AmazonS3Config { RegionEndpoint = RegionEndpoint.APSoutheast2 });
        var testBucketManager = new TestS3BucketManager(s3, "oth.console.elb.processor.tests.");
        using (var bucket = testBucketManager.Make())
        {
            var templateFile = Path.Combine(ApplicationEnvironment.ApplicationBasePath, @"Helpers\Data\large-sample.log");
            var altered = File.ReadAllText(templateFile).Replace("@@TIMESTAMP", DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ss.ffffffZ"));
            var stream = new MemoryStream(Encoding.UTF8.GetBytes(altered));
            var putResult = s3.PutObjectAsync(new PutObjectRequest { BucketName = bucket.Name, InputStream = stream, Key = "test-file" }).Result;

            var message = new S3EventBuilder().ForS3Object(bucket.Name, "test-file").Build();

            logger.Log("Executing ELB logs processor");
            var handler = new Handler(logger, config);
            handler.Handle(message);
        }
    }

    // Check that there are some events in Elasticsearch with our Application
    Func<long> query = () =>
    {
        var client = new HttpClient {BaseAddress = new Uri("{elasticsearch-url}")};
        var raw = client.GetStringAsync($"/logstash-*/_search?q=Application:{application}").Result;
        dynamic result = JsonConvert.DeserializeObject(raw);
        return (long) result.hits.total;
    };

    query.ResultShouldEventuallyBe(hits => hits.Should().BeGreaterThan(0, $"because there should be some documents in Elasticsearch with Application:{application}"));
}

The code for the AWSProfileContext class is relatively simple, in that while it is in effect it inserts the profile into the first position of the credential fallback process, so that it takes priority over everything else. When disposed, it removes that override, returning everything to the way it was. A similar dispose-delete pattern is used for the temporary S3 bucket.

With a profile override in place, you can run the tests in local development just by ensuring you have a profile available with the correct name, which allows for a nice smooth development experience once a small amount of manual setup has been completed.

To run the tests on the build servers, we put together a simple helper in Powershell that creates an AWS profile, runs a script block and then removes the profile. The sensitive parts of this process (i.e. the credentials themselves) are stored in TeamCity, and supplied when the script is run, so nothing dangerous needs to go into source control at all.

To Be Continued

The most complicated part of testing the ELB Logs Processor was getting the AWS credentials working consistently across a few different execution strategies, while not compromising the safety of the credentials themselves. The rest of the testing is pretty just vanilla C# testing.

The only other thing worth mentioning is that the AWS libraries available for .NET Core aren’t quite the same as the normal .NET libraries, so some bits and pieces didn’t work as expected. The biggest issue I ran into was the how a lot of the functionality for managing credential is typically configured through the app.config, and that sort of thing doesn’t even seem to exist in .NET Core. Regardless, we still got it work in the end, even though we did have to use the obsolete StoredProfileAWSCredentials class.

Next week will be the last post in this series, drawing everything together and focusing on actually running the new .NET Core ELB Logs Processor in a production environment.

AWS Lambda and .NET Core, Two Great Tastes That Taste Great Together, Part 2

September 5. 2017 0 Comments

So, last week was mostly an introduction to .NET Core and AWS Lambda. Not a lot of meat.

This week I’ll be writing about the build and deployment pipeline that we set up, and how it ties in with our normal process.

Hopefully it will be a bit more nutritious.

The Story So Far

Over the last few years we’ve put together a fairly generic build and deployment process using Powershell that focuses on ensuring the process can be run painlessly on a fresh checkout of the repository. The goal is to have everything either embedded or bootstrapped, and to have any and all logic encapsulated within the scripts.

What this means is that we have a relatively simple setup in TeamCity, which is basically just parameter management and a small bit of logic for kicking off the scripts that live inside the repositories.

The same approach is extended to deployment as well; we have Powershell (or an equivalent scripting language) that does the actual deployment, and then only use Octopus Deploy for orchestration and variable management.

Because we need to build, test and deploy a variety of things (like C# desktop applications, Javscript websites, the old Javascript ELB Logs Processor, CloudFormation environments), the common scripts feature pluggable processes in the form of Powershell script blocks, allowing the user to choose what sort of build is needed (simple Nuget package over some files? A full MSBuild execution that generates an .exe in a Nuget package?), how and when it is tested (Nunit over .dlls containing the phrase “Tests” before deployment? Selenium using Webdriver IO through npm after deployment?) and how it is deployed (pretty much just Octopus Deploy).

The API for the pluggable processes (which I’m going to call engines from this point forward) is relatively lax, in that the Powershell just calls the supplied script block with known parameters, and fails miserably if they don’t exit.

In retrospect, we probably should have just used something already available, like psake, but it just didn’t seem necessary at the time. We’re probably going to get hurt if we try to do anything that only works on Linux systems (no Powershell there), but it hasn’t happened yet, so I’m sure it will never happen.

To summarise, we have a framework that allows for configurable approaches to build, test and deployment, and a .NET Core AWS Lambda function should be able to slot right in.

Vrmm Vrmm Goes The Engine

In order to reliably get the lambda function code from the git repository in which it lives, to the actual AWS Lambda function where it will be executed, we needed to do three things:

Creation of an artefact containing a versioned and validated copy of the code to be executed
The validation logic itself (i.e. testing)
Deploying said artefact to the AWS Lambda function

Coincidentally, these three things line up nicely with the pluggable components in our generic build and deployment process, so all we have to do is provide an engine for each, then run the overall process, and everything should be sweet.

Bob The Builder

The first thing to deal with is the actual creation of something that can be deployed, which for us means a Nuget package.

Referring back to the Javascript version of the ELB Logs Processor, there really wasn’t a lot to the creation of the artefact. Use npm to make sure the appropriate dependencies are present, a quick consolidation step with webpack, then wrap everything up in a Nuget package.

For the C# version, its a similar story, except instead of using node/npm, we can use the dotnet executable available in the SDK, followed by the Nuget command line tool to wrap it all up neatly.

Basically, it looks like this:

dotnet restore {solution-path}
dotnet build –c Release {solution-path}
dotnet publish –o {output-path} –c Release {solution-path}
nuget pack {nuspec-path} {output-directory} {version}

The actual script block for the engine is a little bit more complicated than that (TeamCity integration, bootstrapping the .NET Core SDK, error handling/checking, etc), but its still relatively straightforward.

As I mentioned last week, the only thing to watch out for here is to make sure that you’re targeting the appropriate runtime with your solution configuration, and not to try and build a standalone executable. The code will actually be executed on a Linux machine behind the scenes, and AWS Lambda will make sure the .NET Core runtime is available, so all you have to worry about is your code. I’m pretty sure if you try and supply an OS independent executable it will fail miserably.

Emotional Validation

The second thing to deal with is the validation of the artefact.

Similar to the build process, if we refer back to the Javascript ELB Logs Processor, we had no testing at all. Literally nothing. Our validation was “deploy it and see if the logs stop”.

Don’t get me wrong, there are tonnes of ways to write tests and automate them when using node/npm, we just never went to the effort of actually doing it.

I’m much more familiar with C# (even taking into account the differences of .NET Core), and we don’t write code without tests, so this was an ideal opportunity to put some stuff together for validation purposes.

I’m going to dedicate an entire post to just how we went about testing the rewrite (some interesting stuff there about managing AWS credentials in a secure way), but at a high level we followed our standard pattern of having dedicated test projects that sit in parallel to the code (i.e. Lambda.Tests.Unit, Lambda.Tests.Integration) with automatic execution through the generic build process via an engine.

This particular engine is applied predeployment (i.e. before the Nuget package is pushed to Octopus and a project executed), and is pretty straight forward:

dotnet test {project-path} –c Release –no-build –logger {output-file-path}

The only caveat here was the general immaturity of .NET Core, as I had some problems using NUnit (which is the testing framework we typically use for C#), but XUnit worked perfectly fine. They aren’t that dissimilar, so it wasn’t a huge problem.

Tentacle Joke

The last thing to deal with is the actual deployment of the artefact to AWS.

Assuming step 1 and 2 have gone well (i.e. we have a versioned Nuget package that has been validated), all that’s left is to leverage Octopus Deploy.

For us this means a relatively simple project inside Octopus that just deploys the Nuget package to a worker tentacle, and a Powershell script embedded in the package that does the bulk of the work.

$awsKey = Get-OctopusParameter "AWS.Deployment.Key";
$awsSecret = Get-OctopusParameter "AWS.Deployment.Secret";
$awsRegion = Get-OctopusParameter "AWS.Deployment.Region";

$version = Get-OctopusParameter "Octopus.Release.Number";

$functionName = Get-OctopusParameter "AWS.Lambda.Function.Name";

$aws = Get-AwsCliExecutablePath

$env:AWS_ACCESS_KEY_ID = $awsKey
$env:AWS_SECRET_ACCESS_KEY = $awsSecret
$env:AWS_DEFAULT_REGION = $awsRegion

$functionPath = "$here\function"

Write-Verbose "Compressing lambda code file"
Add-Type -AssemblyName System.IO.Compression.FileSystem
[system.io.compression.zipfile]::CreateFromDirectory($functionPath, "index.zip")

Write-Verbose "Updating Log Processor lambda function [$functionName] to version [$version]"
(& $aws lambda update-function-configuration --function-name $functionName --runtime "dotnetcore1.0" --handler "Lambda::Handler::Handle") | Write-Verbose
(& $aws lambda update-function-code --function-name $functionName --zip-file fileb://index.zip) | Write-Verbose
(& $aws lambda update-function-configuration --function-name $functionName --description $version) | Write-Verbose

In summary, the script takes the contents of the Nuget package that was just deployed, zips up the relevant parts of it (in this case, everything under the “function” directory) and then uses the AWS CLI and some Octopus parameters to push the zip file to a known Lambda function.

Really the only interesting thing here is that we take the version from the Nuget package and apply it directly to the function in AWS for backwards traceability.

To Be Continued

The steps and processes that I’ve outlined above only describe the key parts of the build and deployment process. A lot of the detail is missing, and that is on purpose, because the entire thing is quite complicated and describing it comprehensively would make this blog post into more of a novel.

Its all ready pretty long.

The key things to take away are the .NET Core build and publish process (using the dotnet cmdline application), the .NET Core testing process (again, through the cmdline application) and the simplest way that I’ve found to actually push code to a Lambda function in an automated fashion.

Next week I’ll extrapolate on the testing of the lambda function, specifically around managing and using AWS credentials to provide an end to end test of function logic.

AWS Lambda and .NET Core, Two Great Tastes That Taste Great Together, Part 1

August 29. 2017 0 Comments

Over two months ago I commented that I was going to rewrite our ELB Logs Processor in C#. You know, a language that I actually like and respect.

No, it didn’t take me two months to actually do the rewrite (what a poor investment that would have been), I finished it up a while ago. I just kept forgetting to write about it.

Thus, here we are.

HTTP Is The Way For Me

To set the stage, lets go back a step.

The whole motivation for even touching the ELB Logs Processor again was to stop it pushing all of its log events to our Broker via TCP. The reasoning here was that TCP traffic is much harder to reason about, and Logstash has much better support around tracking, retrying and otherwise handling HTTP traffic than it does for TCP, so the switch makes sense as a way to improve the overall health of the system.

I managed to flip the Javascript ELB Logs Processor to HTTP using the Axios library (which is pretty sweet), but it all fell apart when I pushed it into production, and it failed miserably with what looked like memory issues.

Now, this is the point where a saner person might have just thrown enough memory at the Lambda function to get it to work and called it a day. Me? I’d had enough of Javascript at that point, so I started looking elsewhere.

At this point I was well aware that AWS Lambda offers execution of C# via .NET Core, as an alternative to Node.js.

Even taking into account my inexperience with doing anything at all using .NET Core, C# in Lambda was still infinitely preferable to Javascript. The assumption was that the ELB Logs Processor doesn’t actually do much (handle incoming S3 event, read file, send log events to Logstash), so rewriting it shouldn’t take too much effort.

Thus, here we are.

You Got .NET Core In My Lambda!

The easiest way to get started with using C# via .NET Core in AWS Lamnda, is to install the latest version of the AWS Toolkit and then just use Visual Studio. The toolkit provides a wealth of functionality for working with AWS in Visual Studio, including templates for creating Lambda functions as well as functionality for publishing your code to the place where it will actually be executed.

Actually setting up the Lambda function to run .NET Core is also relatively trivial. Assuming you’ve already setup a Lambda function with the appropriate listeners, its literally a combo box in the AWS Dashboard where you select the language/runtime version that your code requires.

Using the integrated toolkit and Visual Studio is a great way to get a handle on the general concepts in play and the components that you need to manage, but its not how you should do it in a serious development environment, especially when it comes to publishing.

A better way to do it is to go for something a bit more portable, like the actual .NET Core SDK, which is available for download as a simple compressed archive, with no installation required. Once you’ve got a handle on the command line options, its the best way to put something together that will also work on a build server of some description.

I’ll talk about the build and deployment process in more detail in the second post in this series, but for now, its enough to know that you can use the combination of the command line tools from the .NET Core SDK along with Visual Studio to create, develop, build, package and publish your .NET Core AWS Lambda function without too much trouble.

At that point you can choose to either create your Lambda function manually, or automate its creation through something like CloudFormation. It doesn’t really matter from the point of view of the function itself.

You Got Lambda In My .NET Core!

From the perspective of C#, all you really need to write is a class with a handler function.

namespace Lambda
{
    public class Handler
    {
        public void Handle(System.IO.Stream e)
        {
            // do some things here
        }
    }
}

A stream is the most basic input type, leaving the handling of the incoming data entirely up to you, using whatever knowledge you have about its format and structure.

Once you have the code, you can compile and publish it to a directory of your choosing (to get artefacts that you can move around), and then push it to the AWS Lambda function. As long as you’ve targeted the appropriate runtime as part of the publish (i.e. not a standalone platform independent application, you actually need to target the same runtime that you specified in the AWS Lambda function configuration), the only other thing to be aware of is how to tell the function where to start execution.

Specifying the entry point for the function is done via the Handler property, which you can set via the command line (using the AWS CLI), via Powershell Cmdlets, or using the AWS dashboard. Regardless of how you set it, its in the format “{namespace}::{class}::{function}”, so if you decide to arbitrarily change your namespaces or class names, you’ll have to keep your lambda function in sync or it will stop working the next time you publish your code.

At this point you should have enough pieces in place that you can trigger whatever event you’re listening for (like a new S3 file appearing in a bucket) and track the execution of the function via CloudWatch or something similar.

Helpful Abstractions

If you look back at the simple function handler above, you can probably conclude that working with raw streams isn’t the most efficient way of handling incoming data. A far more common approach is to get AWS Lambda to deserialize your incoming event for you. This allows you to use an actually structured object of your choosing, ranging from the presupported AWS events (like an S3 event), through to a POCO of your own devising.

In order to get this working all you have to do is add a reference to the Amazon.Lambda.Serialization.Json nuget package, and then annotate your handler with the appropriate attribute, like so:

namespace Lambda
{
    public class Handler
    {
        [LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]
        public void Handle(Amazon.Lambda.S3Events.S3Event e)
        {
            // Do stuff here
        }
    }
}

In the function above, I’ve also made use of the Amazon.Lambda.S3Events nuget package in order to strongly type my input object, because I know that the only listener configured will be for S3 events

I’m actually not sure what you would have to do to create a function that handles multiple input types, so I assume the model is to just fall back to a more generic input type.

Or maybe to have multiple functions or something.

To Be Continued

I think that’s as good a place as any to stop and take a breath.

Next week I’ll continue on the same theme and talk about the build, package and publish process that we put together in order to actually get the code deployed.