Blog posts

From Researcher to RSE: My Career Path

Diego Alonso Álvarez is a Senior Research Software Engineer in the Research Computing Service at Imperial College London. In this post he reflects on his career so far, from post-doctoral researcher to full-time research software engineer, a role he has held since joining Imperial’s RSE team in November 2018.

1. Setting the scene: who I am and why I am writing this

I am a research software engineer (RSE), but until just one year ago I was a post-doctoral researcher in the Department of Physics at Imperial College London. Before I forget what being a researcher was like, I am writing down my experiences of both career paths and the pros and cons of each. Reflecting on my own career and on why I made the decisions I made has been an exciting task. Hopefully it will also be interesting for others to read and, possibly, to benefit from.

It is worth emphasising that this blog post is about me and how I have experienced both roles. It is not, by any means, an unbiased description of the academic and RSE careers, nor is it an attempt to describe what being a researcher or an RSE generally is, the latter being a hot topic of discussion in the RSE community anyway. Some people will find my experiences mirroring some of their own; others will identify with all of it; others will find my whole story completely alien and unrelated to their own.

Either way, let’s begin!

2. My career as a researcher

2.1. The context

It took me a while to realise I was a researcher. Indeed, I do not think I thought of myself as a researcher until after finishing my PhD and starting my first postdoc in Edinburgh, back in 2012, probably because I had not experienced the whole “research world” until then.

However, I certainly was a researcher before that. For six years, from 2006, I carried out innovative research in the field of quantum semiconductor nanostructures for novel infrared photodetectors and solar cells. I am not sure whether my PhD supervisors were very permissive or I was very independent, but in any case, I generally worked a lot on my own and did things my way, normally quite successfully.

Going to Edinburgh immediately after finishing my PhD was an intermediate step. As with the PhD, I was pretty independent there and could work any way I wanted, whenever I wanted, as long as I produced scientifically sound results. But others were not so independent. Around me (within the same group and in others) I could see much more demanding constraints and bitter arguments about who should be an author on what and in what order, how many hours someone had been using some equipment, etc. I never experienced any of that myself.

By the time I went to Edinburgh, I had already submitted a Royal Society Newton International Fellowship application (failed!) and a Marie Curie Fellowship application (success!) to come to Imperial College London. This was my first crash course in research: impossibly long applications; incredibly long waiting times; zero or very limited feedback if unsuccessful. In any case, I was successful in the end, so here I came. I worked happily as part of the Quantum Photovoltaics group, first as a Marie Curie Fellow, then as a research fellow on a European project and finally as a plain postdoctoral researcher on an EPSRC project. Not exactly climbing the academic ladder.

During these years, I followed the “book of the researcher”: I enjoyed the challenges of creating new experiments and shedding some light on novel, potentially groundbreaking data; I published a few dozen papers, collaborated with many institutions and travelled worldwide presenting my work at the top conferences in the field of solar energy. I also did some undergraduate teaching, student supervision, lab management, a lot of coding (both for research and for outreach) and wrote applications for fellowships and lectureships. Often all of it at the same time.

To cut a long story short, I was not successful with any of the lectureship applications, nor with the fellowships, so my career was not really going anywhere. All of them were very time-consuming to prepare (the last fellowship took me a whole year); all of them took a long time to be resolved (in one case I had to write to find out what was going on); and in none did I receive any feedback beyond “it was very competitive.” Well, I already knew that. What I wanted to know was where I was weakest, so that I could develop that area and have a better chance next time.

The bitterness in all of this comes not so much from failing as from the complete absence of any gain from those failures. There was no learning experience. The applications took a great deal of time only to reach dead ends. And the same applies to rejected papers or collaborations that end up going nowhere.

2.2. Pros and cons

So, after this long introduction to put my opinions into context, I have come up with the following list of pros and cons of life in research. They are not in any particular order, but it should be pretty obvious by now which side I give more weight to.

2.2.1. Pros

– Freedom of working any time of the day and day of the week: Results matter more than hours worked

For me, this is one of the biggest benefits of life as a researcher, but also a double-edged sword. It requires you to be honest about what needs doing and by when, and then to do it. It also requires your supervisor or line manager to demand and value those results appropriately. Otherwise, nothing gets done, or you end up working many more hours to get the work done.

– Work for your own benefit and reputation

This is a bit vague, as it could also be “for the benefit of humankind”, but I think there is a bit of selfishness and a desire to be recognised in every researcher.

– Limited supervision and/or accountability

This clearly depends on who your supervisor is, but in my experience I rarely had to explain what I had been doing beyond the outcomes (papers, conferences, etc.) we had agreed on and that were expected.

– Very clear career progression path

PhD → Postdoc → Research Fellow → Lecturer → Reader → Professor. Some steps are slightly different depending on the institution but, roughly speaking, the path is the same everywhere, with more or less clear responsibilities and benefits.

– A lot of opportunities to learn soft skills

By soft skills I mean anything that is not in your job specification, that you can use elsewhere and that, for some reason, you spend most of your time doing. It is important to note that soft skills become relevant only when you think about changing roles.

2.2.2. Cons

– Unhealthy competitiveness between researchers for publishing first, accessing or controlling a laboratory, order of authors in a paper…

I did not experience this personally, but I saw it happen to friends and it is one of the most counterproductive and damaging things for anyone’s mental health. An absolute motivational killer.

– Extreme pressure to publish and get grants

What to say about this? The vicious circle of publishing to get grants to keep publishing to do… what? Very often research misses the point completely: papers and grants are a means to an end, not ends in themselves, and when they are treated as the end, the result is poorer, emptier research and a waste of resources and money.

– Very long feedback loops between doing something and getting a response to it

My favourite, and probably the reason I lost interest in research. I cannot emphasise enough the absolute waste of time and the anxiety that all these lead times produce in a researcher:

– Grant/fellowship submission → Resolution of the call
– Submitting a paper → Having the paper published
– Publication of papers → Anyone actually benefiting from them

– Too narrow research topic resulting in limited scope for learning new things

This is hard to spot while you are inside, but the truth is that we become experts in things so specific that if we want to learn anything even slightly off track, we cannot. Two things happen: (1) you rarely have time to do it because you already have plenty on your plate, and (2) the community of that other field will not accept you because you have not been working on that topic for ages and, therefore, are not an expert. I tried to do it, moving from solar cells to batteries and energy storage. It did not work.

– Often required to spread yourself too thin

This affects all levels of the academic ladder. At the upper steps it is about managing too many people, too many project proposals and too many connections and potential partners; at the lower steps it is about pursuing side research lines and activities beyond the real topics of the job, because you cannot say “no” to whatever comes from above. Another source of stress (on top of everything else) that prevents you from focusing on getting things done.

– Often requires working many hours outside normal working hours

The dark side of the freedom over working hours. Things have to be done, for better or worse.

3. My career as a research software engineer

3.1. The context

The first question to answer is how I ended up being a research software engineer. Sure, I applied for a vacancy I saw somewhere, but it is interesting to describe how I found out about it in the first place, because it is a clear example of where many RSEs might be coming from.

I was presenting the solar cell simulation package I had been working on, Solcore, to some potential users at Imperial’s Department of Materials. After the presentation and the discussion, one of the attendees told me that Imperial’s Research Software Engineering team could help me polish the software and solve some of its issues and limitations. I had never heard of such a team, but it sounded useful. I took note of the web page and a few days later joined the Imperial RSE community mailing list. I have to admit I never followed up that lead and ignored every message from that mailing list, until a few months later, when I happened to look at it and saw a vacancy for a research software engineer position.

Reading through the job description was quite an eye-opener. This job was not only very close to things that I had been doing, informally, as a researcher; it was about things I really enjoyed doing! Sure, there were a few technical skills I did not have (and I still do not), but overall it seemed an amazing fit for me. And it was a permanent position. This carried enormous weight, considering my personal situation: I had a baby just a few months old and had spent the last decade on relatively short (1-3 years each) fixed-term contracts. So, I applied… and got the position!

My job as an RSE could not be more different from my job as a researcher, at least in terms of working environment and daily routine. Imperial’s RSE team is part of the Research Computing Service, which in turn is part of Information and Communication Technologies (ICT), a massive department in charge of maintaining and improving all of Imperial’s computing infrastructure. We all work in a large open-plan office, and the look and feel is far more professional than that of the often messy researchers’ offices. Everyone there, including us, keeps a fairly regular and consistent schedule, and the office is mostly empty by 5 pm.

The work itself is faster, much, much faster: we have concrete goals to achieve, concrete steps to get there, concrete deliverables. It does not matter whether we are developing new code to support the research of a certain group, refactoring an old, hard-to-maintain piece of software, preparing a workshop for a conference (there are indeed great RSE conferences!) or the materials for a training event. We are paid to provide a specific service to a client under some constraints (money, time, scope), and we have to deliver, be efficient and get straight to the point. This dynamism is not stressful at all; quite the contrary, it is rather relaxing to have specific steps to take to reach a specific place at a specific time. Tasks are short, feedback comes fast, and reviewing performance (your own or that of the software you have been working on) is also very fluid.

Also contrary to what I would have thought before, there is plenty of scope for learning new things and to be creative when applying solutions to the problems you have to face. Indeed, I have certainly learnt way more in the last year as RSE than in the previous few years as researcher.

Not everything is rosy, of course. Especially as a beginner in the field without any formal training in computing, I sometimes struggled with concepts or tools that were taken for granted. Software design patterns are one of them; correct use (and understanding) of git is another; so are debugging code with proper debugging tools rather than “print” statements, and the basic concepts of parallel computing. All of that comes with practice, of course, but when things move so fast and time is so precious, you certainly fear not living up to expectations, or wasting others’ time when they have to solve your issues.

I have just become a Senior Research Software Engineer. That suggests I have done my job well (something I am really proud of!), but it also shows how quickly, and how differently, things can move outside the academic ladder.

3.2. Pros and cons

The pros and cons have mostly been described already but, to be consistent and to add a few more in each category, here is a more exhaustive list.

3.2.1. Pros

– Still enjoying the academic environment and life on campus

I still work at the University, in touch with researchers, embedded in the academic environment, the students, the food outlets… It is the comfort zone, familiar to me, and that makes things much easier.

– Fast pace, with short reporting times and feedback from clients or colleagues

As described above, this is the absolute opposite of life as a researcher and, therefore, my favourite thing about working as an RSE. You can feel that things happen and change in real time, that there is real impact, and that specific feedback guides you to the next steps that week, the next, or the following month at most.

– A new, growing community with limitless possibilities to stand out

There are many RSEs, but the community itself is quite young. The professional bodies are being formed right now, the conferences are just a few editions old, and the structure of the RSE career path is… fuzzy. There is plenty to be done, plenty of room to make a difference and to be a pioneer.

– Broad field with many tools, techniques and practices to learn (and growing)

The field of information technology is huge and growing. Even if you restrict yourself to the things specifically useful for the projects or tasks you are involved in at any given time, you will never run out of things to learn.

– Very open and collaborative community with limited competitiveness

While researchers certainly collaborate with each other, there is always a sense of competition, of being the first to publish something or to get new results. RSEs seem to be much more relaxed about that. They are enthusiastic about sharing their ideas and expertise in different formats and contexts. They embrace concepts like sustainability, transparency, open software and open research, and collaborative formats like hackathons and online forums. In this respect, RSEs are what researchers should have been in the first place.

– 9-to-5 job

As much as I valued the freedom of working in academia, I have come to value more the rigorous 9-to-5 job I am enjoying as an RSE, without any need to work at weekends or in the evenings, or to mull over work-related issues while commuting.

– Comes in many flavours

The job of RSEs is quite broad and you can easily focus on those aspects that are more fulfilling to you, like teaching and training, coding, HPC or community engagement, for example. Most likely, you will also have to cover some of the other aspects but, at least in my case, I certainly have scope for customising the work I would like to do.

3.2.2. Cons

– Strict criteria on which projects one can work on, with limited scope to pursue personal projects or exploratory ideas

This is one of the catches of the job. You are very involved with research and what researchers do… but you are not one of them. Even if you have brilliant software ideas that you would like to explore and put into practice, and even if they fall within the remit of what an RSE would do, you cannot pursue them because that is not what you are paid to do. This is particularly annoying for me now that I know a million ways of improving the software I developed as a researcher and simply cannot devote time to doing so.

– Rigorous accounting of working hours and of the exact activities carried out during the week

This is more an annoyance than an actual negative aspect of the job. Given that you work as a service to others, the time you spend on each task has to be carefully accounted for. Sometimes this is easy, but on other occasions (especially on days when you are less productive for some perfectly sensible reason) accounting for all your time can cause some anxiety.

– Salaries equivalent to those of academic researchers, but much lower than those of similar positions in industry

This is a general issue in academia, including for researchers: we are often paid much less than our counterparts in the private sector, and there is probably not much to be done about it. For RSEs the difference might seem more glaring when you see the starting salaries for comparable roles at companies like Google, but I think that we, in academia, have some other, perhaps less tangible, benefits.

4. Summary

To conclude, I think it is clear by now that I am very happy in my role as an RSE. I did enjoy my time as a researcher, massively: I learnt a lot of things, some useful, others not so much; it gave me the opportunity to travel all around the world, presenting my work in amazing places I would never have visited otherwise, and to meet great, very clever people…

But in the end, the lack of progression in my career, the accumulating negative aspects and, above all, my own personal situation made me move on and take the opportunity that popped up out of the blue. This first year as an RSE has convinced me it was the right decision.

 

To find out more about Research Software Engineering at Imperial College and opportunities to join the RSE team, visit our webpage or follow us on Twitter.

Using the Cloud for Research Software Engineering

We previously described three RSE-related use cases for Microsoft’s Azure platform, ranging in deployment granularity from VMs to individual JavaScript functions. In this post we’ll explain further how we use those and other Azure services to complement our on-premise infrastructure – helping us to deliver our RSE projects faster.

At Imperial we’re fortunate to have a powerful and well-maintained high-performance computing (HPC) system. We use this as a batch processing back-end for user-facing web applications that we have developed (such as Smart Forming) and for benchmarking projects including MUSE. The web applications themselves typically run on CentOS VMware virtual machines hosted in our data centre and maintained by a dedicated team within ICT. These servers are set up to authenticate against our institutional sign-on system, are pre-configured with monitoring and alerting, and can directly access other on-premise systems (such as the HPC cluster and our Research Data Store).

Despite this local infrastructure we still derive a lot of value from access to our institutional Azure subscription, in both ad hoc and longer-term use of cloud resources. This gives us capabilities that would be difficult or costly to replicate on-premise. These include:

  • The ability to rapidly provision and tear-down systems and services
  • Access to higher-level (lower-maintenance) abstractions i.e. PaaS and FaaS
  • Access to a diverse range of operating systems and configurations, from VMs for multiple versions of Windows to macOS build agents

In particular we rely on the following services:

  • DevOps Pipelines: Cross-platform QA (primarily testing and linting) and packaging (including PyInstaller builds on macOS and Windows). Build failures are pushed to relevant Teams channels.
  • Functions: Our Trending app provides us with information about active repositories in our institutional GitHub organisation. Using Functions makes its deployment zero-maintenance.
  • App Service: Our GtR app provides us with alerts for new UKRI grants to Imperial College. It is deployed to App Service to avoid the setup and maintenance required of a standalone VM.
  • Cosmos DB: Both GtR and Trending use the MongoDB API provided by Cosmos.
  • Virtual Machines: We use Azure when we need VMs for long-running services that are required to accept incoming requests from other systems but don’t need access to on-premise resources, or when we need short-lived VMs for testing purposes.
  • Container Registry: We use continuous deployment for all our web apps (including MAGDA and POWBAL), meaning that pushing to the master branch in GitHub is sufficient to run our QA pipeline, build a Docker image which is pushed to the Azure registry, and for Watchtower to pull the image onto the target server and restart the relevant service(s).
  • Single Sign-On: This allows users of our internal apps to authenticate using their existing Office 365 accounts – avoiding the need for further login details.
  • Notebooks: We have our own Jupyter server attached to our cluster and data store, but Azure Notebooks are very useful for sharing externally, and for teaching large classes.
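To make the Trending bullet above a little more concrete, here is a minimal sketch of the kind of filtering such an app might perform. It is not the Trending app’s actual code. It assumes repository records shaped like those returned by GitHub’s REST endpoint for listing an organisation’s repositories (`GET /orgs/{org}/repos`), using only the `name` and `pushed_at` fields, and it treats “active” as “pushed to within the last N days”. The function names, the seven-day window and the sample repository names are hypothetical; fetching the data and running this inside Azure Functions are not shown.

```python
from datetime import datetime, timedelta, timezone


def parse_github_timestamp(value):
    """Parse a UTC timestamp in the form GitHub returns, e.g. '2019-10-10T18:30:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def active_repositories(repos, now, window_days=7, limit=5):
    """Return names of repositories pushed to within the window, most recent first.

    Hypothetical helper: `repos` is a list of dicts with at least 'name' and
    'pushed_at' keys. Repositories with no pushes (pushed_at is None) are skipped.
    """
    cutoff = now - timedelta(days=window_days)
    recent = []
    for repo in repos:
        pushed_at = repo.get("pushed_at")
        if pushed_at is None:
            continue
        pushed = parse_github_timestamp(pushed_at)
        if pushed >= cutoff:
            recent.append((pushed, repo["name"]))
    # Newest push first, then keep at most `limit` repositories.
    recent.sort(reverse=True)
    return [name for _, name in recent[:limit]]


if __name__ == "__main__":
    # Hypothetical sample data standing in for an API response.
    sample = [
        {"name": "alpha", "pushed_at": "2019-10-09T12:00:00Z"},
        {"name": "beta", "pushed_at": "2019-09-01T08:00:00Z"},
        {"name": "gamma", "pushed_at": "2019-10-10T09:30:00Z"},
    ]
    now = datetime(2019, 10, 11, tzinfo=timezone.utc)
    print(active_repositories(sample, now))  # ['gamma', 'alpha']
```

Keeping the selection logic as a pure function of the records and the current time makes it easy to test locally, independently of wherever the function is eventually hosted.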

In short, Azure provides us with services that work alongside our existing systems, enabling us to deliver RSE projects more effectively and with much lower operational overheads than if we tried to replicate the same features on-premise. And by becoming familiar with these services we’re better equipped to advise and assist researchers across Imperial College who wish to take advantage of all the compute resources at their disposal – on-premise and in the cloud.

Hacktoberfest 2019

On Thursday 10th October a Research Software Engineering (RSE) themed Hacktoberfest event was hosted by Imperial College’s Research Computing Service, Research Software community and ICT. Sign-ups from Imperial spanned all four faculties, and the event also attracted external interest, with registrations from UCL and the V&A.

The evening opened with a short introduction to Hacktoberfest from Jeremy Cohen of the Imperial Research Software community. Chris Cave-Ayland from the RCS followed with a crash course on the steps to follow when making a contribution to an open source project on GitHub. The last opening talk was given by Vasily Sartakov from the Large-Scale Data and Systems Group (LSDS). Titled “Open Source Opportunities”, Vasily’s talk showed how open source has become the dominant paradigm for software development.

The opening talks were followed by lightning talks from the research projects participating in the hackathon. In total, six software projects from five different groups took part.

Pizza-fuelled hacking then commenced! Participants either chose one of the presented projects to work on, or an alternative project that they were interested in contributing to. Thanks to direct access to the project developers, attendees were able to get up to speed quickly and start working on pull requests. Participating in Hacktoberfest proved very valuable for the research projects involved, with a total of eight new contributions made so far.


Many thanks to all the speakers and participants who took part, and to Imperial College ICT for supporting this event. We hope to see many of you claiming your Hacktoberfest t-shirt by the end of October!

RSEConUK 2019

September 2019 saw the 4th Conference of Research Software Engineering (RSEConUK 2019) take place in Birmingham, UK. From 17th to 19th September, over 350 RSEs, software engineers, researchers and people in a wide range of related roles came to the University of Birmingham to participate in the largest Research Software Engineering conference yet.

RSE19 conference photograph courtesy @RSEConUK

While the majority of the attendees were from the UK and Europe, the conference attracted people from around the world.

The conference has been growing each year and this time there was a packed schedule including two keynotes, a series of parallel sessions with talks and panels, a day of workshops and some additional special sessions such as RSE Worldwide.

Imperial was well represented, with 11 members of the College attending the conference at various times during the week and getting involved by volunteering, giving talks, joining panels, running workshops and presenting posters.

It was fantastic to see so much participation from Imperial and representatives from many different departments across the College. This provides a great example of how Research Software Engineering at Imperial is such a vital element of the College’s research output and we look forward to seeing an even greater presence from Imperial at next year’s conference.

Research Software London Software Carpentry

On the 9th and 10th July 2019 the Research Software London community ran its first regional Software Carpentry workshop. The event was jointly organised by Imperial, UCL and Queen Mary, with Queen Mary hosting the workshop at their Mile End Campus. Several Imperial Software Carpentry volunteers and members of the Imperial research software community were involved in organising and running the event, along with organisers, instructors and helpers from UCL and Queen Mary. The workshop covered the standard Software Carpentry syllabus: attendees were taught the basics of the Unix shell and git on the first day and given an introduction to Python on the second.

The majority of attendees were from Queen Mary, UCL and Imperial, but spaces were also made available to the wider RSLondon community. This provided a great opportunity for newcomers to the research software field from institutions that don’t currently run Carpentry workshops to attend and learn some core computing and software development skills. More than 30 people registered for the workshop, and we received a lot of positive feedback as well as helpful suggestions for improving future workshops.

Software Carpentry lesson
Image courtesy of David Pérez-Suárez

Building on the success of this event, RSLondon are planning to run more such workshops and are looking at other areas covered by The Carpentries, in addition to Software Carpentry, for future sessions. If you have contacts at other institutions in London and the South East region who you think would be interested in hosting or attending an RSLondon Carpentry workshop later in 2019, get in touch with Jeremy Cohen.

Imperial College RSE Team members Chris Cave-Ayland (instructor) and Mayeul d’Avezac (helper) assisted at this workshop.

deRSE19

The first German national RSE conference took place in Potsdam on 4th-6th June 2019 with 187 attendees. deRSE19 was a really vibrant, welcoming and well-organised event in a great location and had a diverse agenda, encouraging participants from across Europe to share experiences of software engineering in research.

deRSE19 aerial group photo (CC-BY Antonia Cozacu, Jan Philipp Dietrich, de-RSE e.V.)

In terms of presentations Imperial College was the best-represented institution from outside Germany, with the following speakers:

  • Jeremy Cohen (EPSRC RSE Fellow, Department of Computing) who presented a talk on building research software communities and a poster about RSLondon.
  • Alex Hill (Senior Web Application Developer, Department of Infectious Disease Epidemiology) who spoke about the challenges of conducting constructive code reviews, particularly in a research setting.
  • Mark Woodbridge (RSE Team Lead, Research Computing Service) who gave a talk on RSE 2.0, reflecting on progress in Research Software Engineering and how it may develop in the near future.

Many thanks to all the event organisers and sponsors for giving us the opportunity to present.

Also during the conference a keynote on RSE collaboration was delivered by Alys Brett, chair of the newly established Society of Research Software Engineering and head of the Software Engineering Group at the UKAEA. UK RSEs also attended deRSE19 from the Software Sustainability Institute, the University of Westminster, and the University of Southampton. We look forward to reuniting with them, as well as colleagues from Germany and beyond at UKRSE19 in September!

Research Software in Physics event

The first Imperial College Research Software in Physics event took place on Friday 17th May. This event, organised by the Imperial Research Software Community and supported by the College’s ICT department, aimed to help researchers meet others writing or using research software (RS) in Physics and learn about the resources available to help them do so. It gathered around 25 people from all levels of seniority and several departments, who spent over two hours sharing their experiences and opinions on different aspects of developing and using software for research.


The event was opened by Diego Alonso Álvarez, a member of the Research Software Engineering team in the Research Computing Service and ex-member of the Physics Department, and Jeremy Cohen, coordinator of the Imperial Research Software Community. Between them they gave an overview of the value of research software (RS), the services available at Imperial to promote software sustainability and good coding practices, and the broader landscape of the RS community in the London area and the UK.

Pat Scott, a lecturer from the Astrophysics group, gave the first of the invited talks, focused on GAMBIT, “a global fitting code for generic Beyond the Standard Model theories” but with potential utility in any other research discipline. Pat highlighted that coding is not an add-on in physics any more but an integral part of it. He also pointed out that while it is important to have good coding practices, increase your user base and publish papers on your code, in the end, in the broader community, you will be judged by your contributions to physics, not software.

The second talk was given by Kelvin Choi, a PhD student from the Space & Atmospheric Physics Group, broadly covering the challenges involved in working with climate data and models. Among other topics, Kelvin discussed the need to wrap legacy code in more modern languages in order to maintain the traceability of the results and their comparability with those obtained in the 1980s. He also described the need for a pipeline to transform terabytes of raw satellite data into end results, gluing together different software (often written in different languages) and combining different data formats.

These speakers were followed by 7 lightning talks given by researchers in the department and the Imperial RSE team, including:

  • specific applications of GAMBIT (Janina Renk) and its combination with other tools like TensorFlow to efficiently explore a large parameter space (Benjamin Farmer);
  • the description of custom advanced software for the modelling of the formation of planet-forming discs (Marija Jankovic) or the performance of novel solar cells (Philip Calado);
  • the challenges of dealing with legacy code and data related to the Cassini mission (Gregory Hunt);
  • some of the activities of the RSE team, improving the accessibility of the data from the Cassini mission (Christopher Cave-Ayland) or using the xarray Python package to improve the quality and readability of existing code (Mayeul d’Avezac).

All the talks were very engaging and in several cases sparked discussion points that were adopted for the final part of the event. In the discussion session, the audience was divided into groups around different topics, including code peer review and code review practices in the software engineering industry, testing practices and reproducibility, and software development models and methodologies within the research community. Dedicated blog posts on the content of these discussions will follow in due course.

Many thanks to Pat Scott, all the speakers, and everyone who attended the event.

RSLondonSouthEast 2019

Research Software London’s first annual workshop took place at the Royal Society on February 8, 2019, bringing together a regional community of research software users and developers from over 20 institutions. It featured a diverse schedule of talks and discussions about software engineering, community building and both domain-specific and general-purpose tools of relevance to research.

There were four talks from Imperial researchers, including the keynote from Professor Spencer Sherwin, Director of Research Computing. The College’s Research Computing Service was also represented by Dr Diego Alonso Álvarez, who presented an introduction to xarray and described the RSE team’s work on integrating it into the MUSE energy systems model.

Please see Diego’s slides for more information. Other talks and media from the event are available via #rslondonse19.

Thanks to RSLondon, the programme committee and its chair Dr Jeremy Cohen for organising an informative and stimulating day, and to the EPSRC for supporting the event. We’re looking forward to participating in future meetings and helping further strengthen the regional RSE community.

Cloud-first: Serverless alerts for trending repositories

This is the third and final post in a series describing activities funded by our RSE Cloud Computing Award. We are exploring the use of selected Microsoft Azure services to accelerate the delivery of RSE projects via a cloud-first approach.

In our previous two posts we described two ways of deploying web applications to Azure: firstly using a Virtual Machine in place of an on-premise server, and then using the App Service to run a Docker container. The former provides a means of provisioning an arbitrary machine much more rapidly than would traditionally be possible, and the latter gives us a seamless route from development to production – greatly reducing the burden of long-term maintenance and monitoring.

By taking these steps we’ve reduced our unit of deployment from a VM to a container and simplified the provisioning process accordingly. However, building a container, even when automated, incurs an overhead in time and space, and the resultant artifact is still one step removed from our code. Can we do any better – perhaps by simply bundling our code and submitting it to a suitably capable runtime – without needing to understand a technology such as Docker?

Azure Functions provide a “serverless” compute service that can run code on demand (i.e. in response to a trigger) without having to explicitly provision infrastructure. There are similarities with the App Service in terms of ease of management, but also some differences: principally, in return for some loss of flexibility in runtime environment you get an even simpler deployment mechanism and potentially much lower usage charges. Your code can be executed in response to a range of events, including webhooks, database triggers, spreadsheet updates or file uploads.

In this post we’ll demonstrate how to deploy a simple scheduled task: a Node.js script that sends a periodic email identifying the most active repositories within a GitHub organisation. It uses the GitHub GraphQL API to get the latest statistics (stars, forks and commits) and tracks the changes in a database. I use this script to receive weekly updates for trending repositories under ImperialCollegeLondon, but it’s easy to reconfigure for your own organisation.
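The post doesn’t describe how “most active” is calculated, and the real logic lives in the Node.js function. Purely as a hypothetical illustration, the Python sketch below assumes each run stores a snapshot of per-repository star, fork and commit counts, compares it with the previous snapshot, and ranks repositories by their total increase. `RepoStats`, `activity_deltas`, `trending` and the equal weighting of the three counts are all our own assumptions, not the trending code’s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoStats:
    """Counts for one repository at one point in time (hypothetical schema)."""
    stars: int
    forks: int
    commits: int


def activity_deltas(previous, current):
    """Return {repo: (d_stars, d_forks, d_commits)} between two snapshots.

    Repositories missing from the previous snapshot are compared against zero,
    so a newly created repository counts all of its activity as new.
    """
    zero = RepoStats(0, 0, 0)
    deltas = {}
    for name, now in current.items():
        before = previous.get(name, zero)
        deltas[name] = (
            now.stars - before.stars,
            now.forks - before.forks,
            now.commits - before.commits,
        )
    return deltas


def trending(previous, current, top_n=5):
    """Rank repositories by the summed change in stars, forks and commits.

    Equal weighting is an arbitrary assumption; repositories whose total
    change is zero or negative are left out of the ranking.
    """
    deltas = activity_deltas(previous, current)
    scored = [(sum(d), name) for name, d in deltas.items() if sum(d) > 0]
    # Highest score first; ties broken alphabetically for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_n]]


if __name__ == "__main__":
    last_week = {
        "repo-a": RepoStats(stars=10, forks=2, commits=100),
        "repo-b": RepoStats(stars=50, forks=5, commits=300),
    }
    this_week = {
        "repo-a": RepoStats(stars=15, forks=3, commits=120),  # 5 + 1 + 20 = 26
        "repo-b": RepoStats(stars=50, forks=5, commits=302),  # 0 + 0 + 2 = 2
        "repo-c": RepoStats(stars=1, forks=0, commits=4),     # new: 1 + 0 + 4 = 5
    }
    print(trending(last_week, this_week))  # ['repo-a', 'repo-c', 'repo-b']
```

Summing raw counts lets busy repositories with many small commits dominate; a real implementation might normalise or weight each count differently, and would persist each snapshot (the post uses a MongoDB-compatible database for this).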

As previously, we’ll use the Azure Cloud Shell. Names such as myResourceGroup, ictrendingstore and ictrending, as well as the values in the app settings, are examples that you’ll want to replace with your own.

Getting started

As usual we first create a resource group, and then add a storage account for our function:

az group create --name myResourceGroup --location westeurope
az storage account create --resource-group myResourceGroup --name ictrendingstore --sku Standard_LRS

Creating our function app

Then we create our app (a container for one or more functions):

az functionapp create --resource-group myResourceGroup --name ictrending --storage-account ictrendingstore --consumption-plan-location westeurope

And upgrade Node.js so that we can use modern JavaScript features such as async functions:

az functionapp config appsettings set --resource-group myResourceGroup --name ictrending --settings FUNCTIONS_EXTENSION_VERSION=beta WEBSITE_NODE_DEFAULT_VERSION=8.9.4

Deploying our code

Before we upload our code we configure the runtime with some required configuration (organisation name, GitHub token, MongoDB URL and email settings):

az functionapp config appsettings set --resource-group myResourceGroup --name ictrending --settings GITHUB_ACCESS_TOKEN=xxx ORGANISATION=ImperialCollegeLondon MONGO_URL=mongodb://username:password@example.com/db SMTP_URL=smtp://username:password@example.com EMAIL_FROM=from@example.com EMAIL_TO=to@example.com

I’m using Azure’s MongoDB-compatible service (Cosmos DB) but there are many other hosting providers, including MongoDB themselves (Atlas).
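Azure Functions exposes app settings to the running code as environment variables (Node.js code reads them from `process.env`). The trending function itself is written in Node.js; as an illustration only, here is a minimal Python sketch of failing fast when any of the settings configured above are missing. The `load_settings` helper is hypothetical and not part of the trending code.

```python
import os

# Setting names taken from the az functionapp config appsettings command above.
REQUIRED_SETTINGS = (
    "GITHUB_ACCESS_TOKEN",
    "ORGANISATION",
    "MONGO_URL",
    "SMTP_URL",
    "EMAIL_FROM",
    "EMAIL_TO",
)


def load_settings(environ=None):
    """Return the required settings, raising if any are missing or empty.

    `environ` defaults to the process environment, which is where the
    Functions runtime places app settings.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing app settings: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}
```

Checking everything at start-up means a forgotten setting produces one clear error in the function’s logs, rather than a failure halfway through a run after the GitHub API has already been queried.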

We then simply upload a zipped copy of our code, its dependencies, and a trigger configuration (a timer for 8am on Mondays):

curl -LO https://github.com/ImperialCollegeLondon/trending/releases/download/v1.0.0/trending.zip
az functionapp deployment source config-zip --resource-group myResourceGroup --name ictrending --src trending.zip

You’ll subsequently receive your weekly email on Monday morning, assuming there has been some activity in your chosen organisation!
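The Monday-morning timing comes from the timer trigger in the uploaded archive. Azure Functions timer triggers use six-field NCRONTAB expressions (`{second} {minute} {hour} {day} {month} {day-of-week}`), evaluated in UTC by default. We haven’t reproduced the archive’s exact expression here; assuming it is `0 0 8 * * 1` (08:00:00 on any day of any month, provided it is a Monday), this simplified Python matcher shows how each field is checked. It handles only `*` and single numbers, not the ranges, lists, steps and day names that real NCRONTAB also accepts.

```python
from datetime import datetime

# Field order of a six-field NCRONTAB expression.
FIELDS = ("second", "minute", "hour", "day", "month", "day_of_week")


def _field_values(moment):
    """Extract the six fields from a datetime, using cron's Sunday=0 convention."""
    return (
        moment.second,
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        moment.isoweekday() % 7,  # isoweekday: Monday=1 ... Sunday=7, so Sunday -> 0
    )


def matches(expression, moment):
    """Return True if `moment` satisfies a simplified six-field cron expression.

    Only '*' and single integers are supported in this sketch.
    """
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return all(
        part == "*" or int(part) == value
        for part, value in zip(parts, _field_values(moment))
    )


MONDAY_8AM = "0 0 8 * * 1"  # assumed schedule, not copied from the archive

if __name__ == "__main__":
    print(matches(MONDAY_8AM, datetime(2019, 3, 4, 8, 0, 0)))  # Monday: True
    print(matches(MONDAY_8AM, datetime(2019, 3, 5, 8, 0, 0)))  # Tuesday: False
```

Note the leading seconds field: a five-field expression copied from a classic crontab would be rejected by this sketch and would not mean the same thing to the Functions runtime either.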

Inspecting the code reveals that it needs to comply with a (very lightweight) calling convention by exporting a default function and invoking a callback on the provided context, and it needs to be written in one of several supported languages. We uploaded our source as an archive but you can also deploy (and then update) code directly from source control.

Tidying up

As usual you can delete your entire resource group, including your storage account and function by running:

az group delete --name myResourceGroup

Summary

In this post we’ve shown how zipping and uploading your source code can be sufficient to get an app into production. This is all without knowledge of any particular operating system or virtualisation technology, and at very low cost thanks to consumption-based charging and on-demand activation. Whether you choose to deliver your software as a VM, container or source archive will obviously depend on the nature of the application and its usage patterns, but this flexibility provides potentially great productivity gains – not only in deployment but also in long-term maintenance. In this instance it’s a great fit for short-lived scheduled tasks, but there are many other possible applications.

We’d like to thank Microsoft Azure for Research and the Software Sustainability Institute for their support of this project.

Cloud-first: Rapid webapp deployment using containers

This is the second in a series of posts describing activities funded by our RSE Cloud Computing Award. We are exploring the use of selected Microsoft Azure services to accelerate the delivery of RSE projects via a cloud-first approach.

In our previous post we described the deployment of a fairly typical web application to the cloud, using an Azure Virtual Machine in place of an on-premise server. Such VMs offer familiarity and a great deal of flexibility, but require initial provisioning followed by ongoing maintenance and monitoring. Our team at Imperial College is increasingly using containers to package applications and their dependencies, using Docker images as our unit of deployment. Can we do better than provisioning servers on a case-by-case basis to get web applications into production, and thereby more rapidly deliver services to our users?

The Azure App Service provides a solution named Web App for Containers, which essentially allows you to deploy a container directly without provisioning a VM. It handles updates to the underlying OS, load balancing and scaling. In this post we’ll demonstrate how to run pre-built and custom Docker images on Azure, without having to manually configure any OS or container runtime. As previously, we’ll use the Azure Cloud Shell; names such as myResourceGroup, myAppServicePlan and the webapp names are examples that you’ll want to replace with your own.

Getting started

First of all we create an App Service plan. This only needs to be performed once for your active subscription:

az group create --name myResourceGroup --location "West Europe"
az appservice plan create --name myAppServicePlan --resource-group myResourceGroup --sku S1 --is-linux

Deploying a pre-built, public container image

It’s then just one command to run a Docker container. In this case we’ll deploy Nginx using its Docker Hub image:

az webapp create --resource-group myResourceGroup --plan myAppServicePlan --name ic-nginx --deployment-container-image-name nginx

We can then visit our public site at https://ic-nginx.azurewebsites.net/

You can use a custom DNS name by following these further instructions. Note that the site automatically has HTTPS enabled.

Decommissioning the webapp (thereby avoiding any further charges) is similarly straightforward:

az webapp delete --resource-group myResourceGroup --name ic-nginx

Deploying a custom container image

Running your own app is as simple as providing a valid container identifier to az webapp create.  This can point to either a public or private image on Docker Hub or any other container registry, including Azure’s native registry.

For demonstration purposes we’ll build a Datasette image to publish the UK responses from the 2017 RSE Survey. Datasette is a great tool for automatically converting an SQLite database to a public website, providing not only a means to browse and query the data (including query bookmarking) but also an API for programmatic access to the underlying data. It has a sister tool, csvs-to-sqlite, that takes CSV files and produces a suitable SQLite file.

First we need to install both tools, download the survey data, and convert it from CSV to SQLite:

pip install https://github.com/simonw/csvs-to-sqlite/zipball/master datasette
curl -O https://raw.githubusercontent.com/softwaresaved/international-survey/master/analysis/2017/uk/data/cleaned_data.csv
csvs-to-sqlite --table responses cleaned_data.csv uk-rse-survey-2017.db
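csvs-to-sqlite does considerably more than this (type detection, multiple input files and more), but its core idea can be sketched with Python’s standard library: use the CSV header row as column names, create a table, and insert each remaining row. This is a simplified stand-in, not csvs-to-sqlite’s implementation; the `csv_to_sqlite` function is hypothetical and stores every column as TEXT.

```python
import csv
import io
import sqlite3


def _quote(identifier):
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def csv_to_sqlite(csv_file, conn, table):
    """Load CSV rows into `table`, using the header row as column names.

    Every column is declared TEXT; duplicate header names would make
    CREATE TABLE fail, which this sketch does not try to handle.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f"{_quote(name)} TEXT" for name in header)
    conn.execute(f"CREATE TABLE {_quote(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    # Each remaining CSV row becomes one INSERT.
    conn.executemany(f"INSERT INTO {_quote(table)} VALUES ({placeholders})", reader)
    conn.commit()


if __name__ == "__main__":
    sample = io.StringIO("organisation,role\nUni A,RSE\nUni B,Researcher\n")
    conn = sqlite3.connect(":memory:")
    csv_to_sqlite(sample, conn, "responses")
    print(conn.execute('SELECT COUNT(*) FROM "responses"').fetchone()[0])  # 2
```

To apply it to the downloaded file you would open `cleaned_data.csv` with `newline=""` and pass a connection to `uk-rse-survey-2017.db`, though for real use csvs-to-sqlite’s type inference is the better choice.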

Then we can create a Docker image containing the data and the Datasette app with one command, annotating with the appropriate licence information:

datasette package uk-rse-survey-2017.db \
  --tag mwoodbri/uk-rse-survey:2017 \
  --title "UK RSE Survey (2017)" \
  --license "Attribution 2.5 UK: Scotland (CC BY 2.5 SCOTLAND)" \
  --license_url "https://creativecommons.org/licenses/by/2.5/scotland/deed.en_GB" \
  --source "The University of Edinburgh on behalf of the Software Sustainability Institute" \
  --source_url "https://github.com/softwaresaved/international-survey"

Then we push the image to Docker Hub:

docker push mwoodbri/uk-rse-survey:2017

And, as previously, create an Azure Web App:

az webapp create --resource-group myResourceGroup --plan myAppServicePlan --name rse-survey --deployment-container-image-name mwoodbri/uk-rse-survey:2017

Using Datasette

After a brief delay the app is publicly available: https://rse-survey.azurewebsites.net/

Note that the App Service automatically detects the right port to expose (8001 in this case) and maps it to port 80.

Datasette enables you to run and bookmark SQL queries, for example a query that lists the contributors’ organisations in order of the number of responses received.
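The original query isn’t reproduced here, and we haven’t checked the survey’s actual column names. Assuming a hypothetical `organisation` column in the `responses` table, a query of that shape is a `GROUP BY` with a count, ordered by that count. This sketch runs it against a tiny in-memory table with Python’s `sqlite3` module; the same SQL could be entered in Datasette’s query editor.

```python
import sqlite3

# Hypothetical column name: the real survey schema may differ.
ORGANISATION_COUNTS = """
    SELECT organisation, COUNT(*) AS n_responses
    FROM responses
    GROUP BY organisation
    ORDER BY n_responses DESC, organisation
"""


def organisation_counts(conn):
    """Return (organisation, count) pairs, most responses first, ties by name."""
    return conn.execute(ORGANISATION_COUNTS).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses (organisation TEXT)")
    conn.executemany(
        "INSERT INTO responses VALUES (?)",
        [("Uni A",), ("Uni B",), ("Uni A",)],
    )
    print(organisation_counts(conn))  # [('Uni A', 2), ('Uni B', 1)]
```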

Private registries

If you’re hosting your images on a publicly accessible registry that requires authentication, you can split the previous az webapp create command into two steps: one to create the app and another to assign the relevant image. In this case we’ll use the Azure Container Registry, but this approach is compatible with any Docker Hub-compatible registry.

First we’ll provision a container registry. These steps are unnecessary if you already have one:

az acr create --name myrepo --resource-group myResourceGroup --sku Basic --admin-enabled true
az acr credential show --name myrepo

Then we can log in to our private registry and push our appropriately tagged image:

docker login myrepo.azurecr.io --username username

docker tag mwoodbri/uk-rse-survey:2017 myrepo.azurecr.io/uk-rse-survey:2017
docker push myrepo.azurecr.io/uk-rse-survey:2017

Finally we can create our webapp and configure it to use the image from our private registry:

az webapp create --resource-group myResourceGroup --plan myAppServicePlan --name rse-survey
az webapp config container set --resource-group myResourceGroup --name rse-survey --docker-custom-image-name myrepo.azurecr.io/uk-rse-survey:2017 --docker-registry-server-url https://myrepo.azurecr.io --docker-registry-server-user username --docker-registry-server-password password

The end result should be exactly the same as when using the same image but from the public registry.

Tidying up

As usual, you can delete your entire resource group, including your App Service plan, registry (if created) and webapps by running:

az group delete --name myResourceGroup

Summary

In this post we’ve demonstrated how a Docker image can be run on Azure using one command, and how to build and deploy a small app that presents a simple interface for exploring data provided in CSV format. We’ve also shown how to use images from private registries.

This approach is ideal for deploying self-contained apps, but doesn’t present an immediate solution for orchestrating more complex, multi-container applications. We’ll revisit this in a subsequent post.

Many thanks to the Software Sustainability Institute for curating and sharing the RSE survey data (reused under CC BY 2.5 SCOTLAND) and Simon Willison for Datasette.