Category: Best tools and systems

Coding along with Azure Databricks and MoJ’s Splink

In January, our ICT team supported and took part in a Public Sector Code-Along workshop at Imperial’s White City Advanced Hackspace. We joined 60 other data specialists from across public sector organisations, along with representatives from Microsoft, Databricks, the Ministry of Justice (MoJ)’s Splink team and the National Innovation Centre for Data (NICD).

This event focussed on the MoJ’s Splink package:

Splink is a PySpark package that allows you to link millions of distinct records that refer to the same entity but lack a consistent identifier. It applies established statistical comparison methods to detect whether records in a dataset relate to the same thing, comparing values in any column and assessing the probability of a match. This would be impossible to do manually across large datasets: for example, it can be used to detect whether two or more similar records amongst millions actually relate to the same individual person.
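To give a feel for what "assessing the probability of a match" can mean, here is a minimal sketch in plain Python of the Fellegi-Sunter style of scoring that probabilistic record linkage is commonly built on. It does not use Splink's own API. The column names, the m and u probabilities and the prior are all hypothetical values chosen for illustration; in a real project they would be estimated from the data.

```python
import math

# Hypothetical probabilities, for illustration only:
# m = P(column agrees | the two records are the same entity)
# u = P(column agrees | the two records are different entities)
M_U = {
    "first_name": (0.9, 0.01),
    "dob": (0.95, 0.001),
    "postcode": (0.8, 0.005),
}


def match_probability(rec_a, rec_b, prior=0.001):
    """Combine per-column evidence into a single match probability."""
    # Start from the prior odds that two randomly chosen records match.
    log_odds = math.log2(prior / (1 - prior))
    for column, (m, u) in M_U.items():
        if rec_a[column] == rec_b[column]:
            # Agreement on a column raises the odds of a match.
            log_odds += math.log2(m / u)
        else:
            # Disagreement on a column lowers the odds of a match.
            log_odds += math.log2((1 - m) / (1 - u))
    odds = 2 ** log_odds
    return odds / (1 + odds)
```

Calling match_probability on two records that agree on every column gives a value close to 1, while records that disagree everywhere score close to 0. Records that agree on some columns but not others land in between, which is where the thresholds discussed later in this post come in.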

Databricks is a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale. It is the technical solution that we use at Imperial to power the Unified Data Platform, launched by our Data and Analytics product line in 2022.

We also heard from keynote speakers:

  • Paul Watson, Director of NICD, who gave an overview of the work they are doing to support the National Data Strategy, including upskilling and knowledge transfer.
  • Robert Porteous, Deputy Director of Data Strategy, Implementation and Evidence at DCMS, who spoke about the National Data Strategy work, data standards, data sharing and data challenges.

The workshop gave us the opportunity to get to grips with a government-endorsed external public sector package within Databricks. The code from Splink is available in a Databricks workbook that can be easily imported into our Databricks account.

Improving our data use at the College

The Splink tool has a lot of potential to allow controlled cross-system record matching with many uses. It allows a solid statistical model to be built, trained and then ported between technologies.

“Splink could allow us to develop solid common definitions that would be transparent and auditable. For example, it could help the College spot duplicate people in records (e.g. CID deduping), connect building data between systems when there are no common building identifiers, and so on.”

Andrew Lewis, Information Insight Analyst, ICT.

Working with real-world data

Andrew explains what they achieved at the event: “In the workshop we focussed on some real-world data sourced from Companies House. The hack involved training a statistical model to create rules for checking data in columns such as first name, date of birth, postcode etc. and using accepted statistical methods of scoring the closeness of matches.”

This chart below shows examples of how the models can be trained to acceptable levels. (Image from Splink tutorials available with code.)

Table showing interpretation, showing duplicate data

Andrew suggests “It’s too detailed a subject to cover in this post, but Splink has really powerful and robust settings that allow comparisons to be made much faster than other solutions, and with as fine or coarse levels of details as we could need in the rules. It is possible to separate exact matches from probable matches and define what we accept as “must be 100%”, “good enough”, “close – check these” or whatever.”
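As a rough sketch of the banding Andrew describes, the function below maps a match probability to a review band. The function name, the band labels and the threshold values are hypothetical and only illustrate the idea; they are not Splink settings, and real thresholds would be agreed with the data owners.

```python
def review_band(probability, accept=0.99, good_enough=0.9, check=0.5):
    """Map a match probability to a hypothetical review band."""
    if probability >= accept:
        return "accept"
    if probability >= good_enough:
        return "good enough"
    if probability >= check:
        return "close - check these"
    return "not a match"


# Example scores for four candidate record pairs (made-up values).
scores = {"pair_1": 0.999, "pair_2": 0.93, "pair_3": 0.62, "pair_4": 0.04}
bands = {pair: review_band(p) for pair, p in scores.items()}
```

Keeping the thresholds as named parameters means the rule for what counts as "good enough" is written down in one place, which supports the auditability mentioned above.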

This chart below shows how applying the probability rules in sequence can greatly increase the chances of spotting data matches, in a way that is automatable and auditable to a defined statistical tolerance. (Image from Splink tutorials available with code.)


Summary

Overall, this was a great day. We made contact with lots of data practitioners in other public sector organisations and learned about the potential of Splink.

When our analysts and data professionals spend huge amounts of time retrieving, merging, cleaning and verifying College data, it’s time not spent doing the valuable work of understanding and synthesising their analysis into actionable information. Splink has the potential to do much of this matching and cleaning automatically, at scale, and faster than other tools available.

Andy Lewis (Middle) working with other team members

If you want to find out more about how we are using Databricks at the College you can contact our colleagues:

  • Richard Howells, Head of Technology Office
  • Andrew Lewis, Information Insight Analyst
  • Jose Maria Vidal Fidel, Product Developer
  • Maria Teresa Douglas, Data and Analytics Product Owner
  • Henry Nwiido, Data Domain Specialist
  • Nelson Cerqueira, Solutions Architect
  • Cho Fung Chan, Data Specialist
  • James Clubbe, Data Engineer (Data Specialist)
  • Irene Kalkanis, Data and Innovation Lead

Nothing casual about our work ethic!

Deepthi Alex, Darshan Vigneswara and Michele Barritt

ICT collaborated with HR and departments to create a new and improved way of managing Imperial’s casual worker information to ensure an amazing user experience for students (often casual workers) from start to finish, as well as ensuring compliance with the law!

There was nothing casual about the way ICT worked together with HR to create this new and useful application.

Today I am interviewing ICT’s Finance and HR Product Line team, Michele, Darshan and Deepthi, who have been nominated for the President’s Award for Excellence for their work on the new Casual Worker app.

What is the new Casual Worker system and what does it do?

Michele – “The system brings casual worker management online, with all data in one place. More importantly, it monitors compliance, as workers are restricted in the hours they can work based on right to work and visa categories. The College audit had raised a risk that we were not compliant in managing our casual workforce and we had to address this.”

What impact does this improvement have on our staff and students?

Casual worker timesheet app

Michele –  “It improves the process, moving away from documents and spreadsheets to a more secure system, therefore it further improves the security of our data. It improves our legal compliance and monitoring. HR has greater control on approvals for payroll.

As well as this, Management Information can be generated from the apps via Power BI reports and thus HR have much improved information and understanding of our Casual Worker workforce.”

How did you collaborate with the users to achieve the best solution?

Deepthi – “As well as show and tells, the team came together as a single product team, which allowed us to make business-driven decisions and designs. Our Agile ways of working meant that we could deliver business value with each sprint cycle, leading to an MVP and its iterations according to business needs.”

What was the best thing about working on this activity?

Michele – “Seeing the product live and making a positive impact on this process for the business and casual workers themselves. Building new working relationships working in partnership with HR.”

Darshan – “Partnering with HR and Change Management through new ways of working to deliver the Casual Worker App successfully.”

Darshan – “The Casual Worker App was one of the first enterprise IT solutions delivered using the Power Platform, which has given us a better understanding of the overall platform that will guide us in future developments.”

What was the most challenging thing about the work?

Michele – “Agreeing the MVP (minimum viable product) and sticking to it.”

Deepthi – “It was the first time that the product line was using the Power Platform along with complex business rules, which resulted in a steep learning curve. However, this has helped us understand the path to adopting new technology.”

Why has your work been nominated for the President’s Award for Excellence?

Michele – “I believe it was because of the team’s hard work and dedication over the last year to build and deploy this product, which benefits the whole College community, from HR to the student experience!”

How does it feel to be recognised / nominated?

Michele – “Fabulous 😊 Glad our work has been recognised.”

What are you working on next?

Michele – “We just completed the rollout this month and we are now in the ‘early life’ support stage. We then need to compile a product roadmap and agree as a team how and when we enhance the product.

(more…)

Camaraderie and Collaboration Essential!

New Imperial Essentials staff dashboard

A collaboration to improve health, safety and data protection and information security awareness at work.

What is so essential?  

Imperial Essentials are a set of 6 mandatory online courses that staff must complete every 2 years to ensure we are not only legally compliant, but that we are creating and maintaining high standards of health, safety and welfare across the College. 

Essential to success  

With the new centralised training dashboard, we have seen a massive increase in compliance, from 3% to over 67%, in the last 12 months!

Through collaboration with POD, ICT, HR and the Faculties we have achieved an amazing success. Not forgetting the great support and effort from staff across the College!

Hear from some of the team involved in this activity to see how the best tech, collaboration and camaraderie has set us firmly on the road to compliance:

Why did you create the Imperial Essentials dashboard?

Nichola Stallwood, Head of Learning and Organisational Development, said “The main aim of the activity was to achieve 75% completion of all Imperial Essentials courses by 31 May 2022.

We needed a central reporting process on Imperial Essentials courses. But more than this we also needed confirmation of the compliance topics that needed to be completed by all core staff.

The new reporting system also had to ensure Imperial leadership could see the compliance gaps for new starters and existing staff.

As well as the dashboard we also created a policy to establish requirements, exemption criteria and consequences of non-completion.”

Simon Etherton, Information Insight Analyst, ICT, said “There wasn’t a central automated system for staff to check these records and be informed when to take the refresher. The old ICIS OLM system had to be manually updated. One of the aims was to create a tool to enable the business to increase compliance across the College. It would also allow Course Owners to report accurately to their respective Boards on their compliance topic (e.g. Fire Safety).”

Nicholas Wood, Programme Manager, Faculty of Medicine, said:

“The dashboard supports de-risking us as an organisation through building awareness of critical topics to help us adopt safe working practices and create a safer working environment for all colleagues.”

What was the best thing about working on this activity?

Juliet O’Rourke, Technology Delivery Manager, ICT, said – “It’s great to work with a team with representation from different sections of Imperial and to get to know new faces.

There was a strong sense of engagement and level of trust in the team’s ability and commitment to succeed. Communication was open and transparent and all decision making was by consensus, with everyone having a voice and all suggestions and solutions considered.”

There was a good sense of camaraderie within the team and a feeling that everyone was enjoying the role with all challenges openly discussed and debated.

Simon said “I enjoyed working with the business and cross collaboration between teams in ICT across several Product Lines, Service Operations, Infrastructure & Shared Services, Business Operations and Technology Office. Colleagues within ICT were very helpful, responsive, enthusiastic and supported me when we spotted ‘gremlins’ in the system.”

Kia Wnuk, HR Information and Insight Manager, said – “There was strong buy-in from the project team to develop a viable solution. We were able to draw on each other’s knowledge and expertise to deliver a successful product. This would not have been achievable without a multifunctional team.”

What was the most challenging thing about working on this activity?

Juliet said – “At the beginning of this activity, we set up a support email for end users. As the activity progressed and the compliance rate increased, the inbox began to fill up, which was challenging but equally rewarding. It was good to be able to reach out and communicate on an individual level with staff members and help resolve and field any issues or concerns they might have while completing their Imperial Essentials courses. In addition, staff comments and queries provided vital information for the content of the Frequently Asked Questions (FAQ), for targeting communication and for the presentation of information on the Imperial Essentials web page.

This activity also presented a number of technical and user base challenges, from the initial data extraction, manipulation and presentation for the dashboards, to the construction of the employee-only dashboard, to working with new software platforms and automation tools to devise the email workflows and attachments for both the employee and line manager emails.

A number of the systems relied on free text fields to gather staff identifiers, which led to variable quality in the data. This presented problems with recording course completion dates on the dashboards. Establishing course data ownership and, most recently, handling calls for user exemptions from courses is an ongoing task that forms part of the next steps.”

Kia said – “Focussing our messaging to deal with the variable data quality issues and translating how the technical processes work to help the team and end users understand how the data is recorded.”

How did collaboration help you achieve your goals?

Juliet said “The business had clear objectives and ambitious targets to improve the College Imperial Essentials compliance rate from 3% to 75% within one year.  This set the agenda for collaboration and as a team we were all focused around creating key deliverables in order to achieve these objectives.

There was a lot of collaboration among the team in knowledge sharing (e.g. Microsoft Power BI and the new Power Platform Automate tools), where highly skilled members were willing to share their knowledge and expertise with others, allowing them to take on lead roles in delivering new tasks.

Collaboration was also fostered by having a well-structured plan with a sense of direction and the use of progress monitoring tools, e.g. JIRA, which helped provide momentum. Microsoft Teams was used for central communication, planning and documentation, with open chat for discussion points. Collaboration, openness and transparency provided a good level of team engagement and harmony.”

What will you do next?

Juliet said – “We will continue to work on improvements, including: refreshing the communication to encourage senior management and HR partners to review the dashboards they have access to and highlight accountabilities; reviewing exemption requests and applying any resulting changes to the dashboards; and increasing the reminder emails for individual employees.

And we will start work on a ‘Wish List’ of new features.”

Nick said – “Listen to the responses from the staff on how to make the system and courses better and implement improvements.”

(more…)

Research Software Engineering enabling ‘Surgery from your sofa’

‘To increase the quality, impact and sustainability of the research software developed at Imperial, supporting the College in enhancing its world-leading research outputs’.

That is the bold mission statement of the Research Software Engineering Team (RSE) here in ICT.

A large part of a life in academia involves research – the process of investigation into a particular study, utilising various resources and materials.

The RSE team believes that modern science depends on research software and says that ‘good software engineering accelerates and boosts the impact of research’.

Projects like ‘Liionsden’, a tool that archives and visualises experimental data and data generated in simulations for battery research, and StrainMap, which prepares a new diagnostic MRI technique for use in clinical settings, show that RSE not only helps Imperial but also benefits many other areas of society.

At the Royal Society Summer Science Exhibition in 2021, even the public got involved in the process. Working with HARMS Lab (Human-centered automation, Robots and Monitoring for Surgery), the RSE team produced a game that involved the remote control of surgical tools. Despite a few ‘hiccups’, the team finalised a method that uses head movement, via gaze-tracking technology, to move the various tools and complete objectives: an emerging technology that could revolutionise surgical processes.

RSE robotic ‘Operation’ surgery game that is controlled by head movement.

‘Open Science and collaboration are the future’, RSE believes.

The creative and functional freedom that software engineering provides pushes academia forward – more efficient and technically active programs can automate redundant processes, as well as deliver greater precision over manual entry.

There are almost 30 RSE teams already set up at academic institutes across the UK, and official job titles, fellowships and industry opportunities are on the rise. With the increasing importance of data science, Research Software Engineering is becoming essential to solid academic work, with a main goal to ‘create recognition and career pathways for individuals that develop research software in academia’.

‘Better Software. Better Research’ is RSE’s mantra.

Watch a video of Chris Cave-Ayland, ICT’s Senior Research Engineer, showcasing RSE and the process for building ‘Surgery from your Sofa’, a system for remote control of surgical robots over the internet. He also gives you an insight into research software and what his team does.

You can also read more about RSE on their blog.

Author: Matt La, Work Experience, ICT June 2022