Introducing Data Science in the Public Service: Challenges and Solutions (Annalyn Ng)

I’m excited to profile Annalyn Ng, a self-taught data scientist and #womanintech, who is pushing for the adoption of data science in the public service. She currently works at the Ministry of Defence (Singapore), where she analyses data to identify predictors of personnel performance in military vocations. Originally a psychology and economics major, she first learnt about data science in a statistics class, and has been addicted ever since.

She co-authors a blog that teaches data science in layman’s terms, and has recently published a book, Numsense! Data Science for the Layman, which is used as reference material at Stanford and Cambridge.


In this article, she outlines the challenges of, and solutions for, enabling data science in the public service, along with ideas for building these capabilities individually and in your own organization. All opinions here are her own.

Introducing Data Science in the Public Service: Challenges and Solutions

My plea for wider application of data science is a personal one. My mum passed away after a misdiagnosis, when doctors administered the wrong medication while delaying the treatment she needed. It made me wonder: if we can teach machines to play games like Go and StarCraft, can we invest as much in teaching machines to save lives? While there have been breakthroughs, such as the automated interpretation of medical scans, comparable success in general diagnosis has yet to be seen.

Many people regard data science as a craft that is exclusive to tech companies. Let’s dispel this myth. The fact is, wherever there is data, there is potential for data science. If fashion retailers can use purchase history to recommend products and predict trends, we can apply the same methods to past medical data to recommend treatments and predict diagnoses.
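To make the retail analogy concrete, here is a minimal sketch of one of the simplest recommendation techniques, item co-occurrence ("customers who bought X also bought Y"). The baskets and function names are hypothetical, and production recommender systems are considerably more sophisticated than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: one set of items per customer.
baskets = [
    {"jacket", "scarf", "boots"},
    {"jacket", "scarf"},
    {"boots", "socks"},
    {"jacket", "boots"},
]

def co_occurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(item, counts, top_n=2):
    """Return the items most often bought together with `item`."""
    scores = {b: n for (a, b), n in counts.items() if a == item}
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [b for b, _ in ranked[:top_n]]

counts = co_occurrence(baskets)
print(recommend("jacket", counts))
```

The same counting idea carries over to any records where items occur together, which is the intuition behind reusing retail methods on other kinds of data.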

While data science is a profit driver in the private sector, its use in the public service is still relatively immature. Healthcare analytics is one specialised domain with untapped potential, but data science can also be applied in mainstay departments like policy (e.g. analysing public feedback), finance (e.g. flagging fraudulent transactions), and human resources (e.g. personnel deployment).

So, what’s stopping us?

There are two parts to data science, 1) data collection and 2) data analysis, each with its own set of challenges to overcome:

Data Collection

Getting data is often the hardest part of any data science effort. As public data is sensitive, infrastructure is needed to collect data systematically and securely. To reach deeper insights, data from different agencies and ministries needs to be merged, and this process usually raises questions about confidentiality.

Hence, data collection requires collaboration across agencies. Mutual trust must be built so that useful data is exchanged and insights can be uncovered. Ownership and maintenance of IT infrastructure should be established, and stress tests conducted regularly to ensure data security. We rely on senior management to set the stage before public servants can take their cue and play their part.

Data Analysis

Once we have data, we need to analyse it. This requires skilled data scientists, but talented ones may be lured away by private companies, while those who stay may not be given the support they need to keep learning. The result is a lack of expertise.

However, expertise can be developed. It is a misconception that data science is solely quantitative. Data literacy can be divided into two levels: 1) knowing how data analysis works, and 2) executing the actual analysis.

The first level is a basic understanding of how algorithms work and the assumptions they rely on. This does not involve much math, and so should be made accessible to everyone.

Analysis is increasingly being automated, lowering the bar for people from non-technical backgrounds to do basic data exploration through apps and dashboards. As data science becomes more accessible, we need to improve data literacy among regular public servants to ensure that the conclusions drawn from such analyses are accurate.

Besides checking results for errors and assumptions, a broad understanding of data analytics can help managers identify potential data sources, as well as facilitate the collection of data in a format suitable for analysis. In turn, analysts are likely to appreciate managers who create the conditions for work to be done effectively.

The second level is technical know-how in math and coding, which data scientists, rather than managers, need to master. To nurture expertise, we need to build an ecosystem for experts to thrive. Many agencies have made the mistake of recruiting data scientists in isolation. Without peers who can provide feedback and healthy competition, data scientists may have fewer ideas to build on and less motivation to improve. Therefore, it is crucial to deploy data scientists in teams.

While data scientists can be either trained in school or self-taught, enlightened employers have realised that how someone learns matters less than the rigour and continuity of their learning. Many companies, including Google and Facebook, have sought out programmers who have no formal degree but are armed with a solid portfolio of coding projects.

Whatever our current level of expertise, data science is an evolving field, so a data scientist’s learning journey never ends: there are always new techniques to add to the toolbox through constant reading and practice.

So, how do we start learning?

Traditional classroom training is becoming obsolete: it is costly, time-consuming, and possibly ineffective, as participants are likely to forget technical details without constant review. Moreover, data science is a fast-moving field, and one-off training is unlikely to suffice for public servants whom we wish to groom as experts.

As a data science convert myself (having majored in psychology and economics), I have a few alternatives to suggest:

Enrol in massive open online courses (MOOCs), which are video courses that are free or often priced under $20. Examples of established course platforms include Coursera, Udacity and Udemy. Participants can choose courses based on reviews, and good instructors are also prompt in answering questions on the forums. With courses spanning a range of difficulty levels, both beginners and experts can find content suited to their needs. Moreover, as course videos usually remain available for life, participants can review them whenever they need to.

Learning is not just about sponging up knowledge, because knowledge is easily forgotten without practice. To apply what I learn, I usually pair my learning with relevant projects. Managers can also encourage a proactive learning culture, for example by allowing staff to set aside time to research and experiment with new data science methods.

After mastering new techniques, I share what I’ve learnt with others, because teaching reinforces learning. Writing blog articles is a convenient way to do this. To engage a non-technical audience, I leave out the math and jargon, and focus instead on intuitive explanations and visuals. I eventually compiled the tutorials into a book, Numsense! Data Science for the Layman, which, I’m ecstatic (!) to share, has since been chosen by top universities like Cambridge and Stanford as a reference text. Even so, simply keeping a blog can be gratifying, knowing that your tutorials can benefit a global audience.

For colleagues just starting out in data science, I frequently encourage recruiting interns with statistics or computer science backgrounds to help with relevant projects. This is a win-win arrangement: supervisors get to learn more techniques, while interns get to appreciate data science applications in the public sector. To ensure the accuracy of results, projects can be vetted by trained colleagues.

Finally, there are opportunities for everyone, regardless of expertise, to get together and share ideas. Data science meetup groups are common in major cities, often featuring speakers from a range of industries and attracting large audiences eager to learn and network.

So, where do we go from here?

Learning data science is just a means to an end. In public service, the end goal would be to use data science to improve lives.

A predictive algorithm to diagnose heart disease would be useless if we cannot pack it into a fast and intuitive interface that any doctor can use. To build products incorporating data science, we need to plug data scientists into interdisciplinary teams of engineers and designers. Here, good communication is essential to facilitate teamwork, as well as to convince end users of product benefits.

Once a data science product is implemented, we also need to validate it regularly to ensure that it remains effective over time. This is not as straightforward as it sounds. Take, for example, an algorithm that predicts whether a person requires medical treatment for a latent disease. To conclude that the algorithm is more accurate than doctors’ judgement, we need to compare the health outcomes of two groups: one selected by the algorithm, and the other selected by doctors. This inevitably raises the ethical question of whether we would be denying early treatment to the group judged by doctors, possibly at the expense of their lives. There is no perfect solution to this problem, but awareness is a good start.
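The comparison between the two groups is typically framed as a statistical test. As a minimal sketch, assuming a binary outcome (for example, whether each patient recovered) and two groups of reasonable size, a pooled two-proportion z-test checks whether the difference in outcome rates is larger than chance alone would explain. The counts below are hypothetical, and the test says nothing about how the groups were selected, which in a real clinical evaluation matters as much as the arithmetic.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions.

    Returns (difference in rates, z statistic, two-sided p-value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; outcomes are all identical")
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value

# Hypothetical: good outcomes in the algorithm-selected vs doctor-selected group.
diff, z, p = two_proportion_z_test(870, 1000, 840, 1000)
print(f"difference={diff:.3f}, z={z:.2f}, p={p:.3f}")
```

A small p-value would suggest the difference is unlikely to be due to chance, but with hypothetical numbers like these the result is borderline, which illustrates why validation needs large samples and careful design.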

Apart from conducting data science within the government, we can also consider publishing non-sensitive data, to put public service into the hands of the public. Open satellite imagery, for example, has enabled community involvement in humanitarian search efforts for missing Malaysia Airlines flight MH370, as well as the detection of illegal forest fires in Indonesia. Pollutants from forest fires can be a regional health hazard, and boycotting culpable companies has been one way for the public to fight back. Crowdsourcing has emerged as a check and balance to ensure that corporations and governments maintain social responsibility.

With more data available and data literacy improving, the potential for data science to improve the lives of citizens has never been greater. Whether we can successfully introduce data science in the public service will depend on how ready we are to tackle its accompanying challenges.


Thanks, Annalyn! We can’t wait to see what you get up to next.




Autonomous Vehicles and the Impact on Cities (Singularity University Global Summit)

Here’s a 20-minute talk I gave at the Singularity University Global Summit last month. It’s a crash course (no pun intended) on the different types of autonomous vehicles and their use cases, the challenges that stand in the way of city-scale deployments, and ideas for how autonomous vehicles will transform cities, not just transportation systems.

Builds on ideas from these articles:

Policy Issues Facing Social Media Companies: The Case Study Of YouTube

One of the goals of this blog is to bridge the worlds of Government, tech and business, which often hold a degree of suspicion towards each other. This article dives deep into controversial policy issues surrounding social media companies.

As a case study, it elucidates the challenges, considerations and dilemmas behind YouTube’s policies. This is me, a Government policy-maker, putting myself in the shoes of a YouTube policy-maker. I figure our considerations are similar despite our different contexts. If you know more than I do about any of these issues, feedback is much, much welcome.

The Unexpected Responsibilities of Social Media Companies

We live in an increasingly divided world. The forces driving these divisions, such as rising income inequality and geopolitical, racial and religious tensions, were at play long before the advent of social media.

However, social media has provided a channel for divisions to widen. By lowering the barriers for individuals to share their knowledge and opinions and make them go ‘viral’, it has brought tremendous benefits, such as the spread of education and greater freedom of speech. On the other hand, it has also given greater voice and reach to malicious or ‘fake’ content. Algorithms designed to show us what we are most likely to click create echo chambers that reinforce our beliefs and biases.
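The click-driven feedback loop can be illustrated with a deliberately simplified toy model. This sketch assumes, hypothetically, a greedy recommender that always shows the topic a user has clicked most, and a user who clicks whatever is shown; real ranking systems are far more complex, but the self-reinforcing dynamic is the same in spirit.

```python
from collections import Counter

def simulate_feed(initial_clicks, rounds):
    """Toy echo-chamber model: always recommend the most-clicked topic so far.

    Hypothetical assumption: the user clicks whatever is recommended, so each
    recommendation reinforces the topic that was already ahead.
    Returns each topic's share of total clicks after `rounds` recommendations.
    """
    clicks = Counter(initial_clicks)
    for _ in range(rounds):
        # Greedy choice: highest click count, ties broken alphabetically.
        top = min(clicks, key=lambda t: (-clicks[t], t))
        clicks[top] += 1
    total = sum(clicks.values())
    return {topic: count / total for topic, count in clicks.items()}

# A user who starts with only a slight lean towards one viewpoint.
shares = simulate_feed({"view_a": 3, "view_b": 2, "science": 2}, rounds=50)
print(shares)
```

Starting from a small lead of 3 clicks against 2, the favoured topic ends up with over 90% of all clicks, because the recommender never shows anything else.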

When a flurry of social media companies burst onto the scene in the 2000s, their intention was to create platforms for people to find what they wanted: friends, funny videos, relevant information, roommates or hobbyist items. Very few would have imagined that their platforms would completely change how everyday folks conversed and debated, shared and consumed information.

Policy issues facing social media companies

Today, social media companies are adjusting to the new responsibilities that this influence entails. Here is an overview of the issues at stake.

  1. Free speech and censorship

It is important to recognize the role of social media in democratizing how information is generated, shared and consumed. At the same time, not everything is appropriate to be shared online. Social media platforms recognize that they must take a moral stance on harmful content that should be taken down, for example, content that aims to incite violence or harm to others.

However, censorship cannot be overused. Social media platforms cannot become arbiters of morality, because many issues are subjective and it is not the platform’s role to judge who is right: the same LGBT content can be affirming for some but offensive to others. When is it fake news, and when is it merely a different interpretation? Here’s a real dilemma: let’s say someone reports an outbreak of disease on Facebook. The Government asks for the report to be taken down until its investigations are complete, because it will incite unnecessary fear among the population. Is Facebook best placed to assess who is right?

In general, a social media platform’s policy must identify and take down content that is inherently harmful, while catering to subjectivity by providing choice: to users, on the content they receive, and to advertisers, on the content their brands are associated with. It is an intricate balance to strike, requiring nuanced, consistent policy backed by a strong and coherent detection, enforcement and appeals regime.

  2. Copyright infringements

Another policy area concerns copyright. Individuals sharing content online may inadvertently or intentionally infringe on others’ copyrights. On one level, better detection of copyright infringements is needed. YouTube invested $60m in a system called Content ID, which lets rights holders give YouTube reference copies of their content so that YouTube can identify where it is being used.

What to do about copyright infringements is another issue. Should they be taken down immediately, or should the platform give copyright owners a choice? Approaches have shifted over the years in recognition that copyright owners may have different preferences: to enforce a takedown, seek royalties or take no action.

  3. Privacy

A third category of policy issues concerns managing users’ privacy rights.

First, how can the platform generate advertising revenue and keep its user base engaged, while respecting different preferences for personal privacy? This typically concerns the practice of combining personal information with search and click history to build up a profile of the user, which enables targeted advertising. Information is sometimes sold to third parties.

Second, what does it mean to give people true ‘choice’ when it comes to privacy? Many argue that long privacy agreements, which offer no alternative other than quitting the app, do not give people a real choice about their privacy.

Third, should individuals have the right to be forgotten online? Google and European regulators have been in a lengthy legal battle over the right of private citizens to ask search engines to delist incorrect, irrelevant or out-of-date information returned by an online search for their full name, not just in their country of residence but globally.

  4. Children

Children bring these policy issues into sharper focus through notions of age-appropriateness, consent, manipulation and safety. Platforms like Facebook do not allow users under 13. YouTube introduced ‘Restricted Mode’ as well as YouTube Kids, both of which filter content more strictly than the regular platform.

Similarly, higher standards apply to children’s privacy. Should companies be allowed to build profiles on children, and potentially manipulate them at such a young age? Should people be allowed to remove posts they made or online information about them while they were children?

Safety for children is also a huge issue, particularly on interactive platforms where children can be groomed by predators. While respecting privacy, how can we detect grooming before harm is inflicted, and what is the right course of action?

The YouTube Case Study

I have only scratched the surface of the policy issues that social media companies deal with, but the broad categories are in place. Now let’s get into the specifics of how social media companies have answered these questions through policy, implementation and resource allocation.

To put some meat on this, here’s a quick case study of YouTube’s approach. There are at least four components:

  1. Product differentiation
  2. Enhancing user choice within existing products
  3. Closing the policy-implementation loop
  4. Strategic communications and advocacy

1. Product differentiation

Product differentiation is one way to cater to different appetites for content and privacy. In 2015, YouTube launched ‘YouTube Kids’, which excludes violence, nudity and vulgar language. It also provides greater privacy by default through features such as blocking children from posting content and viewing targeted ads, and enabling them to view content without having to sign up for an account. ‘YouTube Red’ offers advertisement-free viewing.

However, product differentiation has its limits because significant resources are required for customization. There is also a slippery slope to avoid: if YouTube rolled out “YouTube China” with far stricter content censorship, imagine the influx of country requests that would ensue!

2. Enhancing user choice within existing products

Providing users choice in their settings is another way to cater to varying preferences within a given product. For example, advertisers on YouTube may have varying appetites for the types of videos their advertisements are shown against. Enabling choice, rather than banning more videos, is key: earlier this year, YouTube introduced features that enabled advertisers to exclude specific sites and channels from all of their AdWords for Video and Google Display Network campaigns, and to manage brand safety settings across all their campaigns at the push of a button.

Concerning privacy, users who do not want their personal data and search/click history to be linked can go to the activity controls section of their account page on Google, and untick the box marked “Include Chrome browsing history and activity from websites and apps that use Google services”. For particular searches, you can also use “incognito mode”, which ensures that Chrome will not save your browsing history, cookies and site data, or information entered in forms. These are ways to provide real choices in privacy.

3. Closing the policy-implementation loop

A robust policy defines clear principles that determine when content should be taken down or excluded from monetization opportunities and Restricted Mode. Implementation then becomes critical. With the large volume of content coming online every minute, it is impossible for YouTube employees to monitor everything. YouTube has to rely on user flagging and machine learning to identify copyright infringements or offensive content.

However, algorithms cannot be 100% accurate and often cannot explain why decisions are made. A robust appeals and re-evaluation process with humans in the loop is needed to ensure the integrity of the policy. More importantly, the human touch is needed to positively engage content producers (who hate to be censored).
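One common pattern for combining machine learning with human judgement is to act automatically only on high-confidence predictions, and to route uncertain cases, along with any appealed removals, to human reviewers. The sketch below assumes a hypothetical classifier that outputs a probability that a video violates policy; the thresholds and function names are illustrative and do not describe YouTube’s actual system.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    video_id: str
    action: str   # "remove", "keep" or "human_review"
    score: float  # hypothetical model's estimated probability of a violation

def route(video_id, violation_score, remove_above=0.95, keep_below=0.20):
    """Act automatically only when the (hypothetical) classifier is confident.

    Scores between the two illustrative thresholds are escalated to a human.
    """
    if not 0.0 <= violation_score <= 1.0:
        raise ValueError("violation_score must be a probability")
    if violation_score >= remove_above:
        action = "remove"
    elif violation_score < keep_below:
        action = "keep"
    else:
        action = "human_review"
    return Decision(video_id, action, violation_score)

def appeal(decision):
    """An appealed automated removal is re-evaluated by a human reviewer."""
    if decision.action == "remove":
        return Decision(decision.video_id, "human_review", decision.score)
    return decision

print(route("vid-1", 0.99).action, route("vid-2", 0.50).action, route("vid-3", 0.05).action)
print(appeal(route("vid-1", 0.99)).action)
```

Moving the thresholds trades reviewer workload against the number of automated mistakes, which is one concrete place where policy and operations teams have to agree.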

In my previous jobs, we often quipped: “policy is ops”. There is no point having a perfect policy if enforcement and implementation simply cannot support it. Policy teams need a constant feedback loop with implementation teams, to bridge the ideal with the possible.

4. Strategic communications and advocacy

Finally, robust policy is necessary, but insufficient for social media companies. Strategic communications and advocacy are an absolute must.

  • Public criticism of a company’s policies can negatively impact business. Boycotts and greater Government regulation are examples. YouTube is swimming against a common but simplistic narrative that tech companies are simply trading off public interests in privacy and security for business interests such as the growth of advertising revenue.
  • Misperceptions about policies can also have dangerous impacts. A few years ago, Israel’s Deputy Foreign Minister met with YouTube executives, raising the issue of Palestinians using YouTube videos to incite violence against Israel. She later released a statement which inaccurately suggested that Google would collaborate with Israel to take down this content. Google refuted this, but the nuance may already have been lost on segments of the public. YouTube’s policy of neutrality must come across clearly, even as lobby groups try to drag it into their agendas.

The purpose of Strategic Communications is to create a wide circle of advocates around YouTube’s policy stance so that negative press and misperceptions are less likely to take off. Elements of Strategic Communications include:

  • Going beyond the ‘what’ of policy, to the ‘why’. It is important to illuminate the consistent principles behind YouTube’s policy stances, as well as the considerations and trade-offs entailed. Channels such as blog posts enable this, since mainstream media is unlikely to provide the level of nuance needed.
  • Building strategic relationships and advocates. This includes entering into conversations and debates with your most strident critics, and building alliances with third parties who advocate your views.
  • Strong internal communications. Since social media companies themselves are run by an aggregation of people with different beliefs, it is essential that employees do not feel disenfranchised by the company’s policy stance.
  • Providing an alternative narrative. In addition, an important point for YouTube to make is that more is at stake than taking down offensive video content. Ultimately, we are all fighting against greater divisiveness and polarization in society. Although some elements of YouTube exacerbate this, YouTube can also make a huge contribution to bridging divides. Hence, I love what YouTube is doing with “Creators for Change”, a program that cultivates creators who aim to counter xenophobia, extremism and hate online. These creators are working on web series on controversial issues, as well as educational workshops for students. They are using the YouTube platform to close divides.


It is far too simplistic to say that companies only pursue business interests, leaving Governments to protect public interests. Every new product, including social media platforms, is a double-edged sword, with the potential to bring us closer to or further from where we want to be as a society.

Both Governments and social media companies are trying to bring us closer to where we want to be. However, Governments will tend to advocate more conservative policies, as their primary objective is to minimize downside on issues such as national security, privacy and Government legitimacy. Private businesses, on the other hand, manage downsides while simultaneously pushing the boundaries on issues such as free speech and revenue generation models.

A natural tension between these two positions is healthy as we decide, as countries and global communities, where we collectively fall on issues. This is how democracy works, after all.