Introduction

Voice assistants (VAs) such as Amazon Alexa, Google Assistant, and Apple Siri are rapidly gaining popularity in households and companies. Yet we know little about whether the Amazon Alexa platform, which holds a dominant market share, is trustworthy in practice when it comes to rejecting or suspending policy-violating skills. We seek to empirically assess the trustworthiness of the Amazon Alexa platform and to characterize its security risks.

In this work, we examine the extent to which Amazon Alexa enforces its policies during the skill certification process, both to help developers improve the security of their skills and to prevent policy-violating skills from being published. Unfortunately, few research efforts have systematically addressed this critical problem: existing work has mainly focused on exploiting the open voice/acoustic interfaces between users and the speech recognition systems of VA devices.

Research questions

We empirically assess the trustworthiness of the Amazon Alexa platform, characterize its security risks, and answer the following key questions:

(1) Is the skill certification process trustworthy in terms of detecting policy-violating third-party skills?

(2) What are the consequences of lenient certification? Do policy-violating skills exist in the Alexa skills store?

(3) Once a dangerous skill passes the certification process and becomes available in the skills store, how can adversarial developers increase the chance that their skill reaches more end users?

(4) How does the Google Assistant’s certification system compare to that of Amazon Alexa?

Measurements

To understand how rigorous the skill certification process is on the Amazon Alexa platform, we performed a set of “adversarial” experiments against it. Our experimental findings reveal that the Alexa skills store has not strictly enforced its policy requirements and leaves major security responsibilities to developers. We also performed a comparative study with the Google Assistant platform, including a measurement of the certification system for Google actions and a comparative analysis of the two platforms’ policy requirements. In addition, we conducted dynamic testing of 825 skills under the kids category to identify risky skills already in the store, and we examined how different factors affect the outcome of skill discovery when invocation names conflict or are ambiguous. Finally, we conducted a user study with 78 participants to understand their habits when exploring new skills, their reactions when encountering inappropriate content, and their trust in VA platforms.
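
To make the dynamic-testing step concrete, below is a minimal sketch of the kind of probing loop it involves. The `send_utterance` transport, the probe utterances, and the keyword list are all illustrative assumptions rather than our actual tooling: a real harness would relay utterances to the skill under test (for example, through Alexa’s developer simulator) and apply far more careful content analysis.

```python
# Illustrative sketch of a dynamic-testing loop for skills in the kids
# category. `send_utterance` is a caller-supplied transport (hypothetical
# here) that relays one text utterance to the skill under test and returns
# the skill's spoken response, or None if the skill fails to respond.
from typing import Callable, List, Optional

# Illustrative keywords only; real policy checks are far more nuanced.
RISKY_KEYWORDS = ["buy", "credit card", "home address", "phone number"]

def probe_skill(send_utterance: Callable[[str], Optional[str]],
                invocation_name: str,
                probes: List[str]) -> List[str]:
    """Launch a skill, feed it probe utterances, and flag risky replies."""
    findings: List[str] = []
    reply = send_utterance(f"open {invocation_name}")
    if reply is None:
        return [f"broken: '{invocation_name}' did not respond to launch"]
    for probe in probes:
        reply = send_utterance(probe)
        if reply is None:
            findings.append(f"broken: no response to '{probe}'")
            continue
        for keyword in RISKY_KEYWORDS:
            if keyword in reply.lower():
                findings.append(
                    f"possible violation: '{keyword}' in reply to '{probe}'")
    return findings
```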

Findings

Our study leads to one overall conclusion: Alexa’s certification process is not implemented in a proper and effective manner, despite claims to the contrary. This lack of trustworthiness poses challenges to the Amazon Alexa platform’s long-term success.

(1) We are the first to systematically characterize the security threats of Amazon Alexa’s certification system. We crafted 234 policy-violating skills that intentionally violate Alexa’s policy requirements and submitted them for certification; all of them were certified. Along the way, we encountered many improper and disorganized certification cases. Our results provide new insights into real-world security threats posed by the Amazon Alexa platform’s insufficient trustworthiness and design flaws.

(2) We examined 2,085 negative reviews of skills under the kids category and characterized the common issues users reported. Through dynamic testing of 825 skills, we identified 52 problematic skills with policy violations and 51 broken skills under the kids category.

(3) We empirically tested Alexa’s skill discovery process and revealed that an adversary can manipulate the skill discovery mechanism to hijack benign skills, given that Amazon’s fraud detection mechanism is not perfect. Combined with the lenient certification process, this puts everyday VA users at high risk (a sketch illustrating one ingredient of such hijacking follows this list).

(4) We performed a comparative measurement of Google Assistant’s certification system. We submitted 273 policy-violating actions in total, of which 116 were certified and 157 failed certification. Our measurements show that the Google Assistant platform also has potentially exploitable flaws.
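
As an illustration of finding (3): one ingredient of skill hijacking studied in prior work on voice/skill squatting is choosing an invocation name that sounds like a popular skill’s. The sketch below uses a simplified Soundex encoding (omitting the standard h/w adjacency rule) to check whether two invocation phrases are phonetically confusable; it is an illustrative heuristic of ours, not the mechanism Alexa actually uses for skill discovery.

```python
def simple_soundex(word: str) -> str:
    """Simplified Soundex code (omits the standard h/w adjacency rule)."""
    mapping = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            mapping[ch] = digit
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded = [mapping.get(ch, "0") for ch in word]  # "0" = vowel/ignored
    collapsed = [encoded[0]]
    for d in encoded[1:]:                # collapse runs of the same digit
        if d != collapsed[-1]:
            collapsed.append(d)
    digits = [d for d in collapsed[1:] if d != "0"]
    return (word[0].upper() + "".join(digits) + "000")[:4]

def sounds_alike(phrase_a: str, phrase_b: str) -> bool:
    """Word-by-word phonetic comparison of two invocation phrases."""
    a, b = phrase_a.split(), phrase_b.split()
    return len(a) == len(b) and all(
        simple_soundex(x) == simple_soundex(y) for x, y in zip(a, b))

# Two invocation names that collide under this encoding:
assert sounds_alike("boil an egg", "boyle an egg")
```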

Experiment Setup

We performed “adversarial” experiments against the skill certification process of the Amazon Alexa platform. To test its trustworthiness, we crafted 132 policy-violating skills, each intentionally violating specific policies defined by Amazon, and examined whether they were certified and published to the store.
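
To give a sense of how lightweight a skill’s backend is, here is a minimal sketch using the Python ASK SDK (ask-sdk-core). It is a generic example, not one of our submitted skills, and the speech text is a harmless placeholder for the content under test.

```python
# Minimal Alexa skill backend using the Python ASK SDK (ask-sdk-core).
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_core.utils import is_request_type
from ask_sdk_model import Response

class LaunchHandler(AbstractRequestHandler):
    """Responds when the user opens the skill by its invocation name."""
    def can_handle(self, handler_input: HandlerInput) -> bool:
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input: HandlerInput) -> Response:
        # In a test skill, the content under examination would appear here.
        speech = "Placeholder for the content under test."
        return handler_input.response_builder.speak(speech).response

sb = SkillBuilder()
sb.add_request_handler(LaunchHandler())
lambda_handler = sb.lambda_handler()  # entry point when hosted on AWS Lambda
```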

Experiment Results

Our results provide strong evidence that Alexa's skill certification process is implemented in a disorganized manner. We were able to publish all 132 skills that we submitted, although some of them required a resubmission.

Google Assistant

We conducted a set of experiments on the Google Assistant platform as well. While Google does a better job in its certification process based on our preliminary measurements, the process is still not perfect and has potentially exploitable flaws that warrant further testing.
