By Werner Vogels
Applications based on machine learning (ML) can provide tremendous business value. However, many developers find them difficult to build and deploy. Because few people have this expertise, an easier process presents a significant opportunity for companies that want to accelerate their ML usage.
Though the AWS Cloud gives you access to the storage and processing power required for ML, the process for building, training, and deploying ML models has unique challenges that often block successful use of this powerful new technology.
The challenges begin with collecting, cleaning, and formatting training data. After the dataset is created, you must scale the processing to handle the data, which is often a blocker in itself. Next comes what is often a long training process that includes tuning the knobs and levers, called hyperparameters, that control different aspects of the training algorithm. Finally, moving the model into a scalable production environment can be slow and inefficient for those who do not do it routinely.
At Amazon Web Services, we’ve committed to helping you unlock the value of your data through ML, with a set of supporting tools and resources that improve the ML model development experience. From the Deep Learning AMI and the distributed Deep Learning AWS CloudFormation template, to Gluon in Apache MXNet, we’ve focused on improvements that remove the roadblocks to development.
We also recently announced the Amazon ML Solutions Lab, which is a program to help you accelerate your use of ML in products and processes. As the adoption of these technologies continues to grow, customers have demanded a managed service for ML, to make it easier to get started.
Today, we are announcing the general availability of Amazon SageMaker. This new managed service enables data scientists and developers to quickly and easily build, train, and deploy ML models without getting mired in the challenges that slow this process down today.
Amazon SageMaker provides the following features:
· Hosted Jupyter notebooks that require no setup, so that you can start processing your training dataset and developing your algorithms immediately.
· One-click, on-demand distributed training that sets up and tears down the cluster after training.
· Built-in, high-performance ML algorithms, re-engineered for greater speed, accuracy, and data throughput.
· Built-in model tuning (hyperparameter optimization) that can automatically adjust hundreds of different combinations of algorithm parameters.
· An elastic, secure, and scalable environment to host your models, with one-click deployment.
In the hosted notebook environment, Amazon SageMaker takes care of establishing secure network connections in your VPC and launching an ML instance. This development workspace also comes pre-loaded with the necessary Python libraries and CUDA drivers, attaches an Amazon EBS volume to automatically persist notebook files, and installs TensorFlow, Apache MXNet, and Keras deep learning frameworks. Amazon SageMaker also includes common examples to help you get started quickly.
For training, you simply indicate the type and quantity of ML instances you need and initiate training with a single click. Amazon SageMaker then sets up the distributed compute cluster, installs the software, performs the training, and tears down the cluster when complete. You only pay for the resources that you use and never have to worry about the underlying infrastructure.
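To make "indicate the type and quantity of ML instances" concrete, here is a minimal sketch of what a training request could look like when expressed as data. The field names follow the shape of the SageMaker `CreateTrainingJob` API request; the helper function, the job name, the S3 paths, the role ARN, the image URI, the volume size, and the runtime limit are all hypothetical placeholders chosen for illustration, not values from this post.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type, instance_count):
    """Assemble a CreateTrainingJob-style request as a plain dict.

    Hypothetical helper: it only builds the request and makes no AWS calls.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,    # container holding the algorithm
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # The two choices the text describes: instance type and quantity.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,          # assumed value
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # assumed limit
    }


# All argument values below are placeholders, not real resources.
training_request = build_training_job_request(
    job_name="example-training-job",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
    role_arn="arn:aws:iam::<account>:role/<role-name>",
    train_s3_uri="s3://<bucket>/train/",
    output_s3_uri="s3://<bucket>/output/",
    instance_type="ml.p3.2xlarge",
    instance_count=2,
)
```

With real values filled in, a dict like this could be passed to `boto3.client("sagemaker").create_training_job(**training_request)`. The service then provisions the cluster, runs the job, and tears the cluster down.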
Amazon SageMaker also reduces the amount of time spent tuning models using built-in hyperparameter optimization. This technology automatically adjusts hundreds of different combinations of parameters, to quickly arrive at the best solution for your ML problem. With high-performance algorithms, distributed computing, managed infrastructure, and hyperparameter optimization, Amazon SageMaker drastically decreases the training time and overall cost of building production systems.
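To illustrate the general idea of searching over combinations of hyperparameters, here is a small, self-contained sketch. It is not how Amazon SageMaker's tuning works internally, which this post does not describe; it uses plain random search over a small grid. The objective function is a made-up stand-in for "train a model and measure validation loss", and the parameter names and candidate values are assumptions chosen for illustration.

```python
import itertools
import random


def toy_validation_loss(learning_rate, batch_size):
    # Hypothetical stand-in for training a model and scoring it.
    # Lowest at learning_rate=0.1 and batch_size=64 by construction.
    return (learning_rate - 0.1) ** 2 + ((batch_size - 64) / 64) ** 2


def random_search(search_space, objective, max_trials, seed=0):
    """Try up to max_trials distinct combinations; return the best one."""
    rng = random.Random(seed)
    names = sorted(search_space)
    # Enumerate every combination, then sample without replacement.
    combos = list(itertools.product(*(search_space[n] for n in names)))
    trials = rng.sample(combos, min(max_trials, len(combos)))

    best_params, best_loss = None, float("inf")
    for values in trials:
        params = dict(zip(names, values))
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


search_space = {
    "learning_rate": [0.001, 0.01, 0.1, 0.5],
    "batch_size": [16, 32, 64, 128],
}
best_params, best_loss = random_search(
    search_space, toy_validation_loss, max_trials=16
)
print(best_params, best_loss)
```

In a real tuning run, each trial is a full training job, which is why running trials on managed, on-demand clusters matters: the expensive part is the objective, not the search loop.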
When you are ready to deploy, Amazon SageMaker offers an elastic, secure, and scalable environment to host your ML models, with one-click deployment. After training, Amazon SageMaker provides the model artifacts for deployment to EC2 or anywhere else. To host the model on Amazon SageMaker, you specify the type and number of ML instances. Amazon SageMaker takes care of launching the instances, deploying the model, and setting up the HTTPS endpoint that your application calls for low-latency, high-throughput predictions.
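The hosting choices described above, instance type and instance count, can be sketched as data in the same way. The field names follow the shape of the SageMaker `CreateEndpointConfig` and `CreateEndpoint` API requests; the helper function, the "-config" naming convention, the variant name, and the endpoint and model names are hypothetical placeholders.

```python
def build_endpoint_requests(endpoint_name, model_name,
                            instance_type, instance_count):
    """Build CreateEndpointConfig- and CreateEndpoint-style requests.

    Hypothetical helper: it only assembles dicts and makes no AWS calls.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    config_name = endpoint_name + "-config"  # assumed naming convention
    endpoint_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            # The two choices the text describes for hosting.
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    }
    endpoint = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": config_name,
    }
    return endpoint_config, endpoint


# Placeholder names, not real resources.
endpoint_config, endpoint = build_endpoint_requests(
    endpoint_name="example-endpoint",
    model_name="example-model",
    instance_type="ml.m4.xlarge",
    instance_count=2,
)
```

With a model already registered under the given name, these dicts could be passed to the `create_endpoint_config` and `create_endpoint` calls of `boto3.client("sagemaker")`.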
In production, Amazon SageMaker manages the compute infrastructure to perform health checks, apply security patches, and conduct other routine maintenance, all with built-in Amazon CloudWatch monitoring and logging.
Before Amazon SageMaker, you were faced with a tradeoff between the flexibility to use different frameworks and the ease of use of a single platform. At AWS, we believe in giving choices, so Amazon SageMaker removes that tradeoff. You can now use the tools of your choice, with a single environment for training and hosting ML models.
Amazon SageMaker provides a set of built-in algorithms for traditional ML. For deep learning, Amazon SageMaker lets you submit MXNet or TensorFlow scripts and use the distributed training environment to generate a deep learning model. If you use Apache Spark, you can use Amazon SageMaker’s library to get the advantages of Amazon SageMaker from a familiar environment. You can even bring your own algorithms and frameworks, in Docker containers, and use Amazon SageMaker to manage the training and hosting environments. Just as Amazon RDS supports multiple engines such as MySQL, PostgreSQL, and Aurora, Amazon SageMaker supports multiple frameworks.
Finally, one of the best aspects of Amazon SageMaker is its modular architecture. You can use any combination of its building, training, and hosting capabilities to fit your workflow. For instance, you may use the build and training capabilities to prepare a production-ready ML model, and then deploy the model to a device on the edge, such as AWS DeepLens. Or, you may use only its hosting capabilities to simplify the deployment of models that you’ve already trained elsewhere. The flexibility of Amazon SageMaker’s architecture enables you to easily incorporate its benefits into your existing ML workflows in whatever combination is best.