The Google Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.
In this codelab you will focus on using the Vision API with Python. You will learn how to use several of the API's features, namely label annotations, OCR/text extraction, landmark detection, and detecting facial features!
What you'll learn
- How to use Cloud Shell
- How to Enable the Google Cloud Vision API
- How to Authenticate API requests
- How to install the Vision API client library for Python
- How to perform Label detection
- How to perform Text detection
- How to perform Landmark detection
- How to perform Face detection
What you'll need
- A Google account (G Suite accounts may require administrator approval)
- A Google Cloud Platform project with an active GCP billing account
- Basic Python skills would be helpful but not required; this tutorial requires Python 2.6+. You can also use any supported language, but this tutorial is only additionally available in C#/.NET and Ruby.
Self-paced environment setup
- Sign in to Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)
Note: You can easily access Cloud Console by memorizing its URL, which is console.cloud.google.com.
Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.
Note: If you're using a Gmail account, you can leave the default location set to No organization. If you're using a G Suite account, then choose a location that makes sense for your organization.
- Next, you'll need to enable billing in Cloud Console in order to use Google Cloud resources.
Running through this codelab shouldn't cost much, if anything at all. Be sure to follow any instructions in the 'Cleaning up' section, which advises you how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300 USD Free Trial program.
Start Cloud Shell
Activate Cloud Shell
- From the Cloud Console, click Activate Cloud Shell.
If you've never started Cloud Shell before, you'll be presented with an intermediate screen (below the fold) describing what it is. If that's the case, click Continue (and you won't ever see it again). Here's what that one-time screen looks like:
It should only take a few moments to provision and connect to Cloud Shell.
This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with simply a browser or your Chromebook.
Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated:
The gcloud command-line tool is the powerful and unified command-line tool in Google Cloud. It comes preinstalled in Cloud Shell. You will notice its support for tab completion. For more information, see gcloud command-line tool overview.
If the project is not set, you can set it with this command:
gcloud config set project <PROJECT_ID>
This codelab requires you to use the Python language (although many languages are supported by the Google APIs client libraries, so feel free to build something equivalent in your favorite development tool and simply use the Python as pseudocode). In particular, this codelab supports Python 2 and 3, but we recommend moving to 3.x as soon as possible.
Your place or ours?
Google provides a cloud-based development environment for user convenience, but developers are welcome to use their own local dev environment as well. If setting up your own, follow these guidelines to bring it in line with what's available from the Cloud Shell (more below). You can find out more about support for Python here.
The Cloud Shell is a convenience available for users directly from the Cloud Console and doesn't require a local development environment, so this tutorial can be done completely in the cloud with a web browser. The Cloud Shell is especially useful if you're developing or plan to continue developing with GCP products & APIs. More specifically for this codelab, the Cloud Shell has already pre-installed both versions of Python.
The Cloud Shell also has IPython installed... it is a higher-level interactive Python interpreter which we recommend, especially if you are part of the data science or machine learning community. If you are, IPython is the default interpreter for Jupyter Notebooks as well as Colab, Jupyter Notebooks hosted by Google Research.
IPython favors a Python 3 interpreter first but falls back to Python 2 if 3.x isn't available. IPython can be accessed from the Cloud Shell but can also be installed in a local development environment. Exit with ^D (Ctrl-d) and accept the offer to exit. Example output of starting
ipython will look like this:
If IPython isn't your preference, use of a standard Python interactive interpreter (either the Cloud Shell or your local development environment) is perfectly acceptable (also exit with ^D):
Terminal window output
- The output shown in the terminal/command windows (like those above) represents what you would see on POSIX-compliant systems such as Linux or Mac OS X, including Cloud Shell.
- Most of the time, the shell prompt ($) is not shown unless it's to differentiate a command vs. its output.
- Please read this guide if you're using a Windows environment.
The codelab also assumes you have the pip installation tool (Python package manager and dependency resolver). It comes bundled with versions 2.7.9+ or 3.4+. If you have an older Python version, see this guide for installation instructions. Depending on your permissions you may need to have sudo or superuser access, but generally this isn't the case. You can also explicitly use pip3 to execute pip for specific Python versions.
The remainder of the codelab assumes you're using Python 3—specific instructions will be provided for Python 2 if they differ significantly from 3.x.
Use of the pip tool as described above may require superuser access for package installation. If you're on a system where you don't have it, say a school computer, you may need to use the virtualenv virtual environment tool, which gives you an entire Python environment within your home directory where you have full access to install packages.
Create and use virtual environments
This section is optional and only really required for those who must use a virtual environment for this codelab (per the warning sidebar above). If you only have Python 3 on your computer, you can simply issue this command to create a virtualenv called my_env (you can choose another name if desired):
However, if you have both Python 2 & 3 on your computer, we recommend you install a Python 3 virtualenv, which you can do with the -p flag like this:
Enter your newly created virtualenv by 'activating' it like this:
Confirm you're in the environment by observing your shell prompt is now preceded with your environment name, i.e.,
Now you should be able to pip install any required packages, execute code within this environment, etc. Another benefit is that if you completely mess it up, get into a situation where your Python installation is corrupted, etc., you can blow away this entire environment without affecting the rest of your system.
Note: When you are ready to exit your virtual environment at the end of the codelab, enter the deactivate command to return to your operating system shell or the Cloud Shell.
Cloud Shell and virtual environments
When using Cloud Shell, you're not in a situation where a virtual environment is a necessity (as described above). The Cloud Shell contains all the default Python packages that are part of the Cloud SDK whereas a new virtual environment will not. You'd need to go through that installation process if you choose to use a virtual environment in Cloud Shell.
Before you can begin using Google APIs, you must enable them. The example below shows what you would do to enable the Cloud Vision API. In this codelab, you may be using one or more APIs, and should follow similar steps to enable them before usage.
From Cloud Shell
Using Cloud Shell, you can enable the API by using the following command:
Note: If this command ERRORs, check that the current Project ID matches your codelab Project ID.
Use the following command to find the current Project ID being used by Cloud Shell:
gcloud info | grep 'project'
If the Project ID is not correct, use the following command to use the correct Project ID:
gcloud config set project <PROJECT_ID>
Replace <PROJECT_ID> with the correct Project ID.
From the Cloud Console
You may also enable the Vision API in the API Manager. From the Cloud Console, go to API Manager and select, 'Library.'
In the search bar, start typing, 'vision,' then select Vision API when it appears. It may look something like this as you're typing:
Select the Cloud Vision API to get the dialog you see below, then click the 'Enable' button:
While many Google APIs can be used without fees, use of GCP (products & APIs) is not free. When enabling the Vision API (as described above), you may be asked for an active billing account. Check the Vision API's pricing information before enabling it. Keep in mind that certain Google Cloud Platform (GCP) products feature an 'Always Free' tier whose limits you have to exceed in order to incur billing. For the purposes of the codelab, each call to the Vision API counts against that free tier, and so long as you stay within its limits in aggregate (within each month), you should not incur any charges.
Some Google APIs, e.g., those for G Suite, have usage covered by a monthly subscription, so there's no direct billing for use of the Gmail, Google Drive, Calendar, Docs, Sheets, and Slides APIs, for example. Different Google products are billed differently, so be sure to reference your API's documentation for that information.
In this codelab, you only need to turn on the Cloud Vision API, so proceed forward with this tutorial once you've successfully followed the instructions above and enabled the API.
In order to make requests to the APIs, your application needs to have the proper authorization. Authentication, a similar word, describes login credentials—you authenticate yourself when logging into your Google account with a login & password. Once authenticated, the next step is whether you are—or rather, your code, is—authorized to access data, such as blob files on Cloud Storage or a user's personal files on Google Drive.
Google APIs support several types of authorization, but the one most common for GCP API users is service account authorization, since applications like the one in this codelab run in the cloud as a 'robot user.' While the Vision API supports API key authorization as well, it's strongly recommended that users employ a more secure form of authorization.
A service account is an account that belongs to your project or application (rather than a user) and is used by the client library to make Vision API requests. Like a user account, a service account is represented by an email address. You can create service account credentials from either the command line (via gcloud) or in the Cloud Console. Let's take a look at both below.
Using gcloud (in Cloud Shell or your dev environment)
In this section, you will use the gcloud tool to create a service account, then create the credentials needed to access the API. First you will set an environment variable with your PROJECT_ID, which you will use throughout this codelab:
Next, you will create a new service account to access the Vision API by using:
Next, you will create the private key credentials that your Python code will use to log in as your new service account. Create these credentials and save them as the JSON file ~/key.json by using the following command:
From the Cloud Console
To get OAuth2 credentials for user authorization, go back to the API manager (shortcut link: console.developers.google.com) and select the 'Credentials' tab on the left-nav:
From the Credentials page, click on the '+ Create Credentials' button at the top, which then gives you a pulldown dialog where you'd choose 'Service account:'
On the 'Create service account' screen (similar to the below), you must enter a Service account name (choose something short but explanatory like 'svc acct vision' or the one we used with gcloud above, 'my vision sa'). A Service account ID is also required, and the form will create a valid ID string similar to the name you chose. The Service account description field is optional, but you can specify something like, 'Service account for Vision API demo'. Click the 'Create' button when complete.
The next step is to grant service account access to this project. Having a service account is great, but if it doesn't have permissions to access project resources, it's kind-of useless... it's like creating a new user who doesn't have any access.
Here, click on the 'Select a role' pulldown menu. You'll see a variety of options (see below), some more granular than others. For this codelab, choose Project → Viewer. Then click Continue.
On this third screen (see below), we will skip granting specific users access to this service account, but we do need to create a private key that our application script can use to access the Vision API. To that end, click the '+ Create Key' button.
Creating a key is straightforward on the next screen. Take the default of a JSON key structure. (P12 is only used for backwards-compatibility, so it is not recommended for new projects.) Click the 'Create' button and save the private key file when prompted. The default filename will be long and possibly confusing, e.g., PROJECT_ID-HASH.json, so we recommend renaming it to something more digestible such as key.json.
Once the file is saved, you'll get the following confirmation message:
Click the 'Close' button to complete this task from the console.
One last step, whether you created your service account from the command line or in the Cloud Console: direct your cloud project to use this as the default service account private key for your application by assigning this file to the GOOGLE_APPLICATION_CREDENTIALS environment variable:
The environment variable should be set to the full path of the credentials JSON file you saved. It's not necessary to do so, but if you don't, you can only use that key file from the current working directory.
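If you want to confirm the setup from Python rather than from the shell, a small check like the one below may help. It is only a sketch: the function name `check_credentials` is hypothetical, and the check assumes that a service account key file is a JSON object whose `type` field is `service_account`.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return (ok, message) describing the service account key setup."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return False, f"no key file found at {path}"
    try:
        with open(path) as f:
            key = json.load(f)
    except ValueError:
        return False, f"{path} is not valid JSON"
    # Assumption: service account keys carry "type": "service_account".
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, f"{path} does not look like a service account key"
    return True, f"service account key for {key.get('client_email', 'unknown')}"


ok, message = check_credentials()
print("OK:" if ok else "Problem:", message)
```

The function only inspects the file; it does not contact Google, so a passing check does not prove that the key is still valid or has the right permissions.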
You can read more about authenticating the Google Cloud Vision API, including the other forms of authorization, i.e., API key, user authorization OAuth2 client ID, etc.
We're going to use the Vision API client library for Python, which should already be installed in your Cloud Shell environment. Verify that it's installed with the following command:
If you're using a local development environment or using a new virtual environment you just created, install/update the client library (including pip itself if necessary) with this command:
Confirm the client library can be imported without issue like the below, and then you're ready to use the Vision API from real code!
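One way to run this check from inside Python, without a traceback if the package is missing, is sketched below. The helper name `is_importable` is hypothetical; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate module_name on the current path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google.cloud") is missing.
        return False


print("google.cloud.vision importable:", is_importable("google.cloud.vision"))
```

If this prints False, install the client library as described above before continuing.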
One of the Vision API's basic features is to identify objects or entities in an image, known as label annotation. Label detection identifies general objects, locations, activities, animal species, products, and more. The Vision API takes an input image and returns the most likely labels which apply to that image. It returns the top-matching labels along with a confidence score of a match to the image.
In this example, you will perform label detection on an image of a street scene in Shanghai. To do this, copy the following Python code into your IPython session (or drop it into a local file such as
label_detect.py and run it normally):
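A minimal sketch of such a script follows. It assumes a recent release of the google-cloud-vision client library (one that exposes `vision.Image`), and the image URI is a placeholder rather than the codelab's actual sample path. The helper names `detect_labels` and `format_labels` are hypothetical.

```python
# Placeholder: replace with the Cloud Storage URI of the image to analyze.
IMAGE_URI = "gs://YOUR_BUCKET/your_image.jpg"


def detect_labels(image_uri):
    """Call the Vision API and return the label annotations for image_uri."""
    # Imported here so the formatting helper works without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = image_uri
    response = client.label_detection(image=image)
    return response.label_annotations


def format_labels(labels):
    """Turn label annotations into 'description (score%)' lines."""
    return [f"{label.description} ({label.score * 100:.2f}%)" for label in labels]


# Needs credentials and network access, so it is left commented out:
# print("\n".join(format_labels(detect_labels(IMAGE_URI))))
```

Each annotation carries a `description` and a `score` between 0 and 1; the formatter simply converts the score to a percentage.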
You should see the following output:
Note: If this Python code doesn't work for you (you get an authentication error), review the instructions in the Authenticate API requests step.
Use the following command to verify the value of the GOOGLE_APPLICATION_CREDENTIALS environment variable:
echo $GOOGLE_APPLICATION_CREDENTIALS
It should output the expanded path to your file key.json (or whatever name you chose to save it as). If it does, next check that a service account key was created and is located at ~/key.json by using:
You should see something similar to:
If you don't, revisit the Authenticate API requests step above.
In this step, you were able to perform label detection on an image of a street scene in China and display the most likely labels associated with that image. Read more about Label Detection.
Text detection performs Optical Character Recognition (OCR). It detects and extracts text within an image with support for a broad range of languages. It also features automatic language identification.
In this example, you will perform text detection on an image of an Otter Crossing. Copy the following snippet into your IPython session (or save it locally as a Python script and run it):
You should see the following output:
In this step, you were able to perform text detection on an image of an Otter Crossing and display the recognized text from the image. Read more about Text Detection.
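For reference, a text detection script might look roughly like the sketch below. It assumes a recent google-cloud-vision release; the helper names `detect_text` and `format_text` are hypothetical, and the image URI must be supplied by the caller.

```python
def detect_text(image_uri):
    """Call the Vision API and return the text annotations for image_uri."""
    # Imported here so the formatting helper works without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = image_uri
    response = client.text_detection(image=image)
    return response.text_annotations


def format_text(annotations):
    """Describe each detected text block and its bounding polygon."""
    lines = []
    for text in annotations:
        vertices = [f"({v.x},{v.y})" for v in text.bounding_poly.vertices]
        lines.append(f"{text.description!r} bounds: {','.join(vertices)}")
    return lines


# Needs credentials and network access, so it is left commented out:
# print("\n".join(format_text(detect_text("gs://YOUR_BUCKET/your_image.jpg"))))
```

In the API's response, the first annotation normally holds the full extracted text and the remaining ones hold individual words with their own bounding boxes.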
Landmark detection detects popular natural and man-made structures within an image.
In this example, you will perform landmark detection on an image of the Eiffel Tower.
To perform landmark detection, copy the following Python code into your IPython session (or save it locally as a Python script and run it):
You should see the following output:
In this step, you were able to perform landmark detection on an image of the Eiffel Tower. Read more about Landmark Detection.
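A landmark detection script could be sketched as follows, again assuming a recent google-cloud-vision release. The helper names `detect_landmarks` and `format_landmarks` are hypothetical.

```python
def detect_landmarks(image_uri):
    """Call the Vision API and return the landmark annotations for image_uri."""
    # Imported here so the formatting helper works without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = image_uri
    response = client.landmark_detection(image=image)
    return response.landmark_annotations


def format_landmarks(landmarks):
    """List each landmark name with its latitude/longitude locations."""
    lines = []
    for landmark in landmarks:
        coords = [
            f"({loc.lat_lng.latitude:.4f}, {loc.lat_lng.longitude:.4f})"
            for loc in landmark.locations
        ]
        lines.append(f"{landmark.description}: {'; '.join(coords)}")
    return lines


# Needs credentials and network access, so it is left commented out:
# print("\n".join(format_landmarks(detect_landmarks("gs://YOUR_BUCKET/your_image.jpg"))))
```

Each landmark annotation includes a name and one or more geographic locations, which is what the formatter prints.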
Facial features detection detects multiple faces within an image along with the associated key facial attributes such as emotional state or wearing headwear.
In this example, you will detect the likelihood of four emotional states: joy, anger, sorrow, and surprise.
To perform emotional face detection, copy the following Python code into your IPython session (or save it locally as a Python script and run it):
You should see the following output for our face_surprise and face_no_surprise examples:
In this step, you were able to perform emotional face detection. Read more about Facial Features Detection.
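The sketch below shows one possible shape for an emotional face detection script, assuming a recent google-cloud-vision release. The helper names `detect_faces` and `describe_face` are hypothetical; the likelihood names mirror the API's `Likelihood` enum, which runs from UNKNOWN (0) to VERY_LIKELY (5).

```python
# Names of the Vision API Likelihood enum values, indexed by their number.
LIKELIHOOD_NAMES = (
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
)
EMOTIONS = ("joy", "anger", "sorrow", "surprise")


def detect_faces(image_uri):
    """Call the Vision API and return the face annotations for image_uri."""
    # Imported here so the formatting helper works without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = image_uri
    response = client.face_detection(image=image)
    return response.face_annotations


def describe_face(face):
    """Map each emotion to the name of its detected likelihood."""
    return {
        emotion: LIKELIHOOD_NAMES[int(getattr(face, f"{emotion}_likelihood"))]
        for emotion in EMOTIONS
    }


# Needs credentials and network access, so it is left commented out:
# for face in detect_faces("gs://YOUR_BUCKET/your_image.jpg"):
#     print(describe_face(face))
```

The API reports likelihoods as buckets rather than numeric probabilities, so the output is a label such as LIKELY for each emotion.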
Congratulations... you learned how to use the Vision API with Python to perform several image detection features! Also check out the code samples in this codelab's open source repo—while the code in this tutorial works for both 2.x (2.6+) and 3.x, the code in the repo requires 3.6+.
You're allowed to perform a fixed amount of (label, text/OCR, landmark, etc.) detection calls per month for free. Since you only incur charges each time you call the Vision API, there's no need to shut anything down nor must you disable/delete your project. More information on billing for the Vision API can be found on its pricing page.
In addition to the source code for the four examples you completed in this codelab, below is additional reading material as well as recommended exercises to augment your knowledge and use of the Vision API with Python.
- Cloud Vision API documentation: cloud.google.com/vision/docs
- Cloud Vision API home page & live demo: cloud.google.com/vision
- Vision API label detection/annotation: cloud.google.com/vision/docs/labels
- Vision API facial feature recognition: cloud.google.com/vision/docs/detecting-faces
- Vision API landmark detection: cloud.google.com/vision/docs/detecting-landmarks
- Vision API optical character recognition (OCR): cloud.google.com/vision/docs/ocr
- Vision API 'Safe Search': cloud.google.com/vision/docs/detecting-safe-search
- Vision API product/corporate logo detection: cloud.google.com/vision/docs/detecting-logos
- Python on Google Cloud Platform: cloud.google.com/python
- Google Cloud Python client: googlecloudplatform.github.io/google-cloud-python
- Codelab open source repo: github.com/googlecodelabs/cloud-vision-python
Now that you have some experience with the Vision API under your belt, below are some recommended exercises to further develop your skills:
- You've built separate scripts demoing individual features of the Vision API. Combine at least 2 of them into another script. For example, add OCR/text recognition to the first script that performs label detection (label_detect.py). You may be surprised to find there is text on one of the hats in that image!
- Instead of our random images available on Google Cloud Storage, write a script that uses one or more images found online (accessible via http://).
- Same as the previous exercise, but with local images on your filesystem. Note that the previous exercise may be an easier first step before doing this one with local files.
- Try non-photographs to see how the API works with those.
- Migrate some of the script functionality into a microservice hosted on Google Cloud Functions, or in a web app or mobile backend running on Google App Engine.
If you're ready to tackle that last suggestion but can't think of any ideas, here are a pair to get your gears going:
- Analyze multiple images in a Cloud Storage bucket, a Google Drive folder (use the Drive API), or a directory on your local computer. Call the Vision API on each image, writing out data about each into a Google Sheet (use the Sheets API) or Excel spreadsheet. (NOTE: you may have to do some extra auth work as G Suite assets like Drive folders and Sheets spreadsheets generally belong to users, not service accounts.)
- Some people Tweet images (phone screenshots) of other tweets where the text of the original can't be cut-n-pasted or otherwise analyzed. Use the Twitter API to retrieve the referring tweet, extract and pass the tweeted image to the Vision API to OCR the text out of those images, then call the Cloud Natural Language API to perform sentiment analysis (to determine whether it's positive or negative) and entity extraction (search for entities/proper nouns) on them. (This is optional for the text in the referring tweet.)
This work is licensed under a Creative Commons Attribution 2.0 Generic License.
The Earth Engine Python API can be deployed in a Google Colaboratory notebook. Colab notebooks are Jupyter notebooks that run in the cloud and are highly integrated with Google Drive, making them easy to set up, access, and share. If you are unfamiliar with Google Colab or Jupyter notebooks, please spend some time exploring the Colab welcome site.
The following sections describe deploying Earth Engine in Google Colab and visualizing maps and charts using third‑party Python packages.
Note: Installing the Earth Engine API and authenticating are necessary steps each time you begin working with a Colab notebook. This guide demonstrates setup and testing with a new Colab notebook, but the process applies to shared and saved notebooks as well. If you are not a registered Earth Engine user, please sign up.
Open a Colab notebook
Notebooks can be opened from either Google Drive or the Colaboratory interface.
Open Google Drive and create a new file.
- New > More > Colaboratory
- Right click in a folder and select More > Colaboratory from the context menu.
Visit the Colab site and create a new file.
- File > New > New Python 3 notebook
- If you have interacted with Colab previously, visiting the above linked site will provide you with a file explorer where you can start a new file using the dropdown menu at the bottom of the window.
Existing notebook files (.ipynb) can be opened from Google Drive and the Colab interface.
Colab notebooks can exist in various folders in Google Drive depending on where notebook files were created. Notebooks created in Google Drive will exist in the folder they were created in or moved to. Notebooks created from the Colab interface will default to a folder called 'Colab Notebooks', which is automatically added to the 'My Drive' folder of your Google Drive when you start working with Colab.
Colab files can be identified by a yellow 'CO' symbol and '.ipynb' file extension. Open files by either double clicking on them and selecting Open with > Colaboratory from the button found at the top of the resulting page or by right clicking on a file and selecting Open with > Colaboratory from the file's context menu.
Opening notebooks from the Colab interface allows you to access existing files from Google Drive, GitHub, and local hardware. Visiting the Colab interface after initial use will result in a file explorer modal appearing. From the tabs at the top of the file explorer, select a source and navigate to the .ipynb file you wish to open. The file explorer can also be accessed from the Colab interface by selecting File > Open notebook or using the Ctrl+O keyboard combination.
Import API and get credentials
This section demonstrates how to import the Earth Engine Python API and authenticate access. This content is also available as a Colab notebook:
The Earth Engine API is included by default in Google Colaboratory, so it requires only importing and authenticating. These steps must be completed for each new Colab session, if you restart your Colab kernel, or if your Colab virtual machine is recycled due to inactivity.
Import the API
Run the following cell to import the API into your session.
Authenticate and initialize
Run the ee.Authenticate function to authenticate your access to Earth Engine servers and ee.Initialize to initialize it. Add a code cell, enter the following lines, and run the cell.
You'll be asked to authorize access to your Earth Engine account. Follow the instructions printed to the cell to complete this step.
Test the API
Test the API by printing the elevation of Mount Everest. Note that before using the API you must initialize it. Run the following Python script in a new cell.
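A test of this kind might look like the sketch below. Several details are assumptions not stated in the text above: the SRTM elevation dataset ID `USGS/SRTMGL1_003`, the approximate coordinates used for Mount Everest, and the 30 meter sampling scale. The function name `everest_elevation` is hypothetical, and it expects `ee.Authenticate()` and `ee.Initialize()` to have been run already in the session.

```python
# Assumed approximate location of Mount Everest as [longitude, latitude].
EVEREST_LON_LAT = [86.9250, 27.9881]


def everest_elevation():
    """Return the elevation (meters) sampled at Mount Everest.

    Assumes ee.Authenticate() and ee.Initialize() were already called.
    """
    import ee  # imported lazily; preinstalled in Colab

    dem = ee.Image("USGS/SRTMGL1_003")  # assumed elevation dataset
    point = ee.Geometry.Point(EVEREST_LON_LAT)
    # Sample the image at the point (30 m scale) and fetch the value.
    return dem.sample(point, 30).first().get("elevation").getInfo()


# Requires an authenticated Earth Engine session, so it is left commented out:
# print("Mount Everest elevation (m):", everest_elevation())
```

The `getInfo()` call is what actually sends the request to Earth Engine servers; everything before it only builds a description of the computation.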
ee.Image objects can be displayed to notebook output cells. The following two examples demonstrate displaying a static image and an interactive map.
The IPython.display module contains the Image function, which can display the results of a URL representing an image generated from a call to the Earth Engine getThumbUrl function. The following script will display a thumbnail of a global elevation model.
The folium package can be used to display ee.Image objects on an interactive Leaflet map. Folium has no default method for handling tiles from Earth Engine, so one must be defined and added to the folium.Map module before use.
The following script provides an example of adding a method for handling Earth Engine tiles and using it to display an elevation model on a Leaflet map.
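As a rough sketch of the idea, the helper below adds an Earth Engine image to a folium map as a tile layer. It is written as a plain function rather than as a method attached to folium.Map, which is a simplification of the approach described above. The function name, its parameters, and the attribution string are hypothetical, and an authenticated, initialized Earth Engine session is assumed.

```python
def add_ee_layer(folium_map, ee_image, vis_params, name):
    """Add an Earth Engine image to a folium map as a tile layer."""
    # Imported lazily; both packages are assumed to be installed.
    import ee
    import folium

    # Ask Earth Engine for a tile URL template for this image.
    map_id = ee.Image(ee_image).getMapId(vis_params)
    folium.raster_layers.TileLayer(
        tiles=map_id["tile_fetcher"].url_format,
        attr="Map data: Google Earth Engine",  # placeholder attribution
        name=name,
        overlay=True,
        control=True,
    ).add_to(folium_map)
    return folium_map


# Example use in a notebook (requires an initialized Earth Engine session):
# m = folium.Map(location=[20, 0], zoom_start=3)
# add_ee_layer(m, ee.Image("USGS/SRTMGL1_003"), {"min": 0, "max": 4000}, "DEM")
```

To match the method-style approach, the same function could be assigned to `folium.Map.add_ee_layer`, in which case the map itself becomes the first argument.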
Some Earth Engine functions produce tabular data that can be plotted by data visualization packages such as matplotlib. The following example demonstrates the display of tabular data from Earth Engine as a scatter plot. See Charting in Colaboratory for more information.