Uploading Data with Bulk Data Uploader

Gustavo Franco, Google Systems Administrator
April 2008

Introduction

With the bulk data uploader tool included in the SDK, Google App Engine makes it possible to upload data from a CSV file and convert it into entities in your application's datastore. You can even specify that the entities be searchable.

In this article, we'll create a program that searches our datastore for entities based on a keyword. We will then show how to use the bulk data uploader to load searchable data onto the application.

Our Program: Searching the Datastore by Keyword

First, let's write a program that searches the datastore based on a keyword argument and returns all the entities that match the given keyword:

import wsgiref.handlers
from google.appengine.ext import webapp
from google.appengine.ext import search

class MainPage(webapp.RequestHandler):
  def get(self):
    # We use the webapp framework to retrieve the keyword
    keyword = self.request.get('keyword')

    self.response.headers['Content-Type'] = 'text/plain'
    if not keyword:
      self.response.out.write("No keyword has been set")
    else:
      # Search the 'Person' Entity based on our keyword
      query = search.SearchableQuery('Person')
      query.Search(keyword)
      for result in query.Run():
        # Write one email address per line
        self.response.out.write('%s\n' % result['email'])

def main():
  application = webapp.WSGIApplication([('/', MainPage)],
                                       debug=True)
  wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
  main()

Great! Our application doesn't contain any data yet, though, so let's organize some data and upload it to the application.

Bulk Data Uploader Overview

The bulk upload module is available in google.appengine.ext.bulkload and will allow you to import your external data into your application. It reads a CSV file, sends the data to your application, converts each line of data into an entity, and stores the entity in the datastore. You may also specify that the loader make entities searchable when it's uploading the data.

To use this module with our application, we create a loader that describes the data model. Then, we add the loader's handler to our app.yaml file. Finally, we use tools/bulkload_client.py to load the data into the datastore.

Creating the Bulk Loader for your Data Model

Let's create our bulk load handler in a file named myloader.py. The code below tells the bulk data loader which columns the CSV file contains and specifies that the resulting entities are to be indexed for search.

This handler creates a new class that inherits from the Loader class and overrides the HandleEntity method. The new HandleEntity method wraps each entity in a search.SearchableEntity, so that the entity is indexed when it is stored.

from google.appengine.ext import bulkload
from google.appengine.api import datastore_types
from google.appengine.ext import search

class PersonLoader(bulkload.Loader):
  def __init__(self):
    # Our 'Person' entity contains a name string and an email
    bulkload.Loader.__init__(self, 'Person',
                             [('name', str),
                              ('email', datastore_types.Email),
                              ])

  def HandleEntity(self, entity):
    ent = search.SearchableEntity(entity)
    return ent

if __name__ == '__main__':
  bulkload.main(PersonLoader())

Our CSV data for this entity model will look like this:

John, john@example.com
Mary, mary@example.com
William, william@example.com
...
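
To make the mapping from CSV columns to entity properties more concrete, here is a minimal standard-library sketch of the conversion step. It is not the bulkload module's actual implementation: PROPERTIES and row_to_properties are hypothetical names, plain str stands in for datastore_types.Email so that the snippet runs without the SDK, and passing skipinitialspace=True to csv.reader is a choice made here so that the space after each comma is not kept as part of the email value.

```python
import csv
import io

# Hypothetical stand-in for the (property name, converter) list passed to
# bulkload.Loader; plain str replaces datastore_types.Email here.
PROPERTIES = [('name', str), ('email', str)]

def row_to_properties(row, properties=PROPERTIES):
    """Convert one CSV row into a dict of property values, column by column."""
    if len(row) != len(properties):
        raise ValueError('expected %d columns, got %d'
                         % (len(properties), len(row)))
    return dict((name, convert(value))
                for (name, convert), value in zip(properties, row))

sample = io.StringIO("John, john@example.com\n"
                     "Mary, mary@example.com\n")
# skipinitialspace=True drops the blank that follows each comma.
people = [row_to_properties(row)
          for row in csv.reader(sample, skipinitialspace=True)]
```

Each dict in people corresponds to one 'Person' entity. A row with the wrong number of columns raises ValueError in this sketch, which is one simple way to catch malformed lines before they are uploaded.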

Adding the Bulk Loader Handler to your Project

Since our POST request endpoint will be at the /load URL, we must add the handler information for myloader.py to our app.yaml file:

- url: /load
  script: myloader.py
  login: admin

Uploading CSV Data to your App

Now that we've set up our application and data loader, we can run the app locally using dev_appserver.py. Once the app is running, we're ready to send the CSV data to it and have the data loaded into the datastore.

The bulkload client is located in the SDK (tools/bulkload_client.py). This is the Python script that will read in a CSV file, then batch and send the data to your application. We can use the following command to load a CSV file (people.csv) onto a development server:

./bulkload_client.py --filename people.csv \
                     --kind Person \
                     --url http://localhost:8080/load
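
The "batch" part of "batch and send" can be pictured with a short sketch. This is not the code of bulkload_client.py: read_batches is a hypothetical helper, the batch size of 2 is an arbitrary value chosen for illustration, and nothing is sent over the network here.

```python
import csv
import io

def read_batches(csv_file, batch_size):
    """Yield lists of at most batch_size rows read from an open CSV file."""
    batch = []
    for row in csv.reader(csv_file, skipinitialspace=True):
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final batch, possibly shorter than batch_size
        yield batch

sample = io.StringIO("John, john@example.com\n"
                     "Mary, mary@example.com\n"
                     "William, william@example.com\n")
batches = list(read_batches(sample, batch_size=2))
```

With three rows and a batch size of 2, the sketch produces one full batch and one shorter final batch. A client built along these lines would then send each batch to the /load URL in its own POST request.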

There you have it. Now, when we hit the URL http://localhost:8080/?keyword=William, we will see the following in our browser:

william@example.com
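
Keywords that contain spaces or other special characters have to be escaped before they go into the query string. The following is a minimal sketch of a command-line check, assuming the development server is still listening on localhost:8080; search_url and fetch_results are hypothetical helper names, and decoding the response as UTF-8 is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def search_url(keyword, base='http://localhost:8080/'):
    """Build the search URL, escaping the keyword for use in a query string."""
    return base + '?' + urlencode({'keyword': keyword})

def fetch_results(keyword):
    """Return the plain-text response body for a keyword search.

    Requires the development server to be running; UTF-8 is assumed.
    """
    with urlopen(search_url(keyword)) as response:
        return response.read().decode('utf-8')

url = search_url('William')
```

Here url is the same address used in the browser above. Calling fetch_results('William') only works while dev_appserver.py is running the application.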