In Couchbase, indexes are created by Map-Reduce Indexers. When you want to search on individual fields within JSON documents, you need to write a View. A View lives within a Design Document, and Couchbase allocates resources at the Design Document level. A Design Document (Indexer) can contain many Views (Indexes). Because the Design Document is the Indexer, whenever it is triggered to update, either automatically or programmatically, all of its Views are updated together in a single batch.


Terminology: Design Documents are Indexers; each one contains one or more Views, and Views are the Indexes that get queried.
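To make the terminology concrete, here is a minimal sketch of a design document as data. It assumes the Couchbase 2.x layout, in which each entry under `views` stores its `map` function (and, optionally, a `reduce` function) as JavaScript source text. The design document name `users` is hypothetical; the two Views are the by_email and emails_by_username Views from the examples in this post.

```javascript
// Hypothetical design document (e.g. stored as _design/users).
// Each View's map function is kept as a JavaScript source string.
const designDoc = {
  views: {
    by_email: {
      map: 'function (doc, meta) { if (doc.doctype == "user" && doc.email) { emit(doc.email); } }'
    },
    emails_by_username: {
      map: 'function (doc, meta) { if (doc.doctype == "user" && doc.username) { emit(doc.username, doc.email); } }'
    }
  }
};

// Both Views belong to the same Indexer, so they are rebuilt together
// whenever this design document updates.
console.log(Object.keys(designDoc.views)); // [ 'by_email', 'emails_by_username' ]
```

Grouping Views that are queried together into one design document keeps them in a single indexing batch; Views with very different update needs could go in separate design documents.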

The Map Function

View Anatomy - Map Function


Map Functions are written in JavaScript, and they specify how to process the Documents that pass through the Indexer. A map function takes two parameters: the document data and the document metadata. The metadata includes the document key and the expiry, as well as the document type (json or base64). Within the map function, calls to emit() are what actually create the Index. emit() takes the index-key and, optionally, the index-value (also called the output value). You can think of each emit() call as creating a row, although the index is actually stored as an append-only B-tree, which is rebalanced during indexing.
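The idea that each emit() call creates one row can be simulated in a few lines. This is a simplified, in-memory stand-in, not Couchbase's indexer: `runMap` and the module-level `rows` array are hypothetical, and `meta` carries only the `id` and `type` fields. The map function is the by_email function shown later in this post.

```javascript
// Simplified in-memory stand-in for the indexer (NOT Couchbase's implementation).
// Each emit() call appends one row: index-key, output value, document key.
const rows = [];
let currentId = null;

function emit(key, value) {
  // An omitted output value is recorded as null.
  rows.push({ key: key, value: value === undefined ? null : value, id: currentId });
}

// Hypothetical helper: pass one document and its metadata through a map function.
function runMap(mapFn, doc, meta) {
  currentId = meta.id;
  mapFn(doc, meta);
}

// The by_email map function.
const byEmail = function (doc, meta) {
  if (doc.doctype == "user" && doc.email) {
    emit(doc.email);
  }
};

runMap(byEmail, { doctype: "user", username: "rudeboy", email: "asmith@email.com" },
  { id: "user::2a92jd02828", type: "json" });
// A user without an email fails the if test, so no row is emitted for it.
runMap(byEmail, { doctype: "user", username: "thedex" },
  { id: "user::52a8289df", type: "json" });

console.log(rows.length); // 1
```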






Understanding what an index is helps you use it properly. When you index a JSON field, or fields, you are creating an ordered set of rows based on the values of that JSON key. Let's look at an example to see what the Index looks like from the point of view of the application and the application programmer.



Simple Example #1

Here is an example set of JSON Documents; imagine that there are thousands or millions of these.


key: user::2a92jd02828

{
  "doctype": "user",
  "name": "Aldon Smith",
  "username": "rudeboy",
  "email": "asmith@email.com",
  "last_login": 1360003359
}

key: user::828c201abf

{
  "doctype": "user",
  "name": "Byron Smith",
  "username": "metallicafan",
  "email": "bsmith@email.com",
  "last_login": 1360002838
}

key: user::3f28f2929d

{
  "doctype": "user",
  "name": "Calvin Smith",
  "username": "bono2830",
  "email": "csmith@email.com",
  "last_login": 1360001292
}


Since the documents are not keyed by email address (the keys are hybrid-random in this case), we need a View to look them up by email. Here is a Map function that indexes the email field, so that we can query and retrieve all email addresses, for example to send out a newsletter.


Map Function (by_email)


function(doc, meta) {

  // Ensure we are processing only user documents and that the index field exists, other documents are ignored
  if (doc.doctype == "user" && doc.email) {

    // Index the email field, no need for output
    emit(doc.email)
    
  }
  
}

Every document passes through the Map function. The if statement ensures that we only process and index the documents we want, in this case User documents, identified by the doc.doctype JSON field. It also ensures that the field we are indexing exists in the document (with a flexible schema, it may not exist, or may not exist yet). The map function produces an ordered index on email addresses that can be queried:


#  Index-Key           Output Value  Document Key
1  "asmith@email.com"  null          user::2a92jd02828
2  "bsmith@email.com"  null          user::828c201abf
3  "csmith@email.com"  null          user::3f28f2929d
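The table above can be reproduced with a small in-memory simulation. This is a sketch, not Couchbase itself: `buildIndex` is a hypothetical helper, emit is passed to the map function as a third argument only so the code runs outside Couchbase, and keys are compared with plain string comparison, which gives the same order as Unicode collation for these three addresses but not in general.

```javascript
// Simplified in-memory simulation of building the by_email View (not Couchbase
// itself). The documents are fed in a different order than their emails sort.
const docs = {
  "user::3f28f2929d": { doctype: "user", username: "bono2830", email: "csmith@email.com" },
  "user::2a92jd02828": { doctype: "user", username: "rudeboy", email: "asmith@email.com" },
  "user::828c201abf": { doctype: "user", username: "metallicafan", email: "bsmith@email.com" }
};

// by_email, with emit taken as a parameter (a real map function uses the built-in emit()).
const byEmail = function (doc, meta, emit) {
  if (doc.doctype == "user" && doc.email) {
    emit(doc.email);
  }
};

function buildIndex(mapFn) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    // Bind emit per document so each row records its document key;
    // an omitted output value is stored as null.
    const emit = (key, value) =>
      rows.push({ key, value: value === undefined ? null : value, id });
    mapFn(doc, { id, type: "json" }, emit);
  }
  // Order rows by index-key, as the View index does.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

for (const row of buildIndex(byEmail)) {
  console.log(row.key, row.value, row.id);
}
// asmith@email.com null user::2a92jd02828
// bsmith@email.com null user::828c201abf
// csmith@email.com null user::3f28f2929d
```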

Index keys are ordered according to the default Unicode Collation rules rather than by byte order. To understand the ordering in detail, see my blog post: Understanding Letter Ordering in View Queries.
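The difference between byte order and Unicode collation is easy to see in plain JavaScript. This sketch uses the standard `Intl.Collator` with the "en" locale as an approximation of Unicode collation; Couchbase's exact collation settings are not reproduced here, so treat it as an illustration of the idea rather than a specification of View ordering.

```javascript
const names = ["bob", "Bob", "alice", "Alice"];

// Default Array.prototype.sort compares UTF-16 code units,
// so every uppercase letter sorts before every lowercase letter.
const byteOrder = [...names].sort();

// Unicode collation groups the same letter together; in "en",
// lowercase sorts before uppercase when the letters otherwise tie.
const collator = new Intl.Collator("en");
const collated = [...names].sort(collator.compare);

console.log(byteOrder); // [ 'Alice', 'Bob', 'alice', 'bob' ]
console.log(collated);  // [ 'alice', 'Alice', 'bob', 'Bob' ]
```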

As you can see, the index is ordered, and these index-keys can now be queried in several ways: range queries, match queries and set-match queries.
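The three query styles can be illustrated with a small in-memory sketch over the by_email rows. The option names `key`, `keys`, `startkey` and `endkey` mirror Couchbase view query parameters, but `queryView` is a hypothetical helper that uses plain string comparison instead of Unicode collation; it only shows which rows each kind of query selects.

```javascript
// The by_email index, already sorted by index-key.
const index = [
  { key: "asmith@email.com", value: null, id: "user::2a92jd02828" },
  { key: "bsmith@email.com", value: null, id: "user::828c201abf" },
  { key: "csmith@email.com", value: null, id: "user::3f28f2929d" }
];

// Hypothetical query helper (simplified sketch, not the real query engine).
function queryView(rows, opts) {
  if (opts.key !== undefined) {
    // Match query: rows whose key equals the given key.
    return rows.filter((r) => r.key === opts.key);
  }
  if (opts.keys !== undefined) {
    // Set-match query: rows for each requested key, in the order requested.
    return opts.keys.flatMap((k) => rows.filter((r) => r.key === k));
  }
  // Range query: startkey..endkey, inclusive on both ends in this sketch;
  // a missing bound is open-ended.
  return rows.filter(
    (r) =>
      (opts.startkey === undefined || r.key >= opts.startkey) &&
      (opts.endkey === undefined || r.key <= opts.endkey)
  );
}

console.log(queryView(index, { key: "bsmith@email.com" }).map((r) => r.id));
// [ 'user::828c201abf' ]
console.log(queryView(index, { keys: ["asmith@email.com", "csmith@email.com"] }).map((r) => r.id));
// [ 'user::2a92jd02828', 'user::3f28f2929d' ]
console.log(queryView(index, { startkey: "b", endkey: "c" }).map((r) => r.key));
// [ 'bsmith@email.com' ]
```

The last query shows a subtlety of range queries: "csmith@email.com" sorts after "c", so it falls outside an endkey of "c".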


Simple Example #2

What if we want to look up email addresses by username? We create another View with a different map function that indexes the username field. In this case we can also output the email address as the output value, since it rarely changes and is a single field.


Map Function (emails_by_username)


function(doc, meta) {

  // Ensure we are processing only user documents and that the index field exists, other documents are ignored
  if (doc.doctype == "user" && doc.username) {

    // Index the username field, output the email
    emit(doc.username, doc.email)
    
  }
  
}

Here the Index is ordered by the username field, and the email address is the output value, so it is returned directly in the query results:


#  Index-Key       Output Value        Document Key
1  "bono2830"      "csmith@email.com"  user::3f28f2929d
2  "metallicafan"  "bsmith@email.com"  user::828c201abf
3  "rudeboy"       "asmith@email.com"  user::2a92jd02828

What if a user document doesn't have an email address (for example a ghost user, or a user who signed up when email wasn't required)? Because our map function only checks that the username field exists, that user is still indexed, and its output value is null:


key: user::52a8289df

{
  "doctype": "user",
  "name": "Dexter Smith",
  "username": "thedex",
  "last_login": 1360002939
}

#  Index-Key       Output Value        Document Key
1  "bono2830"      "csmith@email.com"  user::3f28f2929d
2  "metallicafan"  "bsmith@email.com"  user::828c201abf
3  "rudeboy"       "asmith@email.com"  user::2a92jd02828
4  "thedex"        null                user::52a8289df
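A small in-memory simulation reproduces this table, including the null output value for the user without an email. It is a sketch, not Couchbase itself: emit is passed as a third argument only so the code runs outside Couchbase, keys are compared with plain string comparison, and the documents are trimmed to the fields the View uses.

```javascript
// Simplified in-memory simulation of the emails_by_username View.
const docs = {
  "user::2a92jd02828": { doctype: "user", username: "rudeboy", email: "asmith@email.com" },
  "user::828c201abf": { doctype: "user", username: "metallicafan", email: "bsmith@email.com" },
  "user::3f28f2929d": { doctype: "user", username: "bono2830", email: "csmith@email.com" },
  "user::52a8289df": { doctype: "user", username: "thedex" } // no email field
};

// emails_by_username, with emit taken as a parameter.
const emailsByUsername = function (doc, meta, emit) {
  if (doc.doctype == "user" && doc.username) {
    emit(doc.username, doc.email);
  }
};

const rows = [];
for (const [id, doc] of Object.entries(docs)) {
  // doc.email is undefined for "thedex"; it is recorded as null, as in View results.
  emailsByUsername(doc, { id }, (key, value) =>
    rows.push({ key, value: value === undefined ? null : value, id }));
}
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

for (const row of rows) {
  console.log(row.key, row.value, row.id);
}
// bono2830 csmith@email.com user::3f28f2929d
// metallicafan bsmith@email.com user::828c201abf
// rudeboy asmith@email.com user::2a92jd02828
// thedex null user::52a8289df
```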

In this case, we know which users didn't supply an email address. Of course, we could create a map function specifically for that, and only index users that didn't supply an email address.


Map Function (by_username_no_email)


function(doc, meta) {

  // Ensure we are processing only user documents and that the index field exists, and that email address doesn't exist
  if (doc.doctype == "user" && doc.username && !doc.email ) {

    // Index the username field for documents that match the if conditions
    emit(doc.username, null)
    
  }
  
}
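To check which users this View picks up, here is a short in-memory sketch over the four sample users. It is not Couchbase itself: emit is passed as a third argument only so the code runs outside Couchbase, and the documents are trimmed to the fields the View uses.

```javascript
// Simplified in-memory check of the by_username_no_email View.
const docs = {
  "user::2a92jd02828": { doctype: "user", username: "rudeboy", email: "asmith@email.com" },
  "user::828c201abf": { doctype: "user", username: "metallicafan", email: "bsmith@email.com" },
  "user::3f28f2929d": { doctype: "user", username: "bono2830", email: "csmith@email.com" },
  "user::52a8289df": { doctype: "user", username: "thedex" }
};

// by_username_no_email, with emit taken as a parameter.
const byUsernameNoEmail = function (doc, meta, emit) {
  if (doc.doctype == "user" && doc.username && !doc.email) {
    emit(doc.username, null);
  }
};

const rows = [];
for (const [id, doc] of Object.entries(docs)) {
  byUsernameNoEmail(doc, { id }, (key, value) => rows.push({ key, value, id }));
}

// Only users without an email address are indexed.
console.log(rows.map((r) => r.key)); // [ 'thedex' ]
```

Filtering inside the map function keeps this index small: users with an email address never produce a row, so the View stores only the rows this query needs.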

If we wanted to index the last login, we could emit the Unix timestamp directly as the index key, or convert it into a date array using the dateToArray() function. See the Map-Reduce Examples for more details.
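As an illustration of the array-key approach, here is a sketch with a hypothetical `toDateArray` helper. It is not Couchbase's built-in dateToArray() (whose exact input handling is not covered here); it simply splits a Unix timestamp in seconds into UTC [year, month, day, hour, minute, second] components. The View name by_last_login is also hypothetical, and emit is passed as an argument only so the code runs outside Couchbase.

```javascript
// Hypothetical helper, NOT Couchbase's built-in dateToArray():
// Unix timestamp (seconds) -> [year, month, day, hour, minute, second] in UTC.
function toDateArray(unixSeconds) {
  const d = new Date(unixSeconds * 1000); // Date expects milliseconds
  return [
    d.getUTCFullYear(),
    d.getUTCMonth() + 1, // getUTCMonth() is 0-based
    d.getUTCDate(),
    d.getUTCHours(),
    d.getUTCMinutes(),
    d.getUTCSeconds()
  ];
}

// Map function for a hypothetical by_last_login View, with emit taken as a parameter.
const byLastLogin = function (doc, meta, emit) {
  if (doc.doctype == "user" && doc.last_login) {
    emit(toDateArray(doc.last_login), null);
  }
};

const rows = [];
byLastLogin(
  { doctype: "user", username: "rudeboy", last_login: 1360003359 },
  { id: "user::2a92jd02828" },
  (key, value) => rows.push({ key, value })
);
console.log(JSON.stringify(rows[0].key)); // [2013,2,4,18,42,39]
```

An array key orders rows by year, then month, then day, and so on, so a range over the leading elements (for example one year or one month) selects a contiguous block of the index.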


Many Uses for Map-Reduce Views

Index Particular Fields within JSON Documents
By using Views you can index on particular fields within JSON documents. This is one of the highlights of Couchbase 2.0 and a very common use case.
Creating Simulated Joins
In the Map-Reduce Examples you can see examples of how to join document sets together into a single Index that combines both. While this is not exactly the same as a SQL Join, it can be very useful for a number of use cases.
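One common way to simulate a join is to have two document types emit array keys that share a leading element, so that related rows sort next to each other. The sketch below is an in-memory illustration under stated assumptions: the "order" documents, their `user_key` and `total` fields, and the user_with_orders View are all hypothetical, emit is passed as an argument only so the code runs outside Couchbase, and array keys are compared element by element with plain `<` rather than Unicode collation.

```javascript
// Hypothetical documents: two users and three orders that point at them.
const docs = {
  "user::828c201abf": { doctype: "user", username: "metallicafan" },
  "order::1001": { doctype: "order", user_key: "user::828c201abf", total: 25 },
  "user::2a92jd02828": { doctype: "user", username: "rudeboy" },
  "order::1002": { doctype: "order", user_key: "user::2a92jd02828", total: 40 },
  "order::1003": { doctype: "order", user_key: "user::828c201abf", total: 15 }
};

// Hypothetical user_with_orders map function, with emit taken as a parameter.
const userWithOrders = function (doc, meta, emit) {
  if (doc.doctype == "user") {
    emit([meta.id, 0], doc.username); // 0 sorts the user row first
  } else if (doc.doctype == "order" && doc.user_key) {
    emit([doc.user_key, 1], doc.total); // 1 sorts order rows after their user
  }
};

// Element-by-element comparison of array keys (simplified: plain < only).
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

const rows = [];
for (const [id, doc] of Object.entries(docs)) {
  userWithOrders(doc, { id }, (key, value) => rows.push({ key, value, id }));
}
rows.sort((a, b) => compareKeys(a.key, b.key)); // stable sort keeps ties in input order

// Each user row is followed directly by that user's order rows.
for (const row of rows) {
  console.log(JSON.stringify(row.key), row.value, row.id);
}
```

A single range query over one user's leading key would then return that user together with all of their orders, which is what makes this pattern useful as a stand-in for a join.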


