Reduce Example #1

A View can also include a Reduction. When you map() documents and index them, you can group Index-Keys together and produce aggregate information as a "Reduction". This lets you compute statistics over a collection of documents, among other creative uses of reduce. Continuing with our data set from Anatomy of View, let's add another document as a data point, plus a reduction, to see how it works.


Document Sample


key: user::2a92jd02828

{
  doctype: "user",
  name: "Aldon Smith",
  username: "rudeboy",
  email: "asmith@email.com",
  points: 1000,
  last_login: 1360003359
}

key: user::828c201abf

{
  doctype: "user",
  name: "Byron Smith",
  username: "metallicafan",
  email: "bsmith@email.com",
  points: 2000,
  last_login: 1360002838
}

key: user::3f28f2929d

{
  doctype: "user",
  name: "Calvin Smith",
  username: "bono2830",
  email: "csmith@email.com",
  points: 3000,
  last_login: 1360001292
}

key: user::52a8289df

{
  doctype: "user",
  name: "Dexter Smith",
  username: "thedex",
  points: 0,
  last_login: 1360002939
}


Map Function (by_username)


function(doc, meta) {

  // Ensure we are processing only user documents and that the index field exists, other documents are ignored
  if (doc.doctype == "user" && doc.username) {

    // Index the username field, output the email
    emit(doc.username, doc.email)
    
  }
  
}

Reduce Function (built-in _count)


_count

In this scenario we are also reducing the results of the map function to get a count of the number of Index-Keys in the Index. We are querying with reduce=true in the query parameters:


Key  | Value | Document Key
null | 4     | undefined

Notice that the reduction collapses the rows in our index into a single row. The result has no associated row key or document key, because it summarizes many rows (potentially millions, if you have a large dataset).
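To make the collapsing concrete, here is a minimal local simulation of the map step followed by a count. It is only a sketch: on a real server emit() is supplied by the view engine, while here buildIndex and the emit callback passed as a third argument are hypothetical stand-ins, and the documents are trimmed to the fields the map function reads.

```javascript
// Trimmed copies of the four sample documents, keyed by document key.
const docs = {
  "user::2a92jd02828": { doctype: "user", username: "rudeboy", email: "asmith@email.com" },
  "user::828c201abf": { doctype: "user", username: "metallicafan", email: "bsmith@email.com" },
  "user::3f28f2929d": { doctype: "user", username: "bono2830", email: "csmith@email.com" },
  "user::52a8289df": { doctype: "user", username: "thedex" }
};

// Hypothetical helper: runs a map function over every document and
// returns the emitted rows sorted by Index-Key.
function buildIndex(mapFn) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    mapFn(doc, { id: id }, (key, value) => rows.push({ key: key, value: value, id: id }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Same logic as by_username, with emit passed in as an argument.
function byUsername(doc, meta, emit) {
  if (doc.doctype == "user" && doc.username) {
    emit(doc.username, doc.email);
  }
}

const rows = buildIndex(byUsername);

// Counting collapses all rows into one value with no key of its own.
const countResult = { key: null, value: rows.length };
console.log(countResult);
```

All four users have a username, so all four are indexed and counted, even though one of them has no email to output.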


Reduce Example #2

What if we wanted to see all users ordered by the number of points they have, and also see stats on the highest, lowest, and average points?


Map Function (points_by_username)


function(doc, meta) {

  // Ensure we are processing only user documents and that the index field exists, other documents are ignored
  if (doc.doctype == "user" && doc.points) {

    // Index the username field and output the points
    emit(doc.username, doc.points)
    
  }
  
}

Results without Reduction


# | Index-Key      | Output Value | Document Key
1 | "bono2830"     | 3000         | user::3f28f2929d
2 | "metallicafan" | 2000         | user::828c201abf
3 | "rudeboy"      | 1000         | user::2a92jd02828

If we query this View with reduce=false, we see only the results of the map() function. User "thedex" (points: 0) is not in the index because the if statement tests doc.points for truthiness, and 0, false and null all evaluate to false in JavaScript.
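The truthiness rule can be checked directly in plain JavaScript. The sketch below also shows one possible explicit numeric check that accepts 0; hasPoints is a hypothetical helper name, not part of any View API.

```javascript
// Truthiness of the three values mentioned above: all evaluate to false.
const falsy = [0, false, null].map((v) => Boolean(v));

// Hypothetical helper: an explicit type check that accepts a score of 0.
function hasPoints(doc) {
  return typeof doc.points === "number";
}

const thedex = { doctype: "user", username: "thedex", points: 0 };

// The condition used in points_by_username rejects this document...
const truthyCheck = Boolean(thedex.doctype == "user" && thedex.points);

// ...while the explicit type check accepts it.
const explicitCheck = thedex.doctype == "user" && hasPoints(thedex);

console.log(falsy, truthyCheck, explicitCheck);
```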


Notice that the rows are sorted by the Index-Key (the username), not by points; the points only happen to appear in descending order here. If we want to create a leaderboard of highest points, we have some things to consider. First, the _stats and _sum reduce functions only work on numbers. Let's first see the result of the reduction, and then we can write another View that functions as a simple leaderboard, including people with 0 points.


The _stats and _sum built-in reduce functions only work on numerical output values; you will receive an error if you try to use them on strings
The _count built-in reduce function can be used with any output values
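As a rough illustration of why numeric output values are required, here is a local stand-in for a _stats-style aggregation. statsOf is a hypothetical function written for this example; the server's own implementation and error message will differ.

```javascript
// Hypothetical local stand-in for _stats; not the server's code.
function statsOf(values) {
  const result = { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 };
  for (const v of values) {
    // Sums and squares are only meaningful for numbers, so reject the rest.
    if (typeof v !== "number") {
      throw new TypeError("stats needs numeric values, got " + typeof v);
    }
    result.sum += v;
    result.count += 1;
    result.min = Math.min(result.min, v);
    result.max = Math.max(result.max, v);
    result.sumsqr += v * v;
  }
  return result;
}

// Output values from points_by_username.
const pointsStats = statsOf([3000, 2000, 1000]);
console.log(pointsStats);

// Output values that are strings (such as usernames) are rejected.
let stringError = null;
try {
  statsOf(["bono2830", "metallicafan"]);
} catch (e) {
  stringError = e;
}
console.log(stringError instanceof TypeError);
```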

Reduce Function (built-in _stats)


_stats

Key  | Value                                                              | Document Key
null | {"sum":6000, "count":3, "min":1000, "max":3000, "sumsqr":14000000} | undefined

From the summary data you can calculate average (2000 in this case) by using the sum and count, and you can also see the min/max. The sum of squares can be used for statistical calculations.
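The derived figures can be computed client-side from the _stats result shown above. This is a small sketch in plain JavaScript; the variable names are illustrative, and the variance here is the population variance (dividing by count).

```javascript
// The value returned by the _stats reduction above.
const stats = { sum: 6000, count: 3, min: 1000, max: 3000, sumsqr: 14000000 };

// Average: total points divided by the number of rows.
const average = stats.sum / stats.count;

// Population variance from the sum of squares: E[x^2] - (E[x])^2.
const variance = stats.sumsqr / stats.count - average * average;

// Standard deviation is the square root of the variance.
const stddev = Math.sqrt(variance);

console.log(average, variance, stddev);
```

This is why the sum of squares is included in the summary: the spread of the values can be recovered without rereading the individual rows.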


Creating a Leaderboard

Because the _stats reduce above operates on output values, we had to emit the points value as the output of the emit() function. For a Leaderboard, however, we need to index the points themselves. We have two options. First, we can index the points and also output the points, which still allows the _stats reduction when we want summary info; we then have to use a get() on the Document Key of each row to retrieve the document information (username). Second, we can index the points and output the username, which puts the username directly in the results but limits us to the _count reduction.


Map Function (by_points)


function(doc, meta) {

  // Ensure we are processing only user documents and that the index field exists, other documents are ignored
  if (doc.doctype == "user" && doc.points >= 0 ) {

    // Index the points field and output the points as well, so _stats can still be used
    emit(doc.points, doc.points)
    
  }
  
}

Results without Reduction


# | Index-Key | Output Value | Document Key
1 | 0         | 0            | user::52a8289df
2 | 1000      | 1000         | user::2a92jd02828
3 | 2000      | 2000         | user::828c201abf
4 | 3000      | 3000         | user::3f28f2929d

Map Function (usernames_by_points)


function(doc, meta) {

  // Ensure we are processing only user documents and that the index field exists, other documents are ignored
  if (doc.doctype == "user" && doc.points >= 0 ) {

    // Index the points field and output the username
    emit(doc.points, doc.username)
    
  }
  
}

Results without Reduction (only the _count reduction can be used)


# | Index-Key | Output Value   | Document Key
1 | 0         | "thedex"       | user::52a8289df
2 | 1000      | "rudeboy"      | user::2a92jd02828
3 | 2000      | "metallicafan" | user::828c201abf
4 | 3000      | "bono2830"     | user::3f28f2929d
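The index is stored in ascending key order, so a leaderboard needs the rows reversed. The sketch below does this client-side on the rows above to show the idea; on a real View the same ordering is normally requested with the descending=true query parameter instead. The rank field and variable names are illustrative additions.

```javascript
// Rows as returned by usernames_by_points, in ascending key order.
const rows = [
  { key: 0, value: "thedex", id: "user::52a8289df" },
  { key: 1000, value: "rudeboy", id: "user::2a92jd02828" },
  { key: 2000, value: "metallicafan", id: "user::828c201abf" },
  { key: 3000, value: "bono2830", id: "user::3f28f2929d" }
];

// Sort a copy by points, highest first, and attach a 1-based rank.
const leaderboard = rows
  .slice()
  .sort((a, b) => b.key - a.key)
  .map((row, i) => ({ rank: i + 1, username: row.value, points: row.key }));

for (const entry of leaderboard) {
  console.log(entry.rank + ". " + entry.username + " (" + entry.points + ")");
}
```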

There are many different ways to use reductions, and you can write your own custom reducers. Be aware, however, that custom reducers can increase latency compared with the built-in ones, because they require more CPU to compute. See the Map-Reduce Examples for more details on additional reduces and a sample custom reducer.
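For a sense of what a custom reducer involves, here is a minimal sketch, assuming the usual reduce signature of (keys, values, rereduce). The function is given a name (sumAndCount, hypothetical) so it can be run locally; in a View definition it would be written as an anonymous function. The calls at the bottom only imitate how partial results are combined and are not how the server schedules the work.

```javascript
// Hypothetical custom reducer: tracks a running sum and count of points.
function sumAndCount(keys, values, rereduce) {
  const out = { sum: 0, count: 0 };
  for (const v of values) {
    if (rereduce) {
      // values are earlier outputs of this same function: merge them.
      out.sum += v.sum;
      out.count += v.count;
    } else {
      // values are the raw output values emitted by the map function.
      out.sum += v;
      out.count += 1;
    }
  }
  return out;
}

// Local imitation of two partial reductions followed by a rereduce.
// keys is unused by this reducer, so null is passed for brevity.
const partA = sumAndCount(null, [1000, 2000], false);
const partB = sumAndCount(null, [3000, 0], false);
const total = sumAndCount(null, [partA, partB], true);

console.log(total);
```

The rereduce branch is the part the built-in reducers handle for you: a custom reducer has to return a value that can be fed back into itself.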


Many Uses for Reducing Views

Create simple counts
Simple aggregate counts are easy to do, and they can use existing map() functions with a reduce=true in the query parameters
Additional Reduce statistical aggregates
Using _stats you can gather even more aggregate data that is useful for having sums, averages, min/max, and sum of squares


