MongoDB中聚合(aggregate)主要用于处理数据统计,例如:平均值、求和、最大值、最小值等等。这有点类似 SQL 语句中的 count(*)。
mongodb 中聚合的方法使用 aggregate(),该方法使用聚合管道执行聚合操作。该管道允许用户使用一系列基于阶段的操作来处理来自集合或其他源的数据。语法如下:
db.collection.aggregate(operators,[options],callback)
参数说明:
operators:指定聚合运算符的数组,它允许你定义对数据执行什么汇总操作。
options:允许你设置 readPreference 属性,它定义了从哪里读取数据。
callback:接受 err 和 res。
什么是管道运算符?
此处的管道和Linux中的管道类似,管道的作用是用于将当前命令的输出结果作为下一个命令的参数。MongoDB 的聚合管道将 MongoDB 文档在一个管道处理完毕后将结果传递给下一个管道处理,管道操作是可以重复的。
什么是表达式?
表达式用于处理输入文档并输出结果。表达式是无状态的,只能用于计算当前聚合管道的文档,不能处理其它的文档。
通过重命名、添加或删除字段重塑文档。你也可以重新计算值,并添加子文档。实例如下:
> db.users.find() { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } # 重塑文档,该文档中只包含 _id、id 和 name 字段 > db.users.aggregate({ $project:{id:1, name:1}}); { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan" } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom" } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen" } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu" } # 重塑文档,该文档中只包含 _id 和 name 字段 > db.users.aggregate({ $project:{name:1}}); { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "name" : "zhangsan" } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "name" : "tom" } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "name" : "helen" } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "name" : "zhaoliu" }
默认情况下 _id 字段默认包含,如果要隐藏 _id 字段,需要这样做:
> db.users.aggregate({ $project:{_id:0, name:1}}) { "name" : "zhangsan" } { "name" : "tom" } { "name" : "helen" } { "name" : "zhaoliu" }
$project 中也可以进行表达式计算,例如:使用 $add 表达式将 age 字段添加 10 岁,然后新建一个 newAge 字段。如下:
> db.users.aggregate({ $project:{name:1, newAge:{$add:["$age", 10]}}}); { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "name" : "zhangsan", "newAge" : 37 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "name" : "tom", "newAge" : 34 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "name" : "helen", "newAge" : 35 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "name" : "zhaoliu", "newAge" : 38 }
用于过滤数据,只输出符合条件的文档。$match 使用 MongoDB 的标准查询操作。实例:过滤 users 集合中 age 小于等于 25 的文档,只输出大于 25 岁的文档。如下:
> db.users.find() { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } # 过滤文档 > db.users.aggregate({ $match:{age:{$gt: 25}} }) { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 }
$limit 用来限制MongoDB聚合管道返回的文档数。实例:限制管道返回文档数为2,如下:
> db.users.find() { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } # 限制输出的文档数为2 > db.users.aggregate({ $limit:2 }) { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 }
在聚合管道中跳过指定数量的文档,并返回余下的文档。实例:跳过前3个文档。如下:
> db.users.find() { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } # 跳过前3个文档 > db.users.aggregate({$skip: 3}) { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 }
下面我们可以借助 $skip 和 $limit 实现分页操作。如下:
> db.users.find() { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } # 从文档集合中获取第2、3个文档,实现分页 # 类似: select * from users limit 1,2 > db.users.aggregate([{$skip:1}, {$limit:2}]) { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 }
在把文档传递给处理聚合操作的下一个管道前对它们排序。排序指定一个带有 field:<sort_order> 属性的对象,其中:<sort_order> 为1表示升序,而-1表示降序。
实例:实现对 age 字段进行升序、降序排序。如下:
> db.users.find() { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } # 实现按 age 升序排序 > db.users.aggregate({$sort:{age:1}}) { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } # 实现按 age 降序排序 > db.users.aggregate({$sort:{age:-1}}) { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 }
将文档中的某一个数组类型字段拆分成多条,每条包含数组中的一个值。实例:
> db.emails.find() { "_id" : ObjectId("5e51037f91c7bc144476f817"), "name" : "张三", "emails" : [ "zhangs@sina.com", "zs@gmail.com", "zhangsan@qq.com" ] } { "_id" : ObjectId("5e5103e991c7bc144476f818"), "name" : "李四", "emails" : [ "lisi@outlook.com", "lisi@sina.com.cn" ]} # 将emails数组字段拆分 > db.emails.aggregate({$unwind:"$emails"}) { "_id" : ObjectId("5e51037f91c7bc144476f817"), "name" : "张三", "emails" : "zhangs@sina.com" } { "_id" : ObjectId("5e51037f91c7bc144476f817"), "name" : "张三", "emails" : "zs@gmail.com" } { "_id" : ObjectId("5e51037f91c7bc144476f817"), "name" : "张三", "emails" : "zhangsan@qq.com" } { "_id" : ObjectId("5e5103e991c7bc144476f818"), "name" : "李四", "emails" : "lisi@outlook.com" } { "_id" : ObjectId("5e5103e991c7bc144476f818"), "name" : "李四", "emails" : "lisi@sina.com.cn" }
将集合中的文档分组,可用于统计结果。实例:求和。如下:
> db.users.find() { "_id" : ObjectId("5e4e90b991c7bc144476f813"), "id" : 1, "name" : "zhangsan", "age" : 27 } { "_id" : ObjectId("5e4e90c591c7bc144476f814"), "id" : 2, "name" : "tom", "age" : 24 } { "_id" : ObjectId("5e4e90cf91c7bc144476f815"), "id" : 3, "name" : "helen", "age" : 25 } { "_id" : ObjectId("5e4e90da91c7bc144476f816"), "id" : 4, "name" : "zhaoliu", "age" : 28 } # 分组,计算所有age的总和 > db.users.aggregate({$group:{_id:"*", totalAge:{$sum:"$age"}}}) { "_id" : "*", "totalAge" : 104 }
更多关于分组的实例如下:
# 准备数据 > db.books.find() { "_id" : ObjectId("5e510cf291c7bc144476f828"), "name" : "Java编程思想", "type" : "java", "price" : 78.5 } { "_id" : ObjectId("5e510cf291c7bc144476f829"), "name" : "Java设计模式", "type" : "java", "price" : 70 } { "_id" : ObjectId("5e510cf291c7bc144476f82a"), "name" : "Java核心编程", "type" : "java", "price" : 88 } { "_id" : ObjectId("5e510cf291c7bc144476f82b"), "name" : "Linux菜鸟教程", "type" : "linux", "price" : 53 } { "_id" : ObjectId("5e510cf491c7bc144476f82c"), "name" : "Linux命令手册", "type" : "linux", "price" : 48 }
$sum 计算总和。实例:
# 根据数据类型type进行分组,然后计算每个分组中书籍总价格 > db.books.aggregate({$group: {_id:"$type", total_price:{$sum:"$price"}}}) { "_id" : "linux", "total_price" : 101 } { "_id" : "java", "total_price" : 236.5 }
$avg 计算平均值。实例:
# 根据数据类型type进行分组,然后计算每个分组下price的平均值 > db.books.aggregate({$group:{_id:"$type", avg_price:{$avg:"$price"}}}) { "_id" : "linux", "avg_price" : 50.5 } { "_id" : "java", "avg_price" : 78.83333333333333 }
$min 获取集合中所有文档对应值得最小值。实例:
# 根据type进行分组,获取每个分组最低价格的书籍 > db.books.aggregate({$group:{ _id:"$type", min_price: {$min: "$price"}}}) { "_id" : "linux", "min_price" : 48 } { "_id" : "java", "min_price" : 70 }
$max 获取集合中所有文档对应值得最大值。实例:
# 根据type进行分组,获取每个分组最高价格的书籍 > db.books.aggregate({$group:{ _id:"$type", max_price: {$max: "$price"}}}) { "_id" : "linux", "max_price" : 53 } { "_id" : "java", "max_price" : 88 }
$push 在结果文档中插入值到一个数组中。 实例:
> db.books.find() { "_id" : ObjectId("5e510cf291c7bc144476f828"), "name" : "Java编程思想", "type" : "java", "price" : 78.5 } { "_id" : ObjectId("5e510cf291c7bc144476f829"), "name" : "Java设计模式", "type" : "java", "price" : 70 } { "_id" : ObjectId("5e510cf291c7bc144476f82a"), "name" : "Java核心编程", "type" : "java", "price" : 88 } { "_id" : ObjectId("5e510cf291c7bc144476f82b"), "name" : "Linux菜鸟教程", "type" : "linux", "price" : 53 } { "_id" : ObjectId("5e510cf491c7bc144476f82c"), "name" : "Linux命令手册", "type" : "linux", "price" : 48 } # 将分组中的 price 值插入到 prices 数组中 > db.books.aggregate({$group:{_id:"$type", prices:{$push:"$price"}}}) { "_id" : "linux", "prices" : [ 53, 48 ] } { "_id" : "java", "prices" : [ 78.5, 70, 88 ] }
$addToSet 在结果文档中插入值到一个数组中,但不创建副本。实例:
> db.books.find() { "_id" : ObjectId("5e510cf291c7bc144476f828"), "name" : "Java编程思想", "type" : "java", "price" : 78.5 } { "_id" : ObjectId("5e510cf291c7bc144476f829"), "name" : "Java设计模式", "type" : "java", "price" : 70 } { "_id" : ObjectId("5e510cf291c7bc144476f82a"), "name" : "Java核心编程", "type" : "java", "price" : 88 } { "_id" : ObjectId("5e510cf291c7bc144476f82b"), "name" : "Linux菜鸟教程", "type" : "linux", "price" : 53 } { "_id" : ObjectId("5e510cf491c7bc144476f82c"), "name" : "Linux命令手册", "type" : "linux", "price" : 48 } > db.books.aggregate({$group:{_id:"$type", prices:{$addToSet:"$price"}}}) { "_id" : "linux", "prices" : [ 48, 53 ] } { "_id" : "java", "prices" : [ 88, 70, 78.5 ] }
$first 根据资源文档的排序获取第一个文档数据。实例:
> db.books.find() { "_id" : ObjectId("5e510cf291c7bc144476f828"), "name" : "Java编程思想", "type" : "java", "price" : 78.5 } { "_id" : ObjectId("5e510cf291c7bc144476f829"), "name" : "Java设计模式", "type" : "java", "price" : 70 } { "_id" : ObjectId("5e510cf291c7bc144476f82a"), "name" : "Java核心编程", "type" : "java", "price" : 88 } { "_id" : ObjectId("5e510cf291c7bc144476f82b"), "name" : "Linux菜鸟教程", "type" : "linux", "price" : 53 } { "_id" : ObjectId("5e510cf491c7bc144476f82c"), "name" : "Linux命令手册", "type" : "linux", "price" : 48 } # 根据type分组,获取每个分组的第一个文档 > db.books.aggregate({$group:{_id:"$type", first_price:{$first:"$price"}}}) { "_id" : "linux", "first_price" : 53 } { "_id" : "java", "first_price" : 78.5 }
$last 根据资源文档的排序获取最后一个文档数据。如下:
> db.books.find() { "_id" : ObjectId("5e510cf291c7bc144476f828"), "name" : "Java编程思想", "type" : "java", "price" : 78.5 } { "_id" : ObjectId("5e510cf291c7bc144476f829"), "name" : "Java设计模式", "type" : "java", "price" : 70 } { "_id" : ObjectId("5e510cf291c7bc144476f82a"), "name" : "Java核心编程", "type" : "java", "price" : 88 } { "_id" : ObjectId("5e510cf291c7bc144476f82b"), "name" : "Linux菜鸟教程", "type" : "linux", "price" : 53 } { "_id" : ObjectId("5e510cf491c7bc144476f82c"), "name" : "Linux命令手册", "type" : "linux", "price" : 48 } # 根据type分组,获取每个分组最后一个文档 > db.books.aggregate({$group:{_id:"$type", last_price:{$last: "$price"}}}) { "_id" : "linux", "last_price" : 48 } { "_id" : "java", "last_price" : 88 }