Django之select_related和prefetch_related
在Django中,使用select_related和prefetch_related是两个很常见的优化手段。
举个例子最能说明问题。
准备工作
首先建立如下model1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22class Category(models.Model):
name = models.CharField(max_length=30)
creat_time = models.DateTimeField(auto_now_add=True)
def __unicode__(self):
return u'%s' % self.name
class Tag(models.Model):
name = models.CharField(max_length=30)
creat_time = models.DateTimeField(auto_now_add=True)
def __unicode__(self):
return u'%s' % self.name
class Post(models.Model):
title = models.CharField(max_length=255)
slug = models.SlugField(max_length=300, allow_unicode=True, unique=True)
content = models.TextField()
publish_time = models.DateTimeField(auto_now_add=True)
category = models.ForeignKey(Category)
tag = models.ManyToManyField(Tag, blank=True)
def __unicode__(self):
return u'%s' % self.title
之后编写序列化1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17class CategorySerializer(serializers.ModelSerializer):
class Meta:
model = Category
class TagSerializer(serializers.ModelSerializer):
class Meta:
model = Tag
class PostSerializer(serializers.ModelSerializer):
category = CategorySerializer()
tag = TagSerializer(many=True)
class Meta:
model = Post
fields = ('id', 'title', 'slug', 'content', 'publish_time', 'category', 'tag', )
之后编写api1
2
3
4
5
6
7
8
9
class PostListAPI(generics.ListAPIView):
serializer_class = PostSerializer
model = Post
paginate_by = 10
def get_queryset(self):
return Post.objects.all().order_by('-publish_time')
编写url1
url(r'^api/posts/?$', PostListAPI.as_view(), name='post_list'),
之后在后台新建两篇文章,访问api, 可以看到访问的sql1
2
3
4
5DEBUG [24/Jul/2016 13:57:13] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_post"."id", "blog_post"."title", "blog_post"."slug", "blog_post"."content", "blog_post"."publish_time", "blog_post"."category_id" FROM "blog_post" ORDER BY "blog_post"."publish_time" DESC; args=()
DEBUG [24/Jul/2016 13:57:13] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_category"."id", "blog_category"."name", "blog_category"."creat_time" FROM "blog_category" WHERE "blog_category"."id" = 1; args=(1,)
DEBUG [24/Jul/2016 13:57:13] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_tag"."id", "blog_tag"."name", "blog_tag"."creat_time" FROM "blog_tag" INNER JOIN "blog_post_tag" ON ("blog_tag"."id" = "blog_post_tag"."tag_id") WHERE "blog_post_tag"."post_id" = 2; args=(2,)
DEBUG [24/Jul/2016 13:57:13] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_category"."id", "blog_category"."name", "blog_category"."creat_time" FROM "blog_category" WHERE "blog_category"."id" = 1; args=(1,)
DEBUG [24/Jul/2016 13:57:13] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_tag"."id", "blog_tag"."name", "blog_tag"."creat_time" FROM "blog_tag" INNER JOIN "blog_post_tag" ON ("blog_tag"."id" = "blog_post_tag"."tag_id") WHERE "blog_post_tag"."post_id" = 1; args=(1,)
这里有两篇文章,访问tag和category表都分别访问了两次,如果是10篇文章,那访问tag和category则分别要10次。此时select_related和prefetch_related派上了用场。
select_related
查看select_related的文档,在返回QuerySet时,对于ForeignKey和OneToOneField等字段,通过添加select_related,可以把相关的对象在一次查询中查出,之后使用时就不需要再次查数据库。
还是看例子容易明白。对于上面的Post中category字段,因为是ForeignKey, 所以可以通过select_related查出,修改api如下1
2
3
4
5
6
7
8
9
10
11
class PostListAPI(generics.ListAPIView):
serializer_class = PostSerializer
model = Post
paginate_by = 10
def get_queryset(self):
queryset = Post.objects.all().order_by('-publish_time')
queryset = queryset.select_related('category')
return queryset
访问url, 查看sql1
2
3DEBUG [24/Jul/2016 13:58:29] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_post"."id", "blog_post"."title", "blog_post"."slug", "blog_post"."content", "blog_post"."publish_time", "blog_post"."category_id", "blog_category"."id", "blog_category"."name", "blog_category"."creat_time" FROM "blog_post" INNER JOIN "blog_category" ON ("blog_post"."category_id" = "blog_category"."id") ORDER BY "blog_post"."publish_time" DESC; args=()
DEBUG [24/Jul/2016 13:58:29] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_tag"."id", "blog_tag"."name", "blog_tag"."creat_time" FROM "blog_tag" INNER JOIN "blog_post_tag" ON ("blog_tag"."id" = "blog_post_tag"."tag_id") WHERE "blog_post_tag"."post_id" = 2; args=(2,)
DEBUG [24/Jul/2016 13:58:29] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_tag"."id", "blog_tag"."name", "blog_tag"."creat_time" FROM "blog_tag" INNER JOIN "blog_post_tag" ON ("blog_tag"."id" = "blog_post_tag"."tag_id") WHERE "blog_post_tag"."post_id" = 1; args=(1,)
可以看到,对于category信息,在查询post的同时,也一起查出,减少了查询category表的操作。
prefetch_related
查看Django的prefetch_related文档,了解到prefetch_related对于相关对象会进行一次独立的查询,然后在Python中把对象关联起来。所以prefetch_related可以用于many-to-many and many-to-one关系。
还是举例子,对于上面的tag, 可以使用prefetch_related来处理。修改api如下:1
2
3
4
5
6
7
8
9
10
11class PostListAPI(generics.ListAPIView):
serializer_class = PostSerializer
model = Post
paginate_by = 10
def get_queryset(self):
queryset = Post.objects.all().order_by('-publish_time')
queryset = queryset.select_related('category')
queryset = queryset.prefetch_related('tag')
return queryset
之后访问api,查看sql1
2DEBUG [24/Jul/2016 14:02:35] [django.db.backends:utils:execute:89] [None] (0.000) SELECT "blog_post"."id", "blog_post"."title", "blog_post"."slug", "blog_post"."content", "blog_post"."publish_time", "blog_post"."category_id", "blog_category"."id", "blog_category"."name", "blog_category"."creat_time" FROM "blog_post" INNER JOIN "blog_category" ON ("blog_post"."category_id" = "blog_category"."id") ORDER BY "blog_post"."publish_time" DESC; args=()
DEBUG [24/Jul/2016 14:02:35] [django.db.backends:utils:execute:89] [None] (0.000) SELECT ("blog_post_tag"."post_id") AS "_prefetch_related_val_post_id", "blog_tag"."id", "blog_tag"."name", "blog_tag"."creat_time" FROM "blog_tag" INNER JOIN "blog_post_tag" ON ("blog_tag"."id" = "blog_post_tag"."tag_id") WHERE "blog_post_tag"."post_id" IN (2, 1); args=(2, 1)
可以看到对于tag的查询也只有一次。
从上面的例子可以看到,掌握select_related和prefetch_related的用法非常重要。本文的代码见test-django的blog
参看资料: