CRUD multiple records in Django models
In this section you'll learn how to work with multiple records in Django models. Although the process is just as easy as working with single model records, working with multiple records can require multiple database calls, as well as caching techniques and bulk operations, all of which need to be taken into account to minimize execution times.
Create multiple records with bulk_create()
To create multiple records based
on a Django model you can use the built-in
bulk_create()
method. The advantage of the
bulk_create()
method is that it creates all entries in
a single query, so it's very efficient if you have a list of a
dozen or a hundred entries you wish to create. Listing 8-12
illustrates the process to create multiple records for the
Store
model.
Listing 8-12 Create multiple records of a Django model with the bulk_create() method
# Import Django model class from coffeehouse.stores.models import Store # Create model Store instances store_corporate = Store(name='Corporate',address='624 Broadway', city ='San Diego',state='CA',email='corporate@coffeehouse.com') store_downtown = Store(name='Downtown',address='Horton Plaza', city ='San Diego',state='CA',email='downtown@coffeehouse.com') store_uptown = Store(name='Uptown',address='240 University Ave', city ='San Diego',state='CA',email='uptown@coffeehouse.com') store_midtown = Store(name='Midtown',address='784 W Washington St', city ='San Diego',state='CA',email='midtown@coffeehouse.com') # Create store list store_list = [store_corporate,store_downtown,store_uptown,store_midtown] # Call bulk_create to create records in a single call Store.objects.bulk_create(store_list)
In listing 8-12 you can see the
bulk_create()
method accepts a list of model instances
to create all records in one step. But as efficient as the
bulk_create()
method is, you should be aware it has
certain limitations:
- It does not support pre-save and post-save model signals.- To
speed things up and unlike the
save()
method to create single records, thebulk_create()
method does not execute pre-save and post-save model signals. In you're unfamiliar with the model signal concept, pre-save and post-save model signals allow the execution of custom logic prior and after a model record is saved, a topic covered in the previous chapter. - It does not support models that span multiple tables (i.e. have relationships
among one another).- Because records are created in bulk, there is
no way to obtain primary key references for the first type of
created records, which are then used to create related child
records. If a model spans multiple tables, then you must
individually create each record using the
save()
method which does support creating records that span multiple tables.
If you face these limitations for
the bulk_create()
method, the only alternative is to
loop over each record and use the save()
method to
create each entry, as illustrated in listing 8-13.
Listing 8-13 Create multiple records with the save() method
# Same store_list as listing 8-12 # Loop over each store and invoke save() on each entry # save() method called on each list member to create record for store in store_list: store.save()
As I mentioned when I introduced
the bulk_create()
method, the process in listing 8-13
can be highly inefficient if its done for dozens or hundreds of
records, but sometimes it's the only option to create multiple
records in bulk. However, the speed issues related to listing 8-13
can be improved if you manually deal with model transactions.
Listing 8-14 illustrates how to
use the save()
method and group the entire record
creation process in a single transaction to speed up the bulk
creation process.
Listing 8-14 Create multiple records with save() method in a single transaction
# Import Django model and transaction class from coffeehouse.stores.models import Store from django.db import transaction # Create store list, with same references from listing 8-12 first_store_list = [store_corporate,store_downtown] second_store_list = [store_uptown,store_midtown] # Trigger atomic transaction so loop is executed in a single transaction with transaction.atomic(): # Loop over each store and invoke save() on each entry for store in first_store_list: # save() method called on each member to create record store.save() # Method decorated with @transaction.atomic to ensure logic is executed in single transaction @transaction.atomic def bulk_store_creator(store_list): # Loop over each store and invoke save() on each entry for store in store_list: # save() method called on each member to create record store.save() # Call bulk_store_creator with Store list bulk_store_creator(second_store_list)
As you can see in listing 8-14,
there are two ways to create bulk operations in a single database
transaction, both using the django.db.transaction
package. The first instance uses the with
transaction.atomic():
statement, so any nested code within
this statement is run in a single transaction. The second instance
uses the @transaction.atomic
method decorator, which
ensures the method operations are run in a single transaction.
There's a reason Django's default database transaction mechanism creates transactions on every query, it's to err on the safe side and minimize the potential for data loss.
If you decide to use explicit transactions to improve performance -- as illustrated in listing 8-14 -- be aware that either all or no records are created. Although this can be a desired behavior, for certain circumstances it might lead to unexpected results. Make sure you understand the implications of transactions on the data you're working with. The previous chapter contains a section discussing the topic of Django model transactions in greater detail.
Read multiple records with all(), filter(), exclude() or in_bulk()
To read multiple records
associated with a Django model you can use several methods, which
include: all()
, filter()
,
exclude()
and in_bulk()
. The purpose of
the all()
method should be self explanatory, it
retrieves all the records of a given model. The
filter()
method is used to restrict query results on a
given model property, for example filter(state='CA')
is a query to get all model records with state='CA'
.
And the exclude()
method is used to execute a query
that excludes records on a given model property, for example
exclude(state='AZ')
is a query to get all model
records except those with state='AZ'
.
It's also possible to chain
filter()
and exclude()
methods to create
more complex multiple record queries. For example,
filter(state='CA').exclude(city='San Diego')
is a
query to get all model records with state='CA'
and
exclude those with city='San Diego'
. Listing 8-15
illustrates more multiple record query examples.
Listing 8-15. Read multiple records with with all(), filter() and exclude() methods
# Import Django model class from coffeehouse.stores.models import Store # Query with all() method or equivalent SQL: 'SELECT * FROM ...' all_stores = Store.objects.all() # Query with include() method or equivalent SQL: 'SELECT....WHERE city = "San Diego"' san_diego_stores = Store.objects.filter(city='San Diego') # Query with exclude() method or equivalent SQL: 'SELECT....WHERE NOT (city = "San Diego")' non_san_diego_stores = Store.objects.exclude(city='San Diego') # Query with include() and exclude() methods or equivalent SQL: # 'SELECT....WHERE STATE='CA' AND NOT (city = "San Diego")' ca_stores_without_san_diego = Store.objects.filter(state='CA').exclude(city='San Diego')
Sometimes it can be helpful or even necessary to view the actual SQL executed by a Django model query. You can do so by appending .query to a query, as illustrated in the following listing:
from coffeehouse.stores.models import Storeimport loggingstdlogger = logging.getLogger(__name__) # Get the Store records with city San Diego san_diego_stores = Store.objects.filter(city='San Diego') stdlogger.debug("Query %s" % str(san_diego_stores.query)) # You can also use print(san_diego_stores.query)
As you can see in the previous snippet, you can output the SQL query to a Python logger or use the 'quick & dirty' print statement. Note that .query only works with queries that output QuerySets, so it doesn't work with queries like with the get() method -- more on QuerySets shortly. Chapter 5 describes other alternatives to inspect the SQL used by model queries (e.g. Django debug toolbar) and Chapter 3 shows how to output SQL queries in Django templates (e.g. Debug context processor
sql_queries
variable).
Tip In addition to single fields -- city="San Diego" or state="CA" -- the all(), filter() and exclude() methods can also accept multiple fields to produce an and query (e.g. filter(city="San Diego", state="CA") to get records were both city and state match). See the later section in the chapter on queries classified by SQL keyword.
Besides the all()
,
filter()
and exclude()
methods, Django
models also support the in_bulk()
method. The
in_bulk()
method is designed to efficiently read many
records, just like the bulk_create()
method --
described in the past section -- is used to efficiently create many
records.
The in_bulk()
method
is more efficient to read many records vs. the all()
,
filter()
and exclude()
methods, because
all the latter methods produce a QuerySet and the former produces a
standard Python dictionary. Listing 8-16 illustrates the use of the
in_bulk()
method.
Listing 8-16. Read multiple records with with in_bulk() method
# Import Django model class from coffeehouse.stores.models import Store # Query with in_bulk() all Store.objects.in_bulk() # Outputs: {1: <Store: Corporate (San Diego,CA)>, 2: <Store: Downtown # (San Diego,CA)>, 3: <Store: Uptown (San Diego,CA)>, 4: <Store: Midtown (San Diego,CA)>} # Compare in_bulk query to all() that produces QuerySet Store.objects.all() # Outputs: <QuerySet [<Store: Corporate (San Diego,CA)>, <Store: Downtown # (San Diego,CA)>, <Store: Uptown (San Diego,CA)>, <Store: Midtown (San Diego,CA)>]> # Query to get single Store by id Store.objects.in_bulk([1]) # Outputs: {1: <Store: Corporate (San Diego,CA)>} # Query to get multiple Stores by id Store.objects.in_bulk([2,3]) # Outputs: {2: <Store: Downtown (San Diego,CA)>, 3: <Store: Uptown (San Diego,CA)>}
The first example in listing 8-16
uses the in_bulk()
method without any arguments to
produce a dictionary with the records of the Store model (i.e. just
like the all()
method). However, notice how the output
of the in_bulk()
method is a standard Python
dictionary, where each key corresponds to an id
value of
the record.
The remaining examples in listing
8-16 illustrate how the in_bulk()
method can accept a
list of values to specify which record id's should be read from the
database. Here again, notice that although the behavior is similar
to the filter()
or exclude()
methods, the
output is a standard Python dictionary vs. a QuerySet
data structure.
Now that you have a clear
understanding of the various methods that can read multiple model
records and how some methods produce a QuerySet
and
other don't, it begets the question, what is a
QuerySet
and why is it used in the first place ? So
before we move on to the next parts of this broader section -- on
how to do CRUD operations on multiple records -- we'll take a brief
detour to explore the QuerySet
data type.
Understanding a QuerySet: Lazy evaluation & caching
The first important
characteristic of a QuerySet
data type is technically
known as lazy evaluation. This means a QuerySet
isn't
executed against the database right away, it just waits until its
evaluated. In other words, the act of running a snippet like
Store.objects.all()
doesn't involve any database
activity right away. Listing 8-17 illustrates how you can even
chain query after query and still not trigger database
activity.
Listing 8-17. Chained model methods to illustrate concept of QuerySet lazy evaluation.
# Import Django model class from coffeehouse.stores.models import Store # Query with all() method stores = Store.objects.all() # Chain filter() method on query stores = stores.filter(state='CA') # Chain exclude() method on query stores = stores.exclude(city='San Diego')
Notice the three different
statements in listing 8-17 that chain the all()
,
filter()
and exclude()
methods. Although
it can appear listing 8-17 makes three database calls to get
Store
records with state='CA'
and
excludes those with city='San Diego'
, there is no
database activity!
This is how QuerySet
data structures are designed to work. So when does a query made on
a QuerySet
data type hit the database ? There are many
triggers that make a QuerySet
evaluate and invoke an
actual database call. Table 8-1 illustrates the various
triggers.
Table 8-1. Django QuerySet evaluation triggers that invoke an actual database call
Evaluation Trigger | Description | Example |
---|---|---|
Iteration | Creating a loop on a QuerySet triggers a database call. | for store in Store.objects.all(): |
Slicing with 'step' argument | Slicing a QuerySet with a third argument (a.k.a. 'step' or 'stride' argument) triggers a database call. NOTE: Slicing a Queryset with 1 or 2 arguments just creates another QuerySet | # A list of every 5th record, for the first 100 records Store.objects.all()[:100:5] # Records 50 to 100 # This does NOT trigger a database hit, (2 arguments) Store.objects.all()[49:99] # All records starting from the 6th # This does NOT trigger a database hit (1 argument) Store.objects.all()[5:] # First record # This does NOT trigger a database hit (1 argument) Store.objects.all()[0] |
Pickling* | Pickling a QuerySet forces all the results to be loaded into memory prior to pickling. | import pickle stores = Store.objects.all() pickled_stores = pickle.dumps(stores) |
repr() method | Calling repr() on a QuerySet triggers a database call. NOTE: This is for convenience in the Python interactive interpreter, so you can immediately see query results. | repr(Store.objects.all()) |
len() method | Calling len() on a QuerySet triggers a database call. NOTE: If you only want the number of records, it's more efficient to use the Django model count() method. | total_stores = len(Store.objects.all()) #NOTE: The count() method is more efficient to get a total count efficient_total_stores = Store.objects.count() |
list() method | Calling list() on a QuerySet triggers a database call. | store_list = list(Store.objects.all()) |
Boolean tests (bool(), or, and or if statements) | Making a boolean test on a QuerySet triggers a database call. NOTE: If you only want to check if a record exists, it's more efficient to use the Django model exists() method. | # Check if there's a store with city='San Diego' if Store.objects.filter(city='San Diego'): # There is a store in 'San Diego' pass #NOTE: The exists() method is more efficient for a boolean check san_diego_stores = Store.objects.exists(city='San Diego') |
* Pickling is Python's standard mechanism for object serialization, a process that converts a Python object into a character stream. The character stream contains all the information necessary to reconstruct the object at a later time. Pickling in the context of Django queries is typically used for heavyweight queries in an attempt to save resources (e.g. make a heavyweight query, pickle it and on subsequent occasions consult the pickled query). You can consider pickling Django queries a rudimentary form of caching.
Now that you know the triggers
that cause a QuerySet
to make a call to a database,
let's take a look at other important QuerySet
subject:
caching.
Every QuerySet
contains a cache to minimize database access. The first time a
QuerySet
is evaluated and a database query takes place
-- see evaluation triggers in table 8-1 -- Django saves the results
in the QuerySet
's cache for later use.
A QuerySet
's cache
is most useful when an application has a recurring need to use the
same data, as it leads to less hits on a database. However,
leveraging a QuerySet
's cache comes with a few
subtleties tied to the evaluation of a QuerySet
. A
rule of thumb is to first evaluate a QuerySet
you plan
to use more than once and proceed to use its data to leverage the
QuerySet
cache. This is best explained with the
examples presented in listing 8-18.
Listing 8-18 - QuerySet caching behavior.
# Import Django model class from coffeehouse.stores.models import Store # CACHE USING SEQUENCE # Query awaiting evaluation lazy_stores = Store.objects.all() # Iteration triggers evaluation and hits database store_emails = [store.email for store in lazy_stores] # Uses QuerySet cache from lazy_stores, since lazy_stores is evaluated in previous line store_names = [store.name for store in lazy_stores] # NON-CACHE SEQUENCE # Iteration triggers evaluation and hits database heavy_store_emails = [store.email for store in Store.objects.all()] # Iteration triggers evaluation and hits database again, because it uses another QuerySet ref heavy_store_names = [store.name for store in Store.objects.all()] # CACHE USING SEQUENCE # Query wrapped as list() for immediate evaluation stores = list(Store.objects.all()) # Uses QuerySet cache from stores first_store = stores[0] # Uses QuerySet cache from stores second_store = stores[1] # Uses QuerySet cache from stores, set() is just used to eliminate duplicates store_states = set([store.state for store in stores]) # Uses QuerySet cache from stores, set() is just used to eliminate duplicates store_cities = set([store.city for store in stores]) # NON-CACHE SEQUENCE # Query awaiting evaluation all_stores = Store.objects.all() # list() triggers evaluation and hits database store_one = list(all_stores[0:1]) # list() triggers evaluation and hits database again, because partially evaluating # a QuerySet does not populate the cache store_one_again = list(all_stores[0:1]) # CACHE USING SEQUENCE # Query awaiting evaluation coffee_stores = Store.objects.all() # Iteration triggers evaluation and hits database [store for store in coffee_stores] # Uses QuerySet cache from coffee_stores, because it's evaluated fully in previous line store_1 = coffee_stores[0] # Uses QuerySet cache from coffee_stores, because it's already evaluated in full store_1_again = coffee_stores[0]
As you can see in the examples in
listing 8-18, sequences that leverage a QuerySet
's
cache, trigger the evaluation of the QuerySet
right
away and then use a reference to the evaluated
QuerySet
to access the cached data. Sequences that
don't use a QuerySet
cache, either constantly create
identical QuerySet
statements or make the evaluation
process late and for each data assignment.
The only edge case for caching
QuerySet
's that doesn't fit the previous behavior is
the second to last example in listing 8-18. If you trigger a
partial evaluation of QuerySet
by slicing it (e.g.
[0]
or [1:5]
) the cache is not populated.
So to ensure a QuerySet
cache is used, you must
evaluate a QuerySet
and then slice the results, as
illustrated in the last example in listing 8-18.
Read performance methods: defer() ,only(), values(), values_list(), iterator(), exists() and none()
Although QuerySet
data structures represent a step forward toward dealing with
multiple data records by integrating lazy evaluation and caching
mechanisms, they don't cover the entire performance spectrum needed
to deal with large data queries.
A common performance problem you'll face with large data queries is related to reading unnecessary record fields. Although selectively choosing which fields to read from a database record can be an afterthought in most circumstances, it can have an important impact for queries made on Django models with more than a couple of fields.
The first methods available to
increase performance while reading model records are the
defer()
and only()
methods, both of which
are intended to delimit which fields to read in a query. The
defer()
and only()
methods accept a list
of fields to defer or load, respectively, and are complementary to
one another depending on what you want to achieve. For example, if
you want to defer loading the majority of model fields, it's
simpler to specify which fields to load with only()
,
if you want to defer loading one or a few fields in a model you can
specify the fields in the defer()
method. Listing 8-19
illustrates the use of the defer()
and
only()
methods.
Listing 8-19 -- Read performance with defer() and only() to selectively read record fields.
from coffeehouse.stores.models import Store from coffeehouse.items.models import Item # Item names on the breakfast menu breakfast_items = Item.objects.filter(menu__name='Breakfast').only('name') # All Store records with no email all_stores = Store.objects.defer('email').all() # Confirm loaded fields on overall query breakfast_items.query.get_loaded_field_names() # Outputs: {<class 'coffeehouse.items.models.Item'>: {'id', 'name'}} all_stores.query.get_loaded_field_names() # Outputs: {<class 'coffeehouse.stores.models.Store'>: {'id', 'address', 'state', 'city', 'name'}} # Confirm deferred fields on individual model records breakfast_items[0].get_deferred_fields() # Outputs: {'calories', 'stock', 'price', 'menu_id', 'size', 'description'} all_stores[1].get_deferred_fields() # Outputs: {'email'} # Access deferred fields, note each call on a deferred field implies a database hit breakfast_items[0].price breakfast_items[0].size all_stores[1].email
As you can see in listing 8-19,
both the defer()
and only()
methods can
be chained to a model manager (i.e. objects
) either at
the start or end of a query, as well as be used in conjunction with
other methods like all()
and filter()
. In
addition, notice how both methods can accept a list of fields to
defer or load.
To verify which model fields have
been deferred or loaded, listing 8-19 illustrates two alternatives.
The first technique consists of calling the
get_loaded_field_names()
on the query reference of a
query statement to get a list of loaded fields. The second
technique consists of calling the
get_deferred_fields()
method on a model instance to
obtain a list of deferred fields.
So how do you obtain deferred
fields ? Easy, you cast call them. Toward the end of listing 8-18,
notice how even though the breakfast_items
represents
a query that only loads the name
field, a call is made
to the get the value of the price
and
size
fields. Similarly, the all_stores
reference in listing 8-19 represents a query that defers the
email
field, nevertheless you can get a record's
email
field value by just calling it. Although this
last technique requires an additional database hit to get the
deferred field(s), it also illustrates how easy it's to get a
record's entire fields even if they're deferred.
The values()
and
values_list()
methods offer another alternative to
delimit the fields fetched by a query. Unlike the
defer()
and only()
methods which produce
a QuerySet
of model instances, the
values()
and values_list()
methods
produce QuerySet
instances composed of plain
dictionaries, tuples or lists. This has the performance advantage
of not creating full-fledged model instances, albeit this also has
the disadvantage of not having access to full-fledged model
instances.
The the values()
and
values_list()
methods accept a list of fields to load
as part of a query, a process that's illustrated in listing
8-20.
Tip You can use the values() and values_list() methods without any field argument to produce full model records as plain dictionaries, tuples or lists.
Listing 8-20 -- Read performance with values() and values_list() to selectively read record fields.
from coffeehouse.stores.models import Store from coffeehouse.items.models import Item # Item names on the breakfast menu breakfast_items = Item.objects.filter(menu__name='Breakfast').values('name') print(breakfast_items) # Outputs: <QuerySet [{'name': 'Whole-Grain Oatmeal'}, {'name': 'Bacon, Egg & Cheese Biscuit'}]> # All Store records with no email all_stores = Store.objects.values_list('email','name','city').all() print(all_stores) # Outputs: <QuerySet [('corporate@coffeehouse.com', 'Corporate', 'San Diego'), # ('downtown@coffeehouse.com', 'Downtown', 'San Diego'), ('uptown@coffeehouse.com', 'Uptown', 'San Diego'), # ('midtown@coffeehouse.com', 'Midtown', 'San Diego')]> all_stores_flat = Store.objects.values_list('email',flat=True).all() print(all_stores_flat) # Outputs: <QuerySet ['corporate@coffeehouse.com', 'downtown@coffeehouse.com', # 'midtown@coffeehouse.com', 'uptown@coffeehouse.com']> # It isn't possible to access undeclared model fields with values() and values_list() breakfast_items[0].price #ERROR # Outputs AttributeError: 'dict' object has no attribute 'price'
The first variation in listing
8-20 generates an Item QuerySet
with the
name
field, which as you can see produces a list of
dictionaries with only the name
field and value. Next,
a query is made to get the email
, name
and city
fields for all Store
models
using the values_list()
method. Notice that unlike the
values()
method, the values_list()
method
produces a more compact structure in the form of a tuple. In
listing 8-20 you can also see the values_list()
method
accepts the optional flat=True
argument to flatten the
resulting tuple into a plain list.
Finally, toward the end of
listing 8-20 you can see that when using the values()
and values_list()
methods it isn't possible to obtain
undeclared fields by just calling them, like it's possible with the
defer()
and only()
methods. This behavior
is due to the watered-down QuerySet
produced by the
values()
and values_list()
methods which
aren't full-fledged model objects.
The iterator()
method is yet another option available in Django models that
creates an iterator over the results of a QuerySet
.
The iterator()
method is ideal for large queries that
are intended to be used once, as this lowers the required memory to
store data which is an inherent property of all Python iterators.
Listing 8-21 illustrates a query that uses the
iterator()
method and appendix A describes the core
concepts behind Python iterators.
Listing 8-21 -- Read performance with iterator(), exists() and none().
from coffeehouse.stores.models import Store # All Store with iterator() stores_on_iterator = Store.objects.all().iterator() print(stores_on_iterator) # Outputs: <generator object __iter__ at 0x7f2864db8fc0> # Advance through iterator with __next__() stores_on_iterator.__next__() # Outputs: <Store: Corporate (San Diego,CA)> stores_on_iterator.__next__() # Outputs: <Store: Downtown (San Diego,CA)> # Check if Store object with id=5 exists Store.objects.filter(id=5).exists() # Outputs: False # Create empty QuerySet on Store model Store.objects.none() # Outputs: <QuerySet []>
Another Django model read
performance technique is the exists()
method, which is
illustrated in listing 8-21 and is used to verify if a query
returns data. Although the exists()
method executes a
query against the database, the query used by exists()
is a simplified version compared to a standard query, in addition
to the exists()
method returning a boolean True or
False value compared to a full-fledged QuerySet. This makes the
exists()
method a good option for queries that operate
on conditionals, where it's only necessary to verify if model
records exists and the actual records data is unnecessary.
Finally, the Django model
none()
method -- illustrated at the end of listing
8-21 -- is used to generate an empty QuerySet
,
specifically of a sub-class named EmptyQuerySet
. The
none()
method is helpful for cases where you knowingly
need to assign an empty model QuerySet
, such as edge
cases related to Django model forms or Django templates, that
expect a QuerySet
instance in one way or another. In
such cases, it becomes necessary to create a dummy
QuerySet
, instead of inefficiently creating
QuerySet
that returns data and deleting its
contents.
As you've learned in this
sub-section, in addition to the QuerySet
data
structure, Django also offers many methods specifically designed to
efficiently read large or small amount of records associated with
Django models.
Tip Remember the in_bulk() method from the past section also provides read performance over the basic all(), filter() and exclude() methods.
Update multiple records with update() or select_for_update().
In the section on single record
CRUD operations, you explored how to update single records with the
update()
method, this same method can handle updating
multiple records. This process is illustrated in listing 8-22.
Listing 8-22. Update multiple records with the update() method
from coffeehouse.stores.models import Store Store.objects.all().update(email="contact@coffeehouse.com") from coffeehouse.items.models import Item from django.db.models import F Item.objects.all().update(stock=F('stock') +100)
The first example in listing 8-22
uses the update()
method to update all
Store
records and set their email value to
contact@coffeehouse.com
. The second example uses a
Django F
expression and the update()
method to update all Drink
records and set their
stock
value to the current stock value plus 100.
Django F
expressions allow you to reference model
fields within a query, which is necessary in this case to perform
the update in a single operation.
Although the
update()
method guarantees everything is done in a
single operation to avoid race conditions, on certain occasions the
update()
method may not be enough to do complex
updates. Offering another alternative to update multiple records is
the select_for_update()
method which locks rows on the
given query until the update is marked as done. Listing 8-23
illustrates an example of the select_for_update()
method.
Under the hood, the Django select_for_update() method is based on SQL's SELECT...FOR UPDATE syntax which is not supported by all databases. Postgres, Oracle and MySQL databases support this functionality, but SQLite does not.
In addition, there's the special argument nowait (e.g. select_for_update(nowait=True) to make a query non-blocking) . By default, if another transaction acquires a lock on one of the selected rows, the select_for_update() query blocks until the lock is released. If you use nowait, this allows a query to run right away and in case a conflicting lock is already acquired by another transaction the DatabaseError is raised when the QuerySet is evaluated. Be aware though, MySQL does not support the nowait argument and if used with MySQL, Django throws a DatabaseError.
Listing 8-23 Update multiple records with a Django model with the select_for_update() method
# Import Django model class from coffeehouse.stores.models import Store from django.db import transaction # Trigger atomic transaction so loop is executed in a single transaction with transaction.atomic(): store_list = Store.objects.select_for_update().filter(state='CA') # Loop over each store to update and invoke save() on each entry for store in store_list: # Add complex update logic here for each store # save() method called on each member to update store.save() # Method decorated with @transaction.atomic to ensure logic is executed in single transaction @transaction.atomic def bulk_store_updae(store_list): store_list = Store.objects.select_for_update().exclude(state='CA') # Loop over each store and invoke save() on each entry for store in store_list: # Add complex update logic here for each store # save() method called on each member to update store.save() # Call bulk_store_update to update store records bulk_store_update(store_list_to_update)
Listing 8-23 shows two variations
for select_for_update()
, one using an explicit
transaction and the other decorating a method to scope it inside a
transaction. Both variations use the same logic, they first create
a query with select_for_update()
, then loop over the
results to update each record and use save()
to update
individual records. In this manner the rows touched by the query
remain locked to other changes until the transaction finishes.
Be aware that when using the
select_for_update()
it's absolutely necessary to use
transactions using any of the techniques described in listing 8-23.
If you run the select_for_update()
method in a
database that supports it and you don't use transactions as
illustrated in listing 8-23 -- maintaining Django's default
auto-commit mode -- Django throws a
TransactionManagementError
error because the rows
cannot be locked as a group. Using the
select_for_update
method in a database that offers no
support for it has no effect (i.e. you won't see an error).
Delete multiple records with delete()
To delete multiple records you
use the delete()
method and append it to a query.
Listing 8-24 illustrates this process.
Listing 8-24 Delete model records with the delete() method
from coffeehouse.stores.models import Store Store.objects.filter(city='San Diego').delete()
The example in listing 8-24 uses
the delete()
method to delete the Store
records with city='San Diego'
.