Overview
This is the second part of my blog that documents my experience with the Dynamo Db API. It is meant to be read after the first part, which can be found here:
This part will delve into the Dynamo Db API and how it is used.
What is the Dynamo Db API
In a relational Db like PostgreSQL, when we want to retrieve a set of rows (a result set), we first establish a TCP connection and then send a structured query:
Postgres Db
SELECT * FROM users WHERE users.name = 'chanaka';
Dynamo Db operations, however, are done over HTTP, usually through the AWS SDK, which exposes an API method for each operation.
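To make the "over HTTP" part concrete, here is a rough sketch of what a GetItem call looks like on the wire. The region, the table name `demo-table` and the key values are made up for illustration, and request signing (which the SDK normally does for you) is left out, so treat this as a picture of the request shape rather than something to send as is.

```python
import json

# Hypothetical values for illustration; they do not come from a real setup.
REGION = "us-east-1"
TABLE_NAME = "demo-table"


def build_get_item_request(partition_key: str, sort_key: str) -> dict:
    """Describe the HTTP request behind a GetItem call (signing omitted)."""
    body = {
        "TableName": TABLE_NAME,
        # Low-level API values carry a type tag; "S" means string.
        "Key": {
            "partitionKey": {"S": partition_key},
            "sortKey": {"S": sort_key},
        },
    }
    return {
        "method": "POST",
        "url": f"https://dynamodb.{REGION}.amazonaws.com/",
        "headers": {
            "Content-Type": "application/x-amz-json-1.0",
            # The action name travels in a header, not in the URL path.
            "X-Amz-Target": "DynamoDB_20120810.GetItem",
        },
        "body": json.dumps(body),
    }


request = build_get_item_request("user#chanaka", "profile")
print(request["method"], request["url"])
print(request["headers"]["X-Amz-Target"])
print(request["body"])
```

In practice you would not build this by hand. The SDK builds, signs and sends the request, and you only deal with the method call and its parameters.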
Setting up
For testing the API methods I created a new Dynamo Db table from the AWS Console. When creating this table I chose the default settings, with 'partitionKey' as the name of the partition key and 'sortKey' as the name of the sort key. I recommend doing the same if you wish to follow along with a hands-on implementation of the logic below.
The Free tier of AWS is pretty generous if you want to test it out before rolling it out for production and will do for most POC builds:
API Actions
The API actions fall under 3 categories:
- Item based actions: For specific items
- Queries: For collections
- Scans: For the whole table
Item based actions
- Similar to the CRUD operations exposed by a REST API. There are 4 main methods:
  - GetItem: Similar to a GET request in a REST API or a SELECT operation in SQL where you retrieve one single item
  - PutItem: Similar to a POST request in a REST API or an INSERT operation in SQL where you write one item (replaces an existing item that has the same partition key and sort key)
  - UpdateItem: Alters properties of an existing item. If the item does not exist, it creates a new item (Similar to PATCH in REST and UPDATE in SQL)
  - DeleteItem: Used to delete an item (Similar to DELETE in REST and SQL)
- Each request must contain the full key of the particular item (partition key and sort key)
- All actions to alter data (Writes, Updates, Deletes) must be an item based action
- Therefore, you cannot alter the data of multiple items at once. (Ex: Update attribute X of all items with partition key of K where each item has a different sort key S)
- Item based actions can only be performed on the core table, not on a secondary index
- There are also batch actions and transaction actions
  - They operate on multiple items at the same time, but each item needs to be specified explicitly
  - In batch actions, a failure of one request does not affect the other requests (some can still succeed)
  - In transactional actions, either all of the requests succeed or all of them fail
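To show how the four item based actions line up, here is a minimal sketch of the parameters each one takes in the low-level API format. The table name `demo-table` and the `name` attribute are made up for illustration; the key names match the 'partitionKey' and 'sortKey' I chose when setting up the table.

```python
TABLE_NAME = "demo-table"  # hypothetical table name


def full_key(partition_key: str, sort_key: str) -> dict:
    """Every item based action needs both parts of the key."""
    return {
        "partitionKey": {"S": partition_key},
        "sortKey": {"S": sort_key},
    }


def get_item_params(partition_key: str, sort_key: str) -> dict:
    return {"TableName": TABLE_NAME, "Key": full_key(partition_key, sort_key)}


def put_item_params(partition_key: str, sort_key: str, name: str) -> dict:
    # PutItem takes the whole item, key attributes included.
    item = full_key(partition_key, sort_key)
    item["name"] = {"S": name}
    return {"TableName": TABLE_NAME, "Item": item}


def update_item_params(partition_key: str, sort_key: str, name: str) -> dict:
    # UpdateItem takes the key plus an expression describing the change.
    # Aliasing the attribute as #n avoids clashes with reserved words.
    return {
        "TableName": TABLE_NAME,
        "Key": full_key(partition_key, sort_key),
        "UpdateExpression": "SET #n = :name",
        "ExpressionAttributeNames": {"#n": "name"},
        "ExpressionAttributeValues": {":name": {"S": name}},
    }


def delete_item_params(partition_key: str, sort_key: str) -> dict:
    return {"TableName": TABLE_NAME, "Key": full_key(partition_key, sort_key)}


print(put_item_params("user#1", "profile", "chanaka"))
```

With boto3 installed and credentials configured, these dictionaries could be passed straight to the client, for example `boto3.client("dynamodb").put_item(**put_item_params("user#1", "profile", "chanaka"))`. Notice that every one of them carries the full key, which is why you cannot change many items in one call.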
Query
- Retrieve multiple items using the partition key
- Since multiple items with the same partition key are in the same collection, read operations are fast and efficient
- We can also filter the data set further by using the sort key for the items within the collection
- Query API can be used on both the main table as well as the secondary index
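As a sketch of what a Query request looks like, the helper below builds the parameters for "all items in this collection, optionally narrowed by a sort key prefix". The table name, the index name in the usage line and the key values are assumptions for illustration. An index has its own key attributes, so the attribute names are parameters here.

```python
TABLE_NAME = "demo-table"  # hypothetical table name


def query_params(
    partition_key,
    sort_key_prefix=None,
    index_name=None,
    pk_name="partitionKey",
    sk_name="sortKey",
):
    """Build Query parameters in the low-level API format."""
    # The partition key condition is mandatory and must be an equality.
    condition = f"{pk_name} = :pk"
    values = {":pk": {"S": partition_key}}

    # The sort key condition is optional and narrows the collection.
    if sort_key_prefix is not None:
        condition += f" AND begins_with({sk_name}, :prefix)"
        values[":prefix"] = {"S": sort_key_prefix}

    params = {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": values,
    }
    # Unlike item based actions, a Query can target a secondary index.
    if index_name is not None:
        params["IndexName"] = index_name
    return params


print(query_params("user#1", sort_key_prefix="order#"))
```

With boto3 this would be used as `boto3.client("dynamodb").query(**query_params("user#1", sort_key_prefix="order#"))`. Passing something like `index_name="my-index"` (a made-up name) together with that index's own key attribute names would run the same kind of query against a secondary index.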
Scan
- Used for operations on the entire table
- If the table is large, the results are paginated instead of being returned in a single response
  - The first response contains some data and a pagination key (LastEvaluatedKey). This key needs to be sent back (as ExclusiveStartKey) to get the next set of data.
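The pagination loop can be sketched like this. `scan_all` takes any function that behaves like a scan call, and `fake_scan` is a made-up stand-in that serves five in-memory items two at a time, so the loop can be tried without an AWS account.

```python
def scan_all(scan_page, **params):
    """Collect every item by following the pagination key."""
    items = []
    while True:
        response = scan_page(**params)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No pagination key means this was the last page.
            return items
        # Send the key back to continue where the last page stopped.
        params["ExclusiveStartKey"] = last_key


# Stand-in data: five items that consist of key attributes only.
FAKE_ITEMS = [
    {"partitionKey": {"S": "a"}, "sortKey": {"S": str(i)}} for i in range(5)
]


def fake_scan(TableName, Limit=2, ExclusiveStartKey=None):
    """Hypothetical replacement for a real scan call, for local testing."""
    start = 0
    if ExclusiveStartKey is not None:
        start = FAKE_ITEMS.index(ExclusiveStartKey) + 1
    page = FAKE_ITEMS[start:start + Limit]
    response = {"Items": page}
    if start + Limit < len(FAKE_ITEMS):
        response["LastEvaluatedKey"] = page[-1]
    return response


all_items = scan_all(fake_scan, TableName="demo-table")
print(len(all_items))
```

Against a real table the same loop should work with `scan_all(boto3.client("dynamodb").scan, TableName="demo-table")`, where `demo-table` is again a placeholder name. Keep in mind that this reads the whole table, which is exactly why Scan is the most expensive of the three categories.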
Why the heck are there so many limitations?
- PostgreSQL rows can go up to 1.6 TB and a result set can be much larger than what is possible in Dynamo Db, where a single item is capped at 400 KB
- Why can't Dynamo Db do the same? It's because the architecture behind Dynamo Db forces you to write efficient queries and to model your data well
- This helps to ensure that performance does not degrade as your application scales
- Since the partition key is hashed to find where the data lives, it can do an O(1) lookup of the data wherever it is
- Since the data is inherently small it can be returned to the client with low latency
- Sort key conditions allow prefix and range comparisons like begins_with and <
- Sort key conditions do not allow suffix or substring matching like endsWith() and contains()
- This is because the items in a collection are stored in a B-tree ordered by the sort key
- Since the time complexity of B-tree search is O(log n) it is a performant operation as well
- Since a single query response can contain at most 1 MB of data, individual requests are fast even if you have a large subset
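The reasoning above can be tried out with a toy model. This is not how Dynamo Db is implemented internally. It is a sketch where a Python dict stands in for hashing the partition key and a sorted list with binary search stands in for the B-tree, which is enough to see why a prefix match is cheap and a suffix match is not.

```python
import bisect

# Toy model: the dict lookup stands in for hashing the partition key,
# and each sorted list stands in for a B-tree ordered by sort key.
table = {}


def put(partition_key, sort_key):
    """Insert a sort key into its collection, keeping the list sorted."""
    collection = table.setdefault(partition_key, [])
    index = bisect.bisect_left(collection, sort_key)
    if index == len(collection) or collection[index] != sort_key:
        collection.insert(index, sort_key)


def query_begins_with(partition_key, prefix):
    """Prefix match: binary search to the start, then read in order."""
    collection = table.get(partition_key, [])
    start = bisect.bisect_left(collection, prefix)  # O(log n)
    matches = []
    for sort_key in collection[start:]:
        if not sort_key.startswith(prefix):
            break  # sorted order means no later key can match
        matches.append(sort_key)
    return matches


def query_ends_with(partition_key, suffix):
    """Suffix match: sorting does not help, so every key is checked."""
    return [k for k in table.get(partition_key, []) if k.endswith(suffix)]


for key in ["order#2", "profile", "order#1", "address"]:
    put("user#1", key)

print(query_begins_with("user#1", "order#"))
print(query_ends_with("user#1", "#1"))
```

The prefix query jumps to the first candidate and stops at the first non-match, so its cost depends on the number of results and not on the size of the collection. The suffix query has to touch every key, which is the kind of operation the API leaves out of sort key conditions.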