22 Advanced DynamoDB Gotchas You Don’t Want to Learn the Hard Way
Amazon DynamoDB promises near-infinite scalability, serverless operations, and low-latency performance. It’s a solid foundation for many cloud-native applications — from IoT to real-time analytics. But beneath the simplicity lies a jungle of design decisions, silent failures, and performance pitfalls.
Whether you’re new to DynamoDB or an experienced AWS developer, these 22 advanced gotchas will help you avoid painful lessons in production.
1. Hot Partitions (Skewed Access)
DynamoDB automatically partitions your data, but if too many requests hit the same partition key, that partition is throttled. A common mistake is using a low-cardinality partition key like region = "US", or a key dominated by a few very active user_id values, which creates hot spots under load.
Fix: Use high-cardinality partition keys or introduce artificial sharding (e.g., appending a random number to the key).
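A minimal sketch of write sharding in Python. The shard count of 10 and the `#` separator are assumptions for illustration, not anything DynamoDB prescribes. Writes pick a random suffix; reads have to query every suffix and merge the results.

```python
import random

SHARD_COUNT = 10  # hypothetical; size it to the write rate you need to absorb


def sharded_key(base_key: str, shard_count: int = SHARD_COUNT) -> str:
    """Return the partition key to write to: the base key plus a random shard suffix."""
    return f"{base_key}#{random.randrange(shard_count)}"


def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> list:
    """Return every shard key for a base key; a reader queries each one and merges."""
    return [f"{base_key}#{shard}" for shard in range(shard_count)]
```

The trade-off is that every read of the logical key fans out into one query per shard, so the shard count should stay as small as the write load allows.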
2. Exceeding the 400KB Item Size Limit
Every DynamoDB item must be under 400KB, and that limit counts attribute names as well as values. Writes that exceed it are rejected.
Fix: Store large payloads (images, documents, logs) in S3 and only save metadata or S3 links in DynamoDB.
3. Overusing Scans
Scan operations read every item in a table and apply filters afterward. This is highly inefficient, slow, and expensive in terms of read capacity units (RCUs).
Fix: Use Query operations instead. Scans are best left for offline or admin operations.
4. In-Memory Joins
DynamoDB does not support joins. Developers often perform multiple queries and combine the results in application code. This results in N+1 query problems, high latency, and difficult-to-maintain logic.
Fix: Use single-table design with composite keys or denormalize your data.
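As a rough illustration, composite keys let one Query return a user's orders without a join. The attribute names `PK` and `SK` and the `USER#` / `ORDER#` prefixes are a common convention assumed here, not something the article specifies. The functions only build request parameters in the low-level API shape, which could be passed to a boto3 client as `client.query(**params)`.

```python
def user_key(user_id: str) -> dict:
    """Key of the user's profile item."""
    return {"PK": {"S": f"USER#{user_id}"}, "SK": {"S": "PROFILE"}}


def order_key(user_id: str, order_id: str) -> dict:
    """Key of an order item stored in the same partition as its user."""
    return {"PK": {"S": f"USER#{user_id}"}, "SK": {"S": f"ORDER#{order_id}"}}


def orders_for_user_query(table_name: str, user_id: str) -> dict:
    """Query parameters that fetch all of a user's orders in one request."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USER#{user_id}"},
            ":prefix": {"S": "ORDER#"},
        },
    }
```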
5. Misusing Global Secondary Indexes (GSIs)
GSIs offer flexibility but come with their own capacity settings and are eventually consistent. In provisioned mode, if a GSI has too little write capacity, writes to the base table are throttled, even when the base table itself has capacity to spare.
Fix: Keep GSIs minimal, monitor write throttles, and ensure adequate capacity or switch to on-demand mode.
6. Meaningless Use of Local Secondary Indexes (LSIs)
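LSIs share the same partition key as the base table but allow alternate sort keys. They can only be defined when the table is created and they cap each partition key's item collection at 10 GB, so an LSI added "just in case" without a real use case carries a lasting cost.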
Fix: Use LSIs only when your access pattern requires different sort key views for the same partition key.
7. Inefficient Filtering with FilterExpression
Filter expressions are applied after the data is read. If a query reads 100 items and the filter keeps only 2, you still pay for all 100.
Fix: Always prefer key conditions in queries over filters. If filtering is needed, consider restructuring your keys or using GSIs.
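The difference can be sketched with two sets of Query parameters. The table layout is hypothetical: a `customer_id` partition key and, for the second variant, a sort key named `status_date` shaped like `SHIPPED#2024-05-01`. Both are assumptions made for this example.

```python
def query_with_filter(table_name: str, customer_id: str, status: str) -> dict:
    """Reads every order for the customer, then discards non-matching ones.

    All items read are billed, including the ones the filter throws away.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "customer_id = :c",
        "FilterExpression": "#st = :s",
        # "status" is a DynamoDB reserved word, so it needs an alias.
        "ExpressionAttributeNames": {"#st": "status"},
        "ExpressionAttributeValues": {
            ":c": {"S": customer_id},
            ":s": {"S": status},
        },
    }


def query_with_key_condition(table_name: str, customer_id: str, status: str) -> dict:
    """Reads only matching items by pushing the status into the sort key."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "customer_id = :c AND begins_with(status_date, :prefix)",
        "ExpressionAttributeValues": {
            ":c": {"S": customer_id},
            ":prefix": {"S": f"{status}#"},
        },
    }
```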
8. Assuming Consistent Reads
By default, DynamoDB returns eventually consistent reads, which may not reflect the latest data. This can cause bugs in systems requiring immediate consistency.
Fix: Use ConsistentRead = true for critical reads. Be aware this is not supported on GSIs.
9. Lack of Optimistic Locking
DynamoDB doesn’t have built-in versioning. Concurrent updates on the same item can silently overwrite each other.
Fix: Add a version or updated_at attribute and use conditional writes to implement optimistic locking.
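One way this might look with a numeric `version` attribute: the update only succeeds if the stored version still matches the one the caller read, and it bumps the version in the same request. The attribute names and the `status` field are illustrative assumptions; the function builds UpdateItem parameters in the low-level API shape.

```python
def versioned_update(table_name: str, key: dict, new_status: str,
                     expected_version: int) -> dict:
    """Build UpdateItem parameters that fail if another writer got there first."""
    return {
        "TableName": table_name,
        "Key": key,
        "UpdateExpression": "SET #st = :new_status, #v = :next",
        # The write is rejected unless the stored version is the one we read.
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": {"#st": "status", "#v": "version"},
        "ExpressionAttributeValues": {
            ":new_status": {"S": new_status},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }
```

When the condition fails, the usual response is to re-read the item, reapply the change to the fresh copy, and retry with the new version number.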
10. TTL (Time to Live) Is Eventually Consistent
TTL deletes are not immediate. Expired items can remain in the table for hours or even days before being removed, and they still show up in reads until then.
Fix: Don’t use TTL for precise expiration or time-critical workflows. Use application-level logic for time-sensitive actions.
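A small sketch of the application-level check: treat an item as expired as soon as its TTL timestamp has passed, whether or not DynamoDB has deleted it yet. The attribute name `expires_at` is an assumption; items are in the low-level format where numbers arrive as strings under `"N"`.

```python
import time
from typing import Optional


def is_live(item: dict, now: Optional[float] = None,
            ttl_attr: str = "expires_at") -> bool:
    """Return True if the item has no TTL attribute or its TTL is still in the future."""
    now = time.time() if now is None else now
    ttl = item.get(ttl_attr)
    return ttl is None or int(ttl["N"]) > now
```

The same check could also be pushed into a filter expression that compares the TTL attribute with the current epoch time, at the cost of still reading the expired items.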
11. Misusing Capacity Modes
Choosing provisioned mode for a bursty workload leads to throttling. Using on-demand for consistently high-traffic applications can be costly.
Fix: Use on-demand for unpredictable workloads, and enable auto-scaling if using provisioned mode.
12. Poor Key Design for Access Patterns
Designing your table based on entities instead of access patterns leads to expensive queries, scans, and workarounds.
Fix: Start with your query patterns and design your keys and indexes to match them.
13. Pagination Confusion with Filters
When using FilterExpression, pagination (via LastEvaluatedKey) still counts scanned items, not filtered ones. You may have to paginate through many pages to get a handful of results.
Fix: Understand that page size = scanned items, not returned ones. Optimize filters or use better partitioning.
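A sketch of a pagination loop that keeps following LastEvaluatedKey until enough matching items have arrived. `client` is assumed to be any object with a `query(**params)` method that returns the usual response dict, such as a boto3 DynamoDB client; the function name is made up for this example.

```python
def collect_matches(client, query_params: dict, wanted: int) -> list:
    """Follow LastEvaluatedKey until `wanted` items are collected or data runs out."""
    params = dict(query_params)  # copy so the caller's dict is not mutated
    items = []
    while len(items) < wanted:
        page = client.query(**params)
        # A page can legitimately be empty while more pages remain.
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more data to read
        params["ExclusiveStartKey"] = last_key
    return items[:wanted]
```

The important detail is that an empty page is not the end of the results; only a missing LastEvaluatedKey is.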
14. Transactions Aren’t Truly ACID
DynamoDB supports transactions, but they are not what you'd expect from a traditional RDBMS:
- Limited to 100 items or 4MB.
- No rollback of side effects (e.g., Lambda triggers).
- GSIs remain eventually consistent.
Fix: Use transactions only when necessary, and make side effects idempotent.
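One possible shape for an idempotent side effect is a stream handler that skips records it has already processed. The sketch assumes each record carries an `eventID` field, as DynamoDB Streams records do, and uses an in-memory set as a stand-in for what would need to be a durable store in practice. `handle_records` and `apply` are hypothetical names.

```python
from typing import Callable, Iterable, Set


def handle_records(records: Iterable[dict], already_processed: Set[str],
                   apply: Callable[[dict], None]) -> None:
    """Apply each record's side effect at most once, keyed by its eventID."""
    for record in records:
        event_id = record["eventID"]
        if event_id in already_processed:
            continue  # duplicate delivery or retry; the effect already happened
        apply(record)
        # Mark as processed only after the side effect succeeded.
        already_processed.add(event_id)
```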
15. Parallel Scan Isn’t Magic
Parallel Scan splits the table into logical segments and scans them in parallel. However:
- Segment sizes can be uneven.
- Hot partitions still throttle.
- SDKs often lack good support for it.
Fix: Only use Parallel Scan for read-heavy, one-off jobs. Prefer partitioned queries for scalable access.
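For a one-off job, a parallel scan can be sketched with a thread pool where each worker pages through its own segment using the Scan parameters Segment and TotalSegments. `client` is assumed to be a thread-safe object with a `scan(**params)` method, such as a boto3 low-level client; the segment count of 4 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name: str, segment: int, total_segments: int) -> list:
    """Page through a single segment until it is exhausted."""
    params = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    items = []
    while True:
        page = client.scan(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        params["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name: str, total_segments: int = 4) -> list:
    """Scan all segments concurrently and return the combined items."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

Because segments can be uneven, total runtime is set by the slowest segment, and the combined read rate still counts against the table's capacity.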
16. Adaptive Capacity Isn’t a Silver Bullet
DynamoDB uses adaptive capacity to shift throughput toward hot partitions. But this doesn't always kick in fast enough, especially with sudden spikes or uneven access patterns.
Fix: Design for even access, monitor partition metrics, and avoid assuming adaptive capacity will save you.
17. Single Partition Scaling Illusion
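DynamoDB splits a partition once it grows past roughly 10GB or needs more than about 3,000 RCU or 1,000 WCU, but splitting the table does not help a single hot key. A single partition key value is still capped at roughly that per-partition throughput, so it remains a bottleneck no matter how many partitions the table has.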
Fix: Use composite or hashed partition keys to spread load evenly from the start.
18. Random Access Patterns Are Hard
There's no efficient way to select a random item in DynamoDB. Full scans or random UUIDs are expensive and unreliable.
Fix: Use pre-generated random keys or auxiliary indexes that support range queries.
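A sketch of the pre-generated random key approach, under these assumptions: every item stores a random integer in a sort key named `rand_key` (drawn from 0 to 2**32 - 1 at write time) under a partition key named `bucket`. Both names are hypothetical. To pick an item, draw a random number, ask for the first item at or above it, and wrap around to the smallest key if nothing is found.

```python
import random
from typing import Optional


def pick_random_item(client, table_name: str, bucket: str,
                     r: Optional[int] = None) -> Optional[dict]:
    """Return one pseudo-random item from the bucket, or None if it is empty."""
    r = random.randrange(2**32) if r is None else r
    params = {
        "TableName": table_name,
        "KeyConditionExpression": "#b = :b AND #r >= :r",
        "ExpressionAttributeNames": {"#b": "bucket", "#r": "rand_key"},
        "ExpressionAttributeValues": {":b": {"S": bucket}, ":r": {"N": str(r)}},
        "Limit": 1,
    }
    items = client.query(**params).get("Items", [])
    if not items:
        # Nothing at or above r: wrap around to the smallest rand_key.
        # Unused placeholders are rejected by DynamoDB, so remove them.
        params["KeyConditionExpression"] = "#b = :b"
        del params["ExpressionAttributeNames"]["#r"]
        del params["ExpressionAttributeValues"][":r"]
        items = client.query(**params).get("Items", [])
    return items[0] if items else None
```

The selection is only approximately uniform, since an item that follows a large gap in the key space is picked more often than its neighbors.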
19. Single-Table Design Becomes Unmanageable
Single-table design offers performance benefits, but:
- It requires complex sort key conventions.
- It becomes difficult to debug or evolve.
- It requires strong discipline in naming and documentation.
Fix: Use single-table design only when access patterns are stable and known. Otherwise, start with multi-table.
20. Index Consistency Divergence
LSIs support strong consistency; GSIs do not. This inconsistency can trip up developers expecting consistent behavior across all indexes.
Fix: Document which queries use which indexes and test consistency assumptions explicitly.
21. Conditional Writes Provide No Context on Failure
Conditional updates return only an error if the condition fails — no visibility into what was there before.
Fix: Use ReturnValuesOnConditionCheckFailure = ALL_OLD to capture previous state and handle gracefully.
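A hedged sketch of a create-if-absent write that asks for the existing item when the condition fails. The key attribute name `id` is an assumption. The helper that reads the old item assumes the failed request's error response exposes it under an `Item` key, which is how the boto3 error response is expected to look; verify this against the SDK version in use.

```python
from typing import Optional


def create_if_absent_params(table_name: str, item: dict, key_attr: str = "id") -> dict:
    """Build PutItem parameters that refuse to overwrite an existing item."""
    return {
        "TableName": table_name,
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#k)",
        "ExpressionAttributeNames": {"#k": key_attr},
        # Ask DynamoDB to include the existing item in the failure response.
        "ReturnValuesOnConditionCheckFailure": "ALL_OLD",
    }


def existing_item_from_error(error_response: dict) -> Optional[dict]:
    """Extract the previous item from a failed conditional write, if present.

    Assumption: the item sits under the "Item" key of the error response dict.
    """
    return error_response.get("Item")
```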
22. Transport Overhead Can Eclipse Dynamo Latency
DynamoDB is fast, but SDK overhead (retries, marshalling, HTTPS, cold starts) can dominate latency, especially in serverless functions or high-concurrency environments.
Fix: Reuse connections, use async SDKs, and measure end-to-end latency, not just Dynamo response time.
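To measure end-to-end latency rather than only the service's response time, a call can be wrapped in a small timer. This is a generic sketch: `timed` and the list used as a sink are hypothetical, and in a real system the samples would go to a metrics library.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, sink: list):
    """Record the wall-clock duration of the wrapped block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        sink.append((label, elapsed_ms))


# Usage idea: create the SDK client once at module scope so connections are
# reused across invocations, then wrap each call:
#
#     with timed("get_item", samples):
#         client.get_item(**params)
```

Timing around the SDK call captures marshalling, retries, and connection setup, which is where the gap between measured and advertised latency usually hides.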
Final Thoughts
DynamoDB is an amazing tool — but only if used with a clear understanding of its design constraints. Most horror stories come not from the service itself, but from mismatched expectations or flawed assumptions.
By recognizing these gotchas early, you can design resilient, scalable systems that make DynamoDB shine.
Want more? Consider creating a testing harness for your access patterns, or use Amazon CloudWatch Contributor Insights for DynamoDB to detect hot keys before they hurt. And always model your data based on how it will be accessed, not how it looks.
Stay sharp, and happy building.