Read DynamoDB table items into a pandas DataFrame¶
For all the Python developers working on AWS: have you ever wanted to easily read a DynamoDB table directly into a pandas DataFrame?
Check out the latest release of AWS SDK for pandas (formerly AWS Data Wrangler), where I contributed a brand-new read method for the DynamoDB module!
Features¶
- automatically switch between the available DynamoDB read actions, choosing the optimal one (aka "no more headaches fighting with boto3") as defined in this hierarchy: `get_item` > `batch_get_item` > `query` > `scan` (inspiration from here and here)
- support filtering both on keys and attributes
- automatically sanitize DynamoDB reserved keywords
- prevent unwanted full table scans
- allow attribute selection via the `columns` kwarg
- support limiting the number of returned items with the `max_items_evaluated` kwarg (a sort of `head()` method for the table!)
- ...and more!
Read the full API reference here.
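To give a quick feel for how the features above come together, here is a minimal usage sketch. The table name, key values, and attributes are made up for illustration; `columns` and `max_items_evaluated` are the kwargs mentioned in the list, while the remaining parameter names follow the `read_items` signature as documented, so double-check the API reference for your installed version.

```python
import awswrangler as wr
from boto3.dynamodb.conditions import Attr

# Read specific items by key: the optimal read action (get_item or
# batch_get_item) is picked under the hood.
df = wr.dynamodb.read_items(
    table_name="my-table",              # hypothetical table
    partition_values=["id-1", "id-2"],  # hypothetical partition key values
)

# Filter on a non-key attribute, project a subset of attributes, and cap
# the number of evaluated items (a head()-like preview of the table).
df_preview = wr.dynamodb.read_items(
    table_name="my-table",
    filter_expression=Attr("status").eq("active"),  # hypothetical attribute
    columns=["id", "status", "created_at"],
    max_items_evaluated=10,
    allow_full_scan=True,  # full scans must be explicitly allowed
)
```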
History¶
I found myself putting some effort into reading items from a DynamoDB table and returning a pandas DataFrame. Basically, I wanted to abstract some complexity away from the available Boto3 read actions and handle once and for all the headache of thinking about keys, queries, scans, etc. Since I was pretty happy with the result, I decided to submit a PR with a candidate `wr.dynamodb.read_items`
in aws/aws-sdk-pandas#1867.
I was aware of the addition of `wr.dynamodb.read_partiql_query`
in aws/aws-sdk-pandas#1390, as well as the related issues reported in aws/aws-sdk-pandas#1571, but the proposed solution does not involve PartiQL: my goal was to avoid, as much as possible, the risks that come with using it against a DynamoDB table, namely that a given query may be translated into a full scan operation (see, for example, the disclaimer in the docs).
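For contrast, this is roughly what the PartiQL-based path looks like (table and attribute names are hypothetical): a statement whose `WHERE` clause does not target the key schema can be executed as a full table scan, which is exactly the behaviour `read_items` is designed to guard against.

```python
import awswrangler as wr

# PartiQL reader already available in the library: convenient, but a
# WHERE clause that does not hit the key schema may silently run as a
# full table scan.
df = wr.dynamodb.read_partiql_query(
    query='SELECT * FROM "my-table" WHERE status = ?',  # hypothetical table/attribute
    parameters=["active"],
)
```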
in aws/aws-sdk-pandas#1390, as well as the related issues as reported in aws/aws-sdk-pandas#1571, but the proposed solution does not involve PartiQL: my goal was to avoid as much as possible the risks that come with its usage towards a DynamoDB table, regarding possible translation of a given query to a full scan op (see for example the disclaimer in the docs).