Read DynamoDB table items into a pandas DataFrame¶
For all the Python developers working on AWS: have you ever wanted to easily read a DynamoDB table directly into a pandas DataFrame?
Check out the latest release of AWS SDK for pandas (formerly AWS Data Wrangler), where I contributed a brand-new read method for the DynamoDB module!
Features¶
- automatically switch between the available DynamoDB read actions, choosing the optimal one (aka "no more headaches fighting with boto3") as defined in this hierarchy: `get_item` > `batch_get_item` > `query` > `scan` (inspiration from here and here)
- support filtering both on keys and attributes
- automatically sanitize DynamoDB reserved keywords
- prevent unwanted full table scans
- allow attribute selection via the `columns` kwarg
- support limiting the number of returned items with the `max_items_evaluated` kwarg (a sort of `head()` method for the table!)
- ...and more!
Read the full API reference here.
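To give a quick feel for how the features above come together, here is a minimal usage sketch. The table name, key values, and attributes are made up for illustration; `columns` and `max_items_evaluated` are the kwargs mentioned in the list, while the remaining parameter names follow the `read_items` signature as documented, so double-check the API reference for your installed version.

```python
import awswrangler as wr
from boto3.dynamodb.conditions import Attr

# Read specific items by key: the optimal read action (get_item or
# batch_get_item) is picked under the hood.
df = wr.dynamodb.read_items(
    table_name="my-table",              # hypothetical table
    partition_values=["id-1", "id-2"],  # hypothetical partition key values
)

# Filter on a non-key attribute, project a subset of attributes, and cap
# the number of evaluated items (a head()-like preview of the table).
df_preview = wr.dynamodb.read_items(
    table_name="my-table",
    filter_expression=Attr("status").eq("active"),  # hypothetical attribute
    columns=["id", "status", "created_at"],
    max_items_evaluated=10,
    allow_full_scan=True,  # full scans must be explicitly allowed
)
```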
History¶
I found myself putting some effort into reading items from a DynamoDB table and returning a pandas DataFrame. Basically, I wanted to abstract some complexity away from the available Boto3 read actions and handle once and for all the headache of thinking about keys, queries, scans, etc. Since I was pretty happy with the result, I decided to submit a PR with a candidate `wr.dynamodb.read_items`
in aws/aws-sdk-pandas#1867.
I was aware of the addition of `wr.dynamodb.read_partiql_query`
in aws/aws-sdk-pandas#1390, as well as the related issues reported in aws/aws-sdk-pandas#1571, but the proposed solution does not involve PartiQL: my goal was to avoid, as much as possible, the risks that come with using it against a DynamoDB table, namely that a given query may be translated into a full scan operation (see, for example, the disclaimer in the docs).
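For contrast, this is roughly what the PartiQL-based path looks like (table and attribute names are hypothetical): a statement whose `WHERE` clause does not target the key schema can be executed as a full table scan, which is exactly the behaviour `read_items` is designed to guard against.

```python
import awswrangler as wr

# PartiQL reader already available in the library: convenient, but a
# WHERE clause that does not hit the key schema may silently run as a
# full table scan.
df = wr.dynamodb.read_partiql_query(
    query='SELECT * FROM "my-table" WHERE status = ?',  # hypothetical table/attribute
    parameters=["active"],
)
```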
in aws/aws-sdk-pandas#1390, as well as the related issues as reported in aws/aws-sdk-pandas#1571, but the proposed solution does not involve PartiQL: my goal was to avoid as much as possible the risks that come with its usage towards a DynamoDB table, regarding possible translation of a given query to a full scan op (see for example the disclaimer in the docs).