In this post we will see how to efficiently delete all rows of an Azure Table Storage table using the new(ish) Azure.Data.Tables SDK. We’ll try to optimize for speed while also being mindful of memory usage.
Note: When deleting a lot of data from Azure Table Storage, the fastest way is usually to just drop the whole table. However, we can’t be sure exactly when we’ll be able to create a new table with the same name, since it can take a minute or even longer for Azure to actually get rid of the table.
I created two extension methods for the SDK’s TableClient to make the code easy to reuse. First off, we need to query all rows, assuming we don’t already know every PartitionKey & RowKey combination in the table. With each call to the Storage Account we request the maximum of 1000 entities per page. We only need the PartitionKey and RowKey of each entity, so we limit the query to those two columns by passing a list of column names as the select parameter.
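As a rough sketch of what such a query can look like (not necessarily the exact code of the extension method; the class name `TableQuerySketch` and the method name `QueryKeysOnly` are made up for illustration, and the Azure.Data.Tables package is assumed to be referenced):

```csharp
using System.Collections.Generic;
using Azure;
using Azure.Data.Tables;

public static class TableQuerySketch
{
    // Hypothetical helper name, used for illustration only.
    public static AsyncPageable<TableEntity> QueryKeysOnly(this TableClient tableClient)
    {
        // No filter is passed, so the query matches every entity in the table.
        // select:     only fetch the two key columns, which is all a delete needs.
        // maxPerPage: 1000 is the largest page size the Table service returns.
        return tableClient.QueryAsync<TableEntity>(
            select: new List<string> { "PartitionKey", "RowKey" },
            maxPerPage: 1000);
    }
}
```

Nothing is sent to the Storage Account at this point. The returned AsyncPageable only issues requests once it is enumerated.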
Since Azure returns all rows sorted by PartitionKey + RowKey, entities from the same partition arrive next to each other, so we can go ahead and delete each page right away before requesting the next one instead of caching the whole table in memory. The ForEachAwaitAsync() extension from the System.Linq.Async NuGet package comes in handy here.
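The page-by-page idea could be sketched roughly like this. `PageStreamingSketch`, `ForEachPageAsync` and the `handlePage` callback are hypothetical names chosen for this example. The callback stands in for whatever deletes the entities of a single page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // ForEachAwaitAsync comes from the System.Linq.Async package
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;

public static class PageStreamingSketch
{
    // Hypothetical helper: runs handlePage for one page at a time.
    public static Task ForEachPageAsync(
        this AsyncPageable<TableEntity> entities,
        Func<IReadOnlyList<TableEntity>, Task> handlePage)
    {
        // AsPages() yields pages lazily. ForEachAwaitAsync awaits the callback
        // before pulling the next page, so only one page is held at a time.
        return entities.AsPages().ForEachAwaitAsync(page => handlePage(page.Values));
    }
}
```

A plain `await foreach` loop over `AsPages()` would behave the same way if you prefer not to add the extra package.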
The second extension method handles the batch processing of our entities. From a list of entities it creates batches of up to 100 rows, making sure to only group entities with the same PartitionKey so as not to violate the API requirements.
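Here is a minimal sketch of that batching logic, assuming .NET 6 or later for `Enumerable.Chunk`. The names `BatchDeleteSketch`, `SplitIntoBatches` and `DeleteInBatchesAsync` are illustrative and not taken from the SDK or from the extension methods described above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;

public static class BatchDeleteSketch
{
    // A table transaction accepts at most 100 operations, all in one partition.
    private const int MaxBatchSize = 100;

    // Pure helper: group by PartitionKey, then cut each group into chunks of up to 100.
    public static IEnumerable<TableEntity[]> SplitIntoBatches(IEnumerable<TableEntity> entities) =>
        entities
            .GroupBy(e => e.PartitionKey)
            .SelectMany(group => group.Chunk(MaxBatchSize)); // Chunk needs .NET 6+

    // Hypothetical extension method: deletes the given entities batch by batch.
    public static async Task DeleteInBatchesAsync(
        this TableClient tableClient, IEnumerable<TableEntity> entities)
    {
        foreach (TableEntity[] batch in SplitIntoBatches(entities))
        {
            // ETag.All makes the delete unconditional, whatever the entity's current version.
            var actions = batch.Select(e =>
                new TableTransactionAction(TableTransactionActionType.Delete, e, ETag.All));
            await tableClient.SubmitTransactionAsync(actions).ConfigureAwait(false);
        }
    }
}
```

This version submits one transaction after another to keep the example short. Sending several batches concurrently is a possible speed-up, at the cost of more load on the Storage Account.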
You can easily copy these methods into your code or make use of the Azure.Data.Tables Extensions package I’ve created which has a few more handy methods just like these.