Deleting all Rows from Azure Table Storage (as fast as possible)

In this post we will see how to efficiently delete all rows in an Azure Table Storage table using the new(ish) Azure.Data.Tables SDK. We'll optimize for speed while keeping memory usage in check.

Note: When deleting a lot of data from Azure Table Storage, the fastest way is usually to drop the whole table. However, we cannot be sure exactly when we'll be able to create a new table with the same name again: it can take a minute or even longer for Azure to actually remove the table, and until then the service rejects operations on a table with that name with a 409 (Conflict).
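If you do go the drop-and-recreate route, one way to cope with that delay is to retry the creation until the service accepts it. Below is a minimal sketch of such a retry loop. `RecreateWithRetryAsync` is a hypothetical helper, not part of the SDK, and the table is simulated by a delegate that fails twice, so the snippet runs without a storage account. In real code the `create` delegate would presumably call `tableClient.CreateAsync()` after `tableClient.DeleteAsync()`, and the predicate would check for a `RequestFailedException` with `Status == 409` and `ErrorCode == "TableBeingDeleted"`; treat that error code as an assumption and verify it against the Table service error reference.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries `create` while `isTableBeingDeleted` classifies the
// failure as "table is still being deleted", waiting `delay` between attempts.
// Gives up (rethrows) once maxAttempts is reached. Returns the number of attempts used.
async Task<int> RecreateWithRetryAsync(
    Func<Task> create,
    Func<Exception, bool> isTableBeingDeleted,
    TimeSpan delay,
    int maxAttempts)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            await create();
            return attempt;
        }
        catch (Exception ex) when (isTableBeingDeleted(ex) && attempt < maxAttempts)
        {
            await Task.Delay(delay);
        }
    }
}

// Simulation: the "table" is still being deleted for the first two attempts.
// With the real SDK the predicate might be (assumption):
//   ex => ex is RequestFailedException { Status: 409, ErrorCode: "TableBeingDeleted" }
int calls = 0;
int attempts = await RecreateWithRetryAsync(
    create: () =>
    {
        calls++;
        if (calls < 3) throw new InvalidOperationException("TableBeingDeleted");
        return Task.CompletedTask;
    },
    isTableBeingDeleted: ex => ex is InvalidOperationException,
    delay: TimeSpan.FromMilliseconds(10),
    maxAttempts: 10);

Console.WriteLine($"Table recreated after {attempts} attempts"); // prints "Table recreated after 3 attempts"
```

The retry only swallows the specific "being deleted" failure; any other exception, or running out of attempts, surfaces to the caller.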

I created two extension methods for the SDK's TableClient to make the code easy to reuse. First, we need to query all rows, since we assume we don't already know every PartitionKey & RowKey combination in the table. With each call to the Storage Account we request the maximum of 1000 entities per page. We also only need the PartitionKey and RowKey of each entity, so we restrict the query to those two columns by passing their names as the select parameter.

Since Azure returns all rows sorted by PartitionKey + RowKey, we can delete each page right away before requesting the next one. The ForEachAwaitAsync() extension from the System.Linq.Async NuGet package comes in handy for this.
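As an aside, the C# 8 `await foreach` statement can consume `entities.AsPages()` directly, which avoids the extra System.Linq.Async dependency. The sketch below illustrates the page-at-a-time pattern with that swapped-in loop. `FetchPagesAsync` is a hypothetical stand-in for the Azure pager that yields in-memory pages, and the "delete" step only counts rows, so it runs without a storage account.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for AsyncPageable<T>.AsPages(): produces one page at a time.
async IAsyncEnumerable<IReadOnlyList<string>> FetchPagesAsync(int totalRows, int pageSize)
{
    for (int start = 0; start < totalRows; start += pageSize)
    {
        await Task.Yield(); // simulate the round trip to the service
        int count = Math.Min(pageSize, totalRows - start);
        yield return Enumerable.Range(start, count).Select(n => $"row{n}").ToList();
    }
}

// With the SDK this would be roughly:
//   await foreach (Page<TableEntity> page in entities.AsPages()) { /* delete page.Values */ }
int deleted = 0;
int pages = 0;
int largestPage = 0;
await foreach (var page in FetchPagesAsync(totalRows: 2500, pageSize: 1000))
{
    // Only the current page is in memory; it is processed before the next one is fetched.
    pages++;
    largestPage = Math.Max(largestPage, page.Count);
    deleted += page.Count;
}

Console.WriteLine($"Deleted {deleted} rows in {pages} pages, largest page: {largestPage}");
// prints "Deleted 2500 rows in 3 pages, largest page: 1000"
```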

The batching of our entities is handled by the second extension method. It splits a list of entities into batches of up to 100 rows and only groups entities that share the same PartitionKey, because an entity group transaction may contain at most 100 operations and all of them must target the same PartitionKey.

using System.Collections.Generic;
using System.Linq; // also brings in ForEachAwaitAsync() from System.Linq.Async
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;

public static class TableClientExtensions
{
    /// <summary>
    /// Deletes all rows from the table
    /// </summary>
    /// <param name="tableClient">The authenticated TableClient</param>
    /// <returns>A task that completes once every row has been deleted</returns>
    public static async Task DeleteAllEntitiesAsync(this TableClient tableClient)
    {
        // Only the PartitionKey & RowKey fields are required for deletion
        AsyncPageable<TableEntity> entities = tableClient
            .QueryAsync<TableEntity>(select: new List<string>() { "PartitionKey", "RowKey" }, maxPerPage: 1000);

        await entities.AsPages().ForEachAwaitAsync(async page =>
        {
            // Since we don't know how many rows the table has and the results are ordered by PartitionKey+RowKey,
            // we delete each page immediately instead of caching the whole table in memory
            await BatchManipulateEntities(tableClient, page.Values, TableTransactionActionType.Delete).ConfigureAwait(false);
        }).ConfigureAwait(false);
    }

    /// <summary>
    /// Groups entities by PartitionKey into batches of max 100 for valid transactions
    /// </summary>
    /// <param name="tableClient">The authenticated TableClient</param>
    /// <param name="entities">The entities to process</param>
    /// <param name="tableTransactionActionType">The action to apply to every entity</param>
    /// <returns>List of Azure Responses for Transactions</returns>
    public static async Task<List<Response<IReadOnlyList<Response>>>> BatchManipulateEntities<T>(this TableClient tableClient, IEnumerable<T> entities, TableTransactionActionType tableTransactionActionType) where T : class, ITableEntity, new()
    {
        var groups = entities.GroupBy(x => x.PartitionKey);
        var responses = new List<Response<IReadOnlyList<Response>>>();
        foreach (var group in groups)
        {
            // Materialize each group once; chaining Skip()/Take() on a lazy sequence
            // would re-enumerate the group from the start on every iteration
            var items = group.ToList();
            for (int i = 0; i < items.Count; i += 100)
            {
                // A single transaction may contain at most 100 operations
                var actions = items
                    .Skip(i)
                    .Take(100)
                    .Select(e => new TableTransactionAction(tableTransactionActionType, e))
                    .ToList();
                var response = await tableClient.SubmitTransactionAsync(actions).ConfigureAwait(false);
                responses.Add(response);
            }
        }
        return responses;
    }
}
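The partition-aware batching in BatchManipulateEntities doesn't depend on the SDK, so it can be checked in isolation. Here is a minimal sketch of the same grouping using plain (PartitionKey, RowKey) tuples instead of ITableEntity. `BuildBatches` is a hypothetical name, and the 100-item transaction limit is passed in as a parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits keys into batches that each contain a single
// PartitionKey and at most maxBatchSize items (Azure's transaction limit is 100).
List<List<(string PartitionKey, string RowKey)>> BuildBatches(
    IEnumerable<(string PartitionKey, string RowKey)> keys, int maxBatchSize = 100)
{
    var result = new List<List<(string PartitionKey, string RowKey)>>();
    foreach (var group in keys.GroupBy(k => k.PartitionKey))
    {
        var items = group.ToList(); // materialize once, then slice by index
        for (int i = 0; i < items.Count; i += maxBatchSize)
        {
            result.Add(items.GetRange(i, Math.Min(maxBatchSize, items.Count - i)));
        }
    }
    return result;
}

// 250 rows in partition "a" and 30 rows in partition "b"
var sampleKeys = Enumerable.Range(0, 250).Select(n => ("a", $"r{n}"))
    .Concat(Enumerable.Range(0, 30).Select(n => ("b", $"r{n}")));
var batches = BuildBatches(sampleKeys);

Console.WriteLine(string.Join(", ", batches.Select(b => $"{b[0].PartitionKey}:{b.Count}")));
// prints "a:100, a:100, a:50, b:30"
```

With 250 rows in one partition and 30 in another, the first partition needs three transactions and the second needs one, and no batch ever mixes PartitionKeys.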

You can easily copy these methods into your code or use the Azure.Data.Tables Extensions package I've created, which has a few more handy methods just like these.
