
Design in public - Information Vault (Post 6)

This is the sixth post on the information vault taken from John Crickett's coding exercises. In this installment, encryption of Private Data is added, along with retrieval of that data in plain text for when it is needed. With this small amount of code there is little excuse for storing plaintext Private Data in AWS DynamoDB; the downside is that full-text search is not possible on the encrypted fields, so a different approach is needed there.


All the code can be found at:


https://github.com/mtzmontiel/dip-data-vault


This continues from the previous posts in the series, which can be found here:



Now, to add actual Private Data storage with encryption, I'll be using the DynamoDB Encryption Client, which uses AWS Key Management Service (KMS) as the cryptographic materials provider. First we need to define the Customer Managed Key that will be used; this is done in the Serverless Application Model (SAM) template as follows. Bear in mind that this will incur monthly costs of at least $1 USD/month, so be ready to discard the stack when you are done.


  KmsKey:
    Type: AWS::KMS::Key
    Properties:
      Description: Encryption key for Private Data Vault
      Enabled: true
      EnableKeyRotation: true
      KeySpec: SYMMETRIC_DEFAULT
      MultiRegion: false
      KeyPolicy:
        Version: "2012-10-17"
        Id: key-default-1
        Statement:
          - Sid: "Enable Root User Permissions"
            Effect: Allow
            Principal:
              AWS: !Sub "arn:aws:iam::${AWS::AccountId}:root"
            Action: "kms:*"
            Resource: "*"
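One note on discarding the stack: KMS keys are not deleted immediately; CloudFormation schedules deletion with a default 30-day waiting period. The AWS::KMS::Key resource accepts a PendingWindowInDays property (7 to 30) if you want the key gone sooner, which could be added to the properties above:

```yaml
  KmsKey:
    Type: AWS::KMS::Key
    Properties:
      # Minimum waiting period before the key is actually deleted (7-30 days)
      PendingWindowInDays: 7
      # ... remaining properties as shown above ...
```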


Then, to be able to use this key, we need to add an IAM Role that can perform operations with it. At this point I decided to move away from SAM Connectors: mixing individual policies with connectors means there are multiple ways to create IAM resources, and it is a good idea to be explicit about intent. The Role definition therefore carries the KMS, DynamoDB, and CloudWatch permissions.


  VaultLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      Description: Execution role for the vault Lambda functions
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Policies:
        - PolicyName: KMS
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - 'kms:Encrypt'
                  - 'kms:Decrypt'
                  - 'kms:GenerateDataKey*'
                Resource: !GetAtt KmsKey.Arn
        - PolicyName: DynamoDb
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - 'dynamodb:PutItem'
                  - 'dynamodb:UpdateItem'
                  - 'dynamodb:DeleteItem'
                  - 'dynamodb:DescribeTable'
                  - 'dynamodb:GetItem'
                Resource: !GetAtt VaultTable.Arn
        - PolicyName: Cloudwatch
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: '*'
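With connectors out of the picture, the role also has to be attached to the function explicitly via the Role property. A sketch of that wiring follows; the function name, handler, and environment variable names are illustrative, not necessarily what the repository uses:

```yaml
  VaultFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.store_data
      Runtime: python3.11
      # Attach the explicit role instead of letting SAM generate one
      Role: !GetAtt VaultLambdaRole.Arn
      Environment:
        Variables:
          # Pass the table and key into the handler's configuration
          TABLE_NAME: !Ref VaultTable
          KMS_KEY_ID: !GetAtt KmsKey.Arn
```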


Then we add the dependency and configure the DynamoDB Encryption Client. This is appended to requirements.txt:


dynamodb-encryption-sdk


And in the owner Lambda we change the code so it creates an EncryptedTable with a default action of DO_NOTHING and an ENCRYPT_AND_SIGN action for the enc_value attribute.


import boto3
from dynamodb_encryption_sdk.encrypted.table import EncryptedTable
from dynamodb_encryption_sdk.structures import AttributeActions
from dynamodb_encryption_sdk.identifiers import CryptoAction
from dynamodb_encryption_sdk.material_providers.aws_kms import AwsKmsCryptographicMaterialsProvider


def create_table(table_name, key_id):
    return create_secure_table(table_name, key_id)


def ddb():
    return boto3.resource('dynamodb')


def create_table_without_encryption(table_name):
    return ddb().Table(table_name)


def create_secure_table(table_name, key_id):
    materials_provider = create_materials_provider(key_id)
    return create_encrypted_table(table_name, materials_provider)


def create_materials_provider(key_id):
    return AwsKmsCryptographicMaterialsProvider(key_id)


def create_encrypted_table(table_name, materials_provider):
    table = ddb().Table(table_name)
    attribute_actions = AttributeActions(
        default_action=CryptoAction.DO_NOTHING,
        attribute_actions={
            'enc_value': CryptoAction.ENCRYPT_AND_SIGN
        }
    )
    return EncryptedTable(table, materials_provider, attribute_actions=attribute_actions)



And the store data function is changed to add the value which will end up encrypted.


def store_data(event, context):
    """Store Private Data and respond with corresponding tokens"""
    elements = extract_elements(event)
    results = dict()
    principal_id = extract_principal_id(event)
    for k, v in elements.items():
        token = tokenize(v)
        ddresult = table.put_item(Item={
            "pk": get_pk(principal_id, token),
            "owned_by": principal_id,
            "token": token,
            "data_class": v["classification"],
            "enc_value": v["value"]
        })
        results[k] = dict()
        if ddresult["ResponseMetadata"]["HTTPStatusCode"] != 200:
            results[k]["success"] = False
            continue
        results[k]["token"] = token
        results[k]["success"] = True
    return results
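The tokenize helper was introduced in an earlier post of the series. For readers landing here first, a minimal sketch of what it might look like (an assumption on my part, not the repository's exact code) is a random token that reveals nothing about the private value:

```python
import uuid


def tokenize(element):
    # Hypothetical sketch: the token must not be derivable from the
    # private value, so an opaque random identifier is enough. The real
    # helper from the earlier posts may differ.
    return uuid.uuid4().hex


token = tokenize({"value": "555-12-3456", "classification": "pii"})
```

Because the token is random rather than a hash, storing the same value twice yields two different tokens, which avoids leaking equality between records.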


Once we deploy and run the tests, the Private Data DynamoDB Table has these contents.



With this in place, the remaining part is returning the plain-text data, which is done with the same table object via a Get operation, assembling the partition key from the owner and the token. I'll leave the details out as this exercise has already run long, but the code is similar to this.


def get_data(event, context):
    """Get Private Data by token"""
    principal_id = extract_principal_id(event)
    results = dict()
    token = extract_token(event)
    results["token"] = token
    # GetItem does not support ConditionExpression, so ownership is
    # checked on the returned item instead.
    ddresult = table.get_item(Key={"pk": get_pk(principal_id, token)})
    if ddresult["ResponseMetadata"]["HTTPStatusCode"] != 200:
        results["success"] = False
        results["message"] = "Could not process request."
        return results, None
    item = ddresult.get("Item")
    if item is None or item.get("owned_by") != principal_id:
        results["success"] = False
        results["message"] = "Could not process request."
        return results, None
    results["success"] = True
    results["message"] = "Successfully Retrieved"
    results["data"] = item["enc_value"]
    results["data_class"] = item["data_class"]
    return results, None
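get_pk, used by both handlers, only needs to combine the owner and the token deterministically. A plausible sketch follows; the separator and naming are my assumption, not necessarily the definition from the earlier posts:

```python
def get_pk(principal_id, token):
    # Hypothetical: scoping the partition key by owner means one
    # principal cannot address another principal's items by token alone.
    return f"{principal_id}#{token}"


pk = get_pk("user-123", "a1b2c3")  # "user-123#a1b2c3"
```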

