
Design in Public - Information Vault (Post 2)

 


This is part 2 of the Design in Public series. In the prior post I drilled down into the requirements that are interesting to me:


  • Store Private Data for the Owner

  • Allow the Private Data Owner to delete Private Data from the system, disabling any further use

  • Allow only internal Principals to retrieve it for a particular flow

  • All interactions from internal Principals where Private Data is wanted but no longer available should be able to learn that it was deleted, who the Owner was, and when it was deleted


There is normally more than one type of private information to store, for example addresses, emails, phone numbers, full names, Social Security Numbers, card numbers (Primary Account Number), and expiry dates. Interactions with the system could be modeled as one interaction per data element, or as multiple data elements per interaction. Letting the client decide how to use this is better in this scenario, so I'll design for multiple data elements on a single call while still allowing a call that sends only one element. That adds two requirements (a request sketch follows the list):


  • Store multiple Private Data elements for the Owner

  • Classification of data as provided by Owner/Caller from a defined list: Full Name, Telephone, Email
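
As a minimal sketch, assuming a JSON-style API, the "store" request could carry one or many classified elements. Every name here (DataClassification, StoreRequest, owner_id, and so on) is an illustrative assumption rather than a final contract:

```python
# A minimal sketch of the "store" request shape.
# All names are illustrative assumptions, not a final contract.
from dataclasses import dataclass
from enum import Enum


class DataClassification(str, Enum):
    FULL_NAME = "FULL_NAME"
    TELEPHONE = "TELEPHONE"
    EMAIL = "EMAIL"


@dataclass
class DataElement:
    classification: DataClassification  # classification as provided by the caller
    value: str                          # raw private value, not validated here


@dataclass
class StoreRequest:
    owner_id: str                # identifier the rest of the system already knows
    elements: list[DataElement]  # one or many elements in a single call


# A client that prefers one element per call can still do so:
single = StoreRequest(
    owner_id="owner-123",
    elements=[DataElement(DataClassification.EMAIL, "someone@example.com")],
)
```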


At this point I'll leave the following out of scope:


  • Deduplication of Data Elements within a single request. If the same data element is provided multiple times, each occurrence will be treated independently by the backend.

  • Validation of data types and data classification. If a value is sent, no attempt will be made to match it against the expected data format. This would not be acceptable in a production-ready system, but this exercise is meant to be as simple as possible.


To delete private data the owner must provide the token, since it is the only public reference to the private data and therefore acts as an identifier. This means each token must be unique. Here the topic becomes interesting: every token must have sufficient entropy to avoid collisions, and depending on the structure of the tokens this can be difficult to implement, so the range of possible output values must be considered.
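
As a minimal sketch, assuming an opaque token format, with a simple set standing in for whatever uniqueness check the real store would provide:

```python
# A minimal sketch of collision-aware token generation.
# `existing_tokens` is an assumption standing in for the real uniqueness check.
import secrets


def new_token(existing_tokens: set[str], num_bytes: int = 16, max_attempts: int = 5) -> str:
    """Generate a URL-safe random token, retrying on (unlikely) collisions."""
    for _ in range(max_attempts):
        candidate = secrets.token_urlsafe(num_bytes)  # ~128 bits of entropy
        if candidate not in existing_tokens:
            return candidate
    raise RuntimeError("too many collisions: token pool may be exhausted")
```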


For example, names are the simplest to handle: a string of more than 12 characters with spaces could be used. Emails are similar, with an '@' character in the middle and a few '.' characters, or even GUIDs or ULIDs if their downsides are acceptable.


Telephone numbers might be tricky: they allow only numeric characters except for the area code at the start of the value, and duplicated values consume another token from the available pool. Validating exhaustion of the token pool is important in this case (a sketch follows). Reuse of deleted tokens could be considered later, but that has implications for consumer systems.
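
A minimal sketch of a format-preserving, phone-shaped token, assuming a fixed length of 10 digits; the exhaustion threshold is deliberately naive and is my assumption, not a requirement:

```python
# A minimal sketch of a phone-shaped token with a naive pool-exhaustion check.
# PHONE_TOKEN_DIGITS and EXHAUSTION_THRESHOLD are illustrative assumptions.
import secrets

PHONE_TOKEN_DIGITS = 10
POOL_SIZE = 10 ** PHONE_TOKEN_DIGITS  # all possible 10-digit tokens
EXHAUSTION_THRESHOLD = 0.8            # warn once 80% of the pool is consumed


def new_phone_token(existing_tokens: set[str]) -> str:
    if len(existing_tokens) / POOL_SIZE > EXHAUSTION_THRESHOLD:
        raise RuntimeError("phone token pool is close to exhaustion")
    while True:
        candidate = "".join(str(secrets.randbelow(10)) for _ in range(PHONE_TOKEN_DIGITS))
        if candidate not in existing_tokens:
            return candidate
```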


To exchange a token for the original Private Data, the client must present the token and the system must validate that the caller is authorized to exchange it. This is where delimiting the security boundaries of the system is paramount.

To keep things simple, let's assume that the Owner always uses the same public interface while Operators and other systems use a different one. This topology allows the use of different user pools or principal sources, dividing the actions across separate interfaces entirely. There is then a need to correctly match each token with an Owner so that Private Data can be deleted. This allows partitioning at the Owner level, which also makes it pragmatic for the Retrieval endpoint to force clients to provide an Owner identifier, one that might already be in use in other parts of the system. The other benefit of this approach is that the token pool expands, since it is now partitioned by Owner. Let's look at the data access patterns (a storage sketch follows the list):


  • Store Private data for Owner

  • Delete Private Data for Owner by Token

  • Retrieve Private Data by Owner and Token
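
A minimal in-memory sketch of these three access patterns, assuming (owner, token) as the composite key. The Vault class and the tombstone mechanism are illustrative, not a final design; keeping a tombstone on delete is one way to satisfy the earlier requirement that callers can learn who owned deleted data and when it was deleted:

```python
# A minimal in-memory sketch of the three access patterns.
# The class name, key shape, and tombstone approach are illustrative assumptions.
from datetime import datetime, timezone


class Vault:
    def __init__(self) -> None:
        self._data: dict[tuple[str, str], str] = {}          # (owner, token) -> value
        self._tombstones: dict[tuple[str, str], datetime] = {}

    def store(self, owner_id: str, token: str, value: str) -> None:
        """Store Private Data for an Owner under a token."""
        self._data[(owner_id, token)] = value

    def delete(self, owner_id: str, token: str) -> None:
        """Delete Private Data for an Owner by token, leaving a tombstone."""
        if (owner_id, token) in self._data:
            del self._data[(owner_id, token)]
            self._tombstones[(owner_id, token)] = datetime.now(timezone.utc)

    def retrieve(self, owner_id: str, token: str) -> str:
        """Retrieve Private Data by Owner and token, reporting deletions."""
        key = (owner_id, token)
        if key in self._tombstones:
            raise LookupError(
                f"data for owner {owner_id} was deleted at {self._tombstones[key]}"
            )
        return self._data[key]
```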
