Wednesday, July 22, 2020

SQL or NoSQL data for Data Science - Viewpoint - careers advice blog

How often have you heard that, in this (new) Big Data world, the newer NoSQL data sources and structures are the key to effective Data Science? And further, that relational data that can be queried with SQL is old-fashioned and no longer fit for purpose? Why spend time and investment building ETL processes that shift data from one database to another and enforce a rigid, non-scalable data model? Why not just dump all of the data in an unstructured or schema-less model? Surely that gives you the most flexibility to really find what you are looking for in the petabytes of data that your organisation collects.

The reality is far more complex, as is usually the case in the field of data science. In fact the discussion is moot, as it always has been when talking about which technology is best for solving business problems. I heard nearly identical debates when I started my career over 20 years ago, and I find it odd to see such similar themes re-emerging.

The most critical aspect of data science is not the technology or the data structures; it is doing things that can result in better (or quicker) decisions being made in your business. If you focus on that for just a second, you will realise that most, if not all, business decisions are made about things (or, to get technical, entities). If data science is going to help me make a better (or quicker) decision about that thing, then I had better make sure my data science output maps back to that thing. To be a little less abstract: if I am going to use data science to help me match a candidate on my database to the job I am being paid by my client to fill, then I had better be able to map my data science outputs back to the candidate and job things (entities).
I can, of course, do this in multiple ways, but the reality is that the candidate and job in my operational system are going to have a unique identifier of some kind, and I will need to link my insights back to those unique identifiers. So, whether I choose to Extract the data from my operational system, Transform it and then Load it to a NoSQL repository or to a relational database, I am still going to need to write and execute ETL processes of some kind.

The decision on which technology and data structure I use does, however, still need to be made. My point here is that the decision should be driven by the skills and technology I have within my organisation rather than by what the newest, shiniest technology is. The reality for most organisations is that a hybrid solution is almost always going to provide the greatest returns and biggest impact on the business. In essence: focus more on understanding the business decision you want to influence and less on the technology you are going to use.
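The point about linking insights back to unique identifiers can be made concrete with a small sketch. Everything in it is an assumption for illustration: the identifiers (`C1`, `J1`), the names, the scores and both function names are hypothetical, and an in-memory SQLite database stands in for whatever relational store an organisation might use. The same records are queried once in a schema-less style and once relationally, and in both cases the result only becomes useful when it is joined back on the candidate identifier.

```python
import sqlite3

# Hypothetical operational records; the IDs stand in for the unique
# identifiers the operational system would already assign.
candidates = [
    {"candidate_id": "C1", "name": "Ada"},
    {"candidate_id": "C2", "name": "Grace"},
]

# Hypothetical model output: one score per (candidate, job) pair.
scores = [
    {"candidate_id": "C1", "job_id": "J1", "score": 0.91},
    {"candidate_id": "C2", "job_id": "J1", "score": 0.64},
]


def best_match_documents(job_id):
    """Schema-less style: look records up by key in plain dicts."""
    by_id = {c["candidate_id"]: c for c in candidates}
    job_scores = (s for s in scores if s["job_id"] == job_id)
    top = max(job_scores, key=lambda s: s["score"])
    return by_id[top["candidate_id"]]["name"]


def best_match_relational(job_id):
    """Relational style: load the same rows into tables and join on the key."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE candidate (candidate_id TEXT PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE score (candidate_id TEXT, job_id TEXT, score REAL)")
    con.executemany("INSERT INTO candidate VALUES (:candidate_id, :name)", candidates)
    con.executemany("INSERT INTO score VALUES (:candidate_id, :job_id, :score)", scores)
    row = con.execute(
        "SELECT c.name FROM score s JOIN candidate c USING (candidate_id) "
        "WHERE s.job_id = ? ORDER BY s.score DESC LIMIT 1",
        (job_id,),
    ).fetchone()
    con.close()
    return row[0]
```

Both functions return the same candidate for job `J1`. The storage choice changes how the lookup is written, not the need to load the data somewhere and carry the identifier through to the output.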
