Now what kind of a question is that?  Of course industries such as social networks like Facebook will always have a demand, but that’s not what I’m talking about here.  I’m talking about the need for an ordinary business to hire data scientists.  Right now the field is hot and everyone seems to think they need one, but how long will this continue?  Data scientists in genomics and science will always have a place because that’s what science is.  So once we rule out the research industries and look at companies that provide other services and create other types of products, how long will those types of companies ride the data scientist trend?  When you read the link below, keep in mind it was written two years ago, before the data scientist explosion.

A couple of years ago I agreed with a banker that half of all analytics will not be worth the investment.  Data scientists are analysts with a bit more, such as programming ability.  Companies will reach a saturation point as they weigh what they are getting for their investments.  We are already seeing some of this now, with folks getting sold on the idea that what is really irrelevant data will help them better understand their customers, spending, budgets, etc.

Half of Analytics Investments By Companies and Banks Will Be a waste–What Do We Analyze with Big Data and Does It Have Value–Some Algo Fairies Would Do Better at Disneyland…

Even before “big data” got big we were seeing this, and we are seeing more of it today.  In addition, machine learning is taking over: companies can use online services to run queries and find the results they need, and there are services such as Google BigQuery that can be used as well.  Here’s an example below where anyone can create some predictive analytics.  In a sense we are already commoditizing data scientists.

Predictive Analytics, BigData Software As A Service, Fun With Dick and Jane As Anyone Can Model Here With Machine Learning, Might Start Giving Mobile App Creation Some Competition..

Software and analytics are sometimes the easiest things in the world to get duped with, as the product will perform and do what the vendor or consultant tells you it will.  Then comes the catch, where you have to ask yourself “is this relevant data that I need and will there be a return on investment?”  That’s a big question, and sometimes folks just don’t know.  In addition, there are schools that will crank out a data scientist for a company in a few weeks too.

Data Scientist Jobs About To Be Commoditized–Online School Says They Can crank One Out For Your Company In Just 12 Weeks To Work With Your “Big Illusive Data”…

So at this point you have all your analytics as promised, but what do they really do for a company?  Is the data relevant and does it have value?  You have to figure out what to do with it next, after a possibly costly adventure in getting it.  Sometimes the cost is high enough that folks feel they “have” to use the data somehow.  That’s where what’s called a “data sniff” comes in: you need folks who understand your business, as well as the talent of a data scientist, to find the value, and sometimes there isn’t any that benefits the running of the business.

Data selling has also helped the market explode, as banks and companies that hold in-house data now want to make money with it.  It used to be information that was “client confidential” but no more when it’s burning a hole in their pockets.  This is driving a ton of data scientist work today.  MasterCard is one big example: it sells a ton of consumer credit card data, markets it, and even somewhat bragged about the fact that it was going to do this to make money.

Mastercard As Well As Other Financial Institutions Using Big Data To Get Into Your “Online Pants” As Many Consumers Seem To Be Accidentally And Inadvertently Leaving Their “Internet Fly” Open

It takes time to prepare data in formats data scientists can work with, as there are multiple formats out there and not all data is created equal.  There are even companies springing up that will do that type of work for data scientists on a somewhat automated basis too.

Again, we are going to end up with some huge analytics out there, and once more I’m not talking about genomics and science at all here but rather the business analytics that have to do with running a business.  So the question is how long this phase will continue and when businesses will become more savvy about the kind of analytics “they really need”.  It’s going to run into a lot of money as it becomes more complex, and if companies are not seeing a “real” return on investment, then some of this is not going to have value.  Sometimes folks get all excited over the analytics and what they show, but in fact the business value may not be there for the time and money spent.  Data scientists have to model, and not all models work in the real world, even when they look good as a “proof of concept”.  Here’s an essay written by a quant who’s modeled all her life and this is what she has to say.  She’s telling you that data science tools have a decent-sized failure rate, as real-world variables simply cause some of those failures.

“On Being a Data Skeptic- Modelers Have A Bigger Responsibility Now Than Ever Before”–A Must Read Essay, Start “Sniffing the Data”…

"Data is here, it's growing, and it's powerful." Author Cathy O'Neil argues that the right approach to data is skeptical, not cynical––it understands that, while powerful, data science tools often fail. Data is nuanced, and "a really excellent skeptic puts the term 'science' into 'data science.'" The big data revolution shouldn't be dismissed as hype, but current data science tools and models shouldn't be hailed as the end-all-be-all, either.

So you can see from the quote above that data science tools are not going to be the big “save the world” pitch that we hear all the time.  As a matter of fact, I had a short Twitter chat with the CIO of the FCC and he agreed with me on the need for more open source models that can be verified for accuracy.  We can’t always just say “well it looks good to me and should work” anymore, as proprietary models could be hiding a lot of ill-fated algorithms that generate profit first and treat other information as secondary.  One of my favorite quotes on this topic is from mathematician Charles Seife (watch video #1 in my footer for more on this topic):  “Well gee that formula in print with the story has a square root in it, well it must be good”.  He tells you flat out that you get duped, as folks who work with math and models and want to profit are hucksters who know people fall for it all the time.

Somebody Needs to Start Calling “Foul” On Proprietary Predictive Algorithms When They Cannot Be Replicated For Accuracy As This Accelerates Inequality and Promotes Even More Data Selling For Profit

So when is the demand for data scientists going to begin to slow?  Will it occur when companies, banks, insurers, etc. find out the data is in essence not relevant to the real core of their business?  Could be.  Data can be addictive and we are seeing that today.  Sometimes folks don’t know when to stop, and they keep collecting data that appears to be relevant and may have some virtual value but falls short in the real world, which is what counts.  This is where a lot of the confusion we see today comes from, with folks unable to determine what’s a virtual value and what’s a real world value, or whether the virtual value of a model they created can have a positive impact on the real world.

We have all seen the negative effects of models that make money for banks.  Think subprime if you will, as the models didn’t change the portfolios, they just changed the math.  Models can be built on purpose to lie, and if you are dealing with proprietary code that can’t be replicated for accuracy, do you trust it?  A lot of people do and get caught, because after the model runs for a while audits will eventually catch up, such as what happened at CMS at the link below.  Insurers used models that algorithmically adjusted and tweaked the math for profit on Medicare Advantage claims, and now CMS is kind of stuck on what to do next as they were had by the math.  This was all modeled either by quants or data scientists to function the way it did.

CMS Discovers That Insurers Offering Medicare Part D “Really Know To Sharp Shoot A Model With Adjusting Risk For Profit”, A Common Everyday Occurrence in Financial Markets…

So where do those data scientists or quants go next, or what’s their next model for their employers, the insurance companies?  This is what I said way back in 2009 would happen at HHS/CMS and sadly it did.  We are getting to the point where some of the data scientist models can’t be trusted either.  As more audits over time verify that the models were not working or were hiding risk, the need for data scientists and some of their models will decrease, and the ones who write good code and truly help find relevant data will be the half that remain.  Do people write “dirty” code out there?  You bet they do, and I call it “code hosing” as well.  But when the models can’t be replicated for accuracy, again half of all the analytics will fall by the wayside along with those who produce them, and so will the number of data scientists.  There will always be a need for data scientists, but how many, what kind of data, and how much relevance they provide will be the big question.  BD

