This article explains why data is so important in machine learning for building efficient AI systems. It also describes how that data is collected, for example when someone uses a smartphone to run a search, and gives some background on Google as a company.
I was recently offered a corporate discount on one of the most advanced smartphones on the market, the Google Pixel 2. While watching reviews of the phone, I noticed that Google had added a feature that lets you squeeze the sides of the phone to open the Google launcher. The launcher (Google Now) makes it easy to search by voice, by text, and in other ways. Many people noticed that it’s easy to “accidentally” open it during innocent actions such as simply picking up the phone. Most users report that they use Google Now services much more than before.
This is a great feature that gives users more accessible and simpler access to software-based services, mainly Google Now. I don’t know whether it was an innocent design choice or one backed by psychological research, but I admit a slight bias: I suspect that all large and successful corporations use seemingly innocent psychological manipulation to bend their users’ wills.
I cannot say whether it was intended for any purpose other than accessibility and usability. If its purpose was to make users use Google’s services more, it has succeeded. People like things to be easy (it’s in our nature to be lazy), and an easy thing will be used more, not to mention that a user can “accidentally” open the service. Why would companies want their services used more? The answer is machine learning.
Modern machines are complex systems that consist of structural elements, mechanisms, and control components, and include interfaces for convenient use (Wikipedia, accessed March 2018). Learning is the process of acquiring new, or modifying existing, knowledge, behaviors, skills, values, or preferences (Wikipedia, accessed March 2018). Putting these definitions together, we get complex systems of structural elements that can acquire new information and modify stored information or behaviors. More specifically, machine learning refers to algorithms that “learn”, “evolve”, or change themselves based on experience.
Current research and applications in machine learning suggest that the most important factor in a machine’s ability to learn to perform accurate, successful actions is not the algorithm itself but the amount of data the algorithm or system collects and uses. The algorithm with the most user data tends to be the most successful, so the aim of any machine learning system is to acquire as much data as it can. Companies know this, which is why user data is such a sensitive issue today.
I believe Google is the company that possesses the most user data, and because of this, its search engine and user software are the most successful. It has the most data, and therefore the most efficient machine learning systems. In my view, Google is the best search engine ever created, and it has been around a long time: it began as a research project in 1996, which has given it more time to collect useful data. What made it so successful was its early, innovative algorithmic choices and its simplicity. Its software is the most widely used. According to StatCounter, Google currently holds 92% of the worldwide search market, and Google Chrome holds 57% of the worldwide browser market. Anyone who has tried other search engines knows they don’t hold a candle to Google. It boils down to two things: the search algorithm it chose, and the amount of data the engine has.
User data can be collected from almost any action you take while using a service. When you sign up for a service, you usually see a prompt such as “Allow XXX to see and use usage statistics, etc…”; by accepting it, you are essentially giving the company the right to use data about how you use the service. Every time you type a search, click a link, open an app, use a voice command, or take any other action, the service sends that information to the company, and the machine learning system adds it to its database. When you mark a result as relevant or not relevant, or useful or not useful, you are also supplying the machine with data that tells it “Hey, you did good, here’s a reward!” or “Bad machine!”. In this way, the user has the power to affect the conditioning and learning of the machine. Most of this information is added to the AI system automatically (and usually anonymously), since there are not enough people to check every piece of data. There are probably one or several supervisors or analysts who monitor the machine’s general performance and changes rather than checking each data point by hand.
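To make the “reward” idea above concrete, here is a minimal toy sketch of how relevance feedback could nudge a ranking. Everything in it is hypothetical: the class name `FeedbackRanker`, the `learning_rate` parameter, and the per-(query, result) score table are assumptions made for illustration, not how Google or any real search engine actually works. Each “useful” vote moves a stored score toward +1 (the reward), each “not useful” vote moves it toward -1 (“Bad machine!”), and results are ranked by the learned score.

```python
from collections import defaultdict


class FeedbackRanker:
    """Hypothetical toy ranker that learns from user feedback.

    Keeps one score per (query, result) pair and moves it a fraction of the
    way toward +1 ("useful") or -1 ("not useful") each time a user votes.
    Illustration only; not any real search engine's method.
    """

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        # Unseen (query, result) pairs start with a neutral score of 0.0.
        self.scores = defaultdict(float)

    def record_feedback(self, query: str, result: str, useful: bool) -> None:
        # +1 is the "reward" signal, -1 the "punishment" signal.
        target = 1.0 if useful else -1.0
        key = (query, result)
        self.scores[key] += self.learning_rate * (target - self.scores[key])

    def rank(self, query: str, candidates: list[str]) -> list[str]:
        # Highest learned score first.
        return sorted(candidates, key=lambda r: self.scores[(query, r)], reverse=True)


ranker = FeedbackRanker()
for _ in range(3):
    ranker.record_feedback("pixel 2 review", "site-b", useful=True)
ranker.record_feedback("pixel 2 review", "site-a", useful=False)
print(ranker.rank("pixel 2 review", ["site-a", "site-b", "site-c"]))
# → ['site-b', 'site-c', 'site-a']
```

Note that no person reviews individual votes here; the scores update automatically as feedback arrives, which mirrors the point that most data flows into the system without manual checks.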
This poses a problem because it creates a self-reinforcing feedback loop. Because Google has the most data and is the best choice, users use it more often and give it even more data. Engines with less data are less efficient, and are used less because of it. It’s like retail, where the largest companies (such as Walmart) generally have the lowest prices and so sell more, while smaller ones (such as a mom-and-pop shop) need to charge higher prices because they are more honest and have fewer resources. While I love supporting local small businesses, it’s sometimes much easier and more convenient to shop at the big retail names.
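The feedback loop described above can be sketched as a small simulation. This is a toy model under loudly stated assumptions, not real market data: two hypothetical engines start with 600 and 400 units of data, users split between them in proportion to `data ** exponent`, and every user’s searches add one unit of data to the engine they chose. The function name `simulate_feedback_loop` and its parameters are invented for illustration.

```python
def simulate_feedback_loop(
    data_a: float,
    data_b: float,
    rounds: int,
    users_per_round: float = 1000.0,
    exponent: float = 2.0,
) -> list[float]:
    """Hypothetical toy model of a data feedback loop between two engines.

    Assumptions (illustrative only): perceived quality grows with data,
    users split in proportion to data ** exponent, and each user adds one
    unit of data to the engine they picked. Returns engine A's user share
    in each round.
    """
    shares_a: list[float] = []
    for _ in range(rounds):
        weight_a = data_a ** exponent
        weight_b = data_b ** exponent
        share_a = weight_a / (weight_a + weight_b)
        shares_a.append(share_a)
        # Each engine collects data from the users it attracted,
        # so the leader's advantage compounds when exponent > 1.
        data_a += users_per_round * share_a
        data_b += users_per_round * (1.0 - share_a)
    return shares_a


# Engine A starts with 60% of the data, engine B with 40%.
reinforcing = simulate_feedback_loop(600.0, 400.0, rounds=20)
proportional = simulate_feedback_loop(600.0, 400.0, rounds=20, exponent=1.0)
print(f"exponent=2: A's share goes from {reinforcing[0]:.3f} to {reinforcing[-1]:.3f}")
print(f"exponent=1: A's share stays at {proportional[-1]:.3f}")
```

The exponent is the design choice that matters. With `exponent=1.0`, each engine gains data exactly in proportion to what it already has, so the shares stay fixed at 60/40. With any exponent above 1, the leader attracts a larger share of new users than its share of data, so its lead grows every round while the smaller engine falls further behind, which is the self-reinforcing loop the paragraph describes.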
I am not implying anything, and I don’t know whether this will have positive or negative ramifications; that will depend on how the data is used. If Google is a completely honest and philosophically ethical company, helping it in its quest for an ultimate AI machine will do good. But if it errs on the side of manipulation or power, this can be dangerous. It is risky to let one company hold the majority of resources instead of having them spread out, because different companies have different viewpoints, goals, and philosophies. We discussed this in my previous AI ethics meetings, and there was near-unanimous agreement that data and resources are best shared among many powers instead of one. But no such laws are in effect yet; groups like Montreal AI Ethics are working on issues like this.
I don’t mean to say that Google is doing anything wrong, or that it ever will; it is a great company that has simplified my life and the lives of many other people. The purpose here is to show people what happens to their data, how it is used, and why that matters, and to suggest that a single company shouldn’t hold all the resources. The conclusions are left to the reader. Also note that most companies are in the business of collecting user data, because that is the direction the world is moving: toward the smart AI revolution.