Web logs and web pages offer a wonderful opportunity to share writings, opinions, or any comment. I hope that between the lines there is something real.
Friday, April 15, 2016
Big Data Technology - for fiverr - asian magazine
Big Data Technologies Basics
This article is for beginners in big data; it assumes no previous familiarity with the subject. Big data technologies and practices are improving quickly. Some of us have heard the term; for others it is still new. Either way, you may be asking yourself, “What is big data?” Here are the basics you need to stay ahead of the change.
To better understand the term, it helps to go back to when it first came into use. The first recognized use of the “big data” concept was in a 1997 paper by NASA scientists describing the difficulty of visualizing large data sets with computer graphics, or, as they called it, “the problem of big data.” In their words, it “provides an interesting challenge for computer systems…the most common solution is to acquire more resources.”
Just two years later came another major big data milestone. The current Chief Economist at Google, who was then Dean of Berkeley's School of Information, led the first formal research attempting to measure the total amount of new “information” created each year. The study estimated approximately 1.5 billion gigabytes of newly created data for 1999. The study was repeated, and only three years later, in 2003, the results showed that newly created information had already reached 3 billion gigabytes: in the short span of three years it had doubled. The conclusion was that the volume of data was growing exponentially.
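To make the "doubled in three years" figure concrete, here is a small sketch of the arithmetic. It assumes steady compound growth between the two data points quoted above, which is a simplification of mine and not something the study states; the function names are illustrative.

```python
import math


def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by going from start to end."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1.0 + rate)


# Figures quoted in the article: 1.5 billion GB, then 3 billion GB three years later.
rate = annual_growth_rate(1.5, 3.0, 3)
print(f"Implied annual growth: {rate:.1%}")  # roughly 26% per year

# Assumption: if the same rate held for three more years, volume would double again.
projected = 3.0 * (1.0 + rate) ** 3
print(f"Projected after three more years: {projected:.1f} billion GB")
```

Under that assumption, doubling every three years corresponds to growth of roughly 26% per year, which is what "exponential" means here: a constant percentage increase, not a constant amount.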
The next milestone came in 2001, when industry analyst Doug Laney defined the 3Vs of big data: volume, variety, and velocity. Today everyone refers to the Vs of big data when attempting to explain it. Now, in 2016, we talk about the 5Vs, which add veracity and value.
The Oxford English Dictionary (OED), the most customary compendium of definitions, describes big data as: “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.” Since 2014, however, Wikipedia has defined big data as “any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.”
In 2008, a number of prominent American computer scientists extended the term to “big-data computing,” describing it as one of the main activities of companies, researchers, practitioners, and defense and intelligence operations.
The 2011 McKinsey big data study clearly noted the challenge of definition and came up with its own: “Big Data are datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” However, the renowned authors Viktor Mayer-Schönberger and Kenneth Cukier clarify in their book that “there is no rigorous definition of big data.”
Newer definitions include the value provided by the analysis of big data and its impact: “The ability of society to harness information in new ways to produce useful insights or goods and services of significant value” and “…what one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value” are what is commonly referred to as big data. Tom Davenport, in Big Data at Work, also explained that the problem with the definition of big data is that it refers to “the broad range of new and massive data types that have appeared over the last decade or so.” Those are enough definitions. Let's now focus on how big data has changed during its short life span, and on its underlying technologies.
The technologies of big data and analytics are evolving so quickly that businesses need to be more creative and innovative than ever to avoid falling behind. Technologies never used to improve, change, and die as quickly as they do now: it once took years for them to mature, and now some are fully functional within months.
Big data technologies are at a high point, yet they are still developing and need more and faster growth to keep up with industry requirements. Broadly speaking, big data technology covers several types of data sources: application data, transaction data, machine data, sensor data, social data, image and video data, geospatial data, and others. Big data has also been divided into four main technological subareas: 1) Big Data Search, 2) Big Data Development, 3) Big Data Governance, and 4) Big Data Analytic Services.
The technologies developed thus far include not only hardware but also applications, software, more advanced algorithms, and new ideas. Competition for a head start in big data is intense: trends are developed and quickly analyzed, and exploiting the conclusions drawn from big data analysis is perhaps the best advantage an enterprise can have.
The main areas of big data technology being developed are:
1) Analytics of the data and exploitation of conclusions. The insights gained from analyzing such large data sets have often produced the most lasting conclusions, shaping the future of big data and of the industries from which the data is acquired. This area includes new algorithms, new ideas, and the incorporation of data analysis and results into the information.
2) Data search and processing speed acceleration. For search and data processing engines such as Google or Hadoop, speed improvements come from one of two sources: enhanced hardware (data farms, servers) or improved algorithms (data clustering, movement, identification).
3) Achieving value from all data. Analysis, conclusions, data mining, and research are essential when dealing with such large data sets, since they bring changes in beliefs and paradigms.
4) Infrastructure and systems. Your infrastructure and systems must exploit the real-time information flowing through your organization. They must be optimized for analytics so they can respond dynamically to the increasing demands of big data, with automated business processes, better agility, and improved economics.
In-memory analytics. Many businesses are already using hybrid transaction/analytical processing (HTAP), which allows transactions and analytic processing to reside in the same in-memory database. Using in-memory databases to speed up analytic processing is increasingly popular and highly beneficial. There's a lot of excitement around HTAP, and businesses have been overusing it: while you can perform analytics faster with HTAP, all of the transactions must reside within the same database. The problem is that most analytics efforts today are about putting together transactions from many different systems. Putting it all in one database goes back to the disproven belief that all of your transactions must be in one place if you want to use HTAP for all of your analytics. You still have to integrate diverse data. Moreover, bringing in an in-memory database means there's another product to secure, manage, integrate, and scale. Take Intuit, for example, which uses Spark: this infrastructure can cover many in-memory needs, and combined with analytics in the cloud, it lets enterprises spend less on in-house in-memory systems.
5) Control and governance. The right platform controls how information is generated, shared, filtered, consolidated, maintained, protected, retired, and unified within your enterprise. The best platform can inspire confidence in your company, brand, business, industry, and employees.
6) Structured data management. To achieve efficiencies, certain analytics must be run while the data is still in motion. At that point, the systems infrastructure must decide whether to store or discard each piece of data, which eventually reduces the amount of storage consumed. There is also unstructured data management.
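As a rough illustration of analyzing data "in motion" and deciding what to keep, the sketch below computes a running average over a stream and stores only the readings above a threshold. The function name, the threshold rule, and the sample values are all hypothetical; real stream-processing platforms are far more elaborate.

```python
from typing import Iterable, List, Tuple


def process_stream(readings: Iterable[float], threshold: float) -> Tuple[float, List[float]]:
    """Consume readings one at a time, keeping a running mean.

    Only readings above the threshold are stored; everything else is
    discarded as soon as it has contributed to the running statistics.
    """
    count = 0
    total = 0.0
    stored: List[float] = []
    for value in readings:
        count += 1
        total += value
        if value > threshold:  # hypothetical "worth storing" rule
            stored.append(value)
    mean = total / count if count else 0.0
    return mean, stored


# An iterator stands in for a live feed: each value is seen exactly once.
feed = iter([10.0, 12.0, 50.0, 11.0, 49.0])
mean, kept = process_stream(feed, threshold=40.0)
print(mean, kept)
```

Here five readings pass through, but only two are kept, while the average still reflects all five. That is the storage saving the paragraph above describes.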
7) Processing exploding volumes. With developments such as the Hadoop platform and distributed analytic frameworks such as MapReduce, enterprises are evolving and need distributed resource managers, gradually shifting to Hadoop as their general-purpose data operating system. With these systems, many different kinds of data handling, operations, management, and analytics processing can be performed using Hadoop as the distributed file storage system. The ability to run many different kinds of data processes and queries against data in Hadoop makes it a low-cost, general-purpose place to put your data in order to analyze it and draw the most reliable results. As MapReduce, in-memory, stream processing, graph analytics, SQL, and other types of workloads become able to run with satisfactory performance on Hadoop, more businesses will use Hadoop as an enterprise data hub.
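For readers who have never seen MapReduce, the idea can be sketched in a few lines of plain Python. This is a single-machine toy for illustration only, not Hadoop's API: in a real cluster the map and reduce steps run in parallel on many machines. The function names are my own.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs: List[Tuple[str, int]] = []
    for doc in documents:  # on a cluster, each document could go to a different node
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


print(word_count(["big data", "Big ideas"]))
```

Because each document is mapped independently and each word is reduced independently, the work can be split across as many machines as the data requires.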
8) Better and faster SQL analysis on Hadoop. In Hadoop you can store large data sets and analyze anything, provided you, as a coder and mathematician, know how; this is why Hadoop is so promising. The benefit of SQL-like querying on Hadoop is familiarity with the language. SQL-for-Hadoop products are tools that support SQL-like querying so business users can get acquainted with the Hadoop platform more easily. SQL on Hadoop opens the door to Hadoop in the enterprise, since it is now easier to use without needing highly skilled data scientists who write scripts in Java, Python, and JavaScript; business analysts can now use Hadoop as they traditionally used SQL. These SQL-on-Hadoop developments are nothing new, in the sense that Apache Hive already offered a structured SQL-like query language, as do Cloudera, Pivotal Software, IBM, and other commercial alternatives, all of which are improving their performance. They support what is called iterative analytics, in which an analyst asks one question, receives an answer, and then asks another, an approach that has traditionally required building a data warehouse. SQL on Hadoop isn't going to replace data warehouses, but it offers another possibility for certain types of analytics.
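The iterative, question-after-question style described above looks something like the following sketch. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; it is not Hive or any SQL-on-Hadoop engine, and the table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database as a stand-in for a much larger SQL-on-Hadoop table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "widget", 120.0),
        ("north", "gadget", 80.0),
        ("south", "widget", 50.0),
        ("south", "gadget", 70.0),
    ],
)

# Question 1: which region sells the most?
by_region = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
).fetchall()
top_region = by_region[0][0]

# Question 2, prompted by the first answer: what sells best in that region?
by_product = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales "
    "WHERE region = ? GROUP BY product ORDER BY total DESC",
    (top_region,),
).fetchall()

print(top_region, by_product)
conn.close()
```

The point is that the second query is only written after seeing the first answer, and a business analyst can do both with the SQL they already know.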
9) NoSQL. NoSQL databases, short for “Not Only SQL,” are alternatives to traditional SQL-based relational databases and are quickly gaining acceptance as tools for specific kinds of analytic applications. There are estimated to be at least 20 to 30 open-source NoSQL databases. NoSQL products include ArangoDB, which offers graph capabilities in the database and a faster, more direct way to analyze the network of interactions between salespeople or customers. Sensors stream large volumes of data that grow exponentially, and a NoSQL key-value database is an appropriate store for many such data sets.
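To show why a key-value model suits sensor data, here is a minimal in-memory sketch. The class, its method names, and the sample readings are hypothetical and do not correspond to any particular NoSQL product; a real store would add persistence and distribution across machines.

```python
from typing import Dict, List, Optional, Tuple


class SensorKeyValueStore:
    """Toy key-value store: the key is (sensor_id, timestamp), the value is a reading."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, int], float] = {}

    def put(self, sensor_id: str, timestamp: int, value: float) -> None:
        # No schema to design up front: any sensor can start writing at any time.
        self._data[(sensor_id, timestamp)] = value

    def get(self, sensor_id: str, timestamp: int) -> Optional[float]:
        # Direct lookup by key, with no joins or table scans.
        return self._data.get((sensor_id, timestamp))

    def scan(self, sensor_id: str) -> List[Tuple[int, float]]:
        # All readings for one sensor, ordered by timestamp.
        return sorted(
            (ts, value) for (sid, ts), value in self._data.items() if sid == sensor_id
        )


store = SensorKeyValueStore()
store.put("thermo-1", 2, 21.5)
store.put("thermo-1", 1, 21.0)
store.put("thermo-2", 1, 19.0)
print(store.get("thermo-1", 2), store.scan("thermo-1"))
```

Each reading is written and read by its key alone, which is what lets key-value databases keep up with fast, ever-growing sensor streams.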
10) Data storage and online storage in the cloud. Cloud computing is a moving target. To relieve the pressure that big data places on your IT infrastructure, you can host big data and analytics solutions in the cloud and achieve the scalability, flexibility, expandability, and economics that will provide a competitive advantage into the future. A newer term derived from big data is the data lake, also called an enterprise data lake or enterprise data hub. Whereas traditional databases required the data set to be designed before any data was entered, the enterprise data lake turns that model around: it takes the data sources and pours them into a big Hadoop repository, where users are given tools to analyze the data along with a high-level definition of what data exists in the lake. These large-scale data stores are being built and improved as they go along. The only downside is that users must be highly skilled, but the benefits outweigh this, since data lakes are incremental, customized, and flexible, and are constantly being improved. One of the main concerns with a Hadoop-based data lake is that the platform isn't fully enterprise-ready: the encryption, monitoring, data security, access control, and data lineage tracing that traditional enterprise databases provide are not yet as operational as enterprises would expect.
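The data lake's reversal of the traditional model is often described as "schema on read": raw records are stored as they arrive, and structure is applied only when someone queries them. The sketch below illustrates the idea with a plain Python list of JSON strings standing in for the lake; the record contents and the function name are made up for illustration.

```python
import json
from typing import Any, Dict, List

# Raw records from different sources, stored as-is with no shared table design.
raw_lake = [
    '{"source": "crm", "customer": "Ana", "spend": 120}',
    '{"source": "web", "user": "ana", "page": "/pricing"}',
    '{"source": "crm", "customer": "Luis", "spend": 80}',
]


def read_with_schema(lake: List[str], source: str, fields: List[str]) -> List[Dict[str, Any]]:
    """Apply a schema at read time: pick one source and project the wanted fields."""
    rows = []
    for line in lake:
        record = json.loads(line)
        if record.get("source") == source:
            # Fields missing from a record come back as None instead of failing.
            rows.append({field: record.get(field) for field in fields})
    return rows


crm_rows = read_with_schema(raw_lake, "crm", ["customer", "spend"])
print(crm_rows)
```

Nothing had to be designed before the records were stored, but the analyst does need to know which sources and fields exist, which is why the article notes that users must be skilled and that a high-level definition of the lake's contents is provided.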
11) Big data analytics in the cloud. Traditional approaches use statistical analysis based on a sample of the total data set, whereas with Hadoop processing of the data lake we can now analyze very large numbers of records and of attributes per record, which increases predictability. Big data analysts therefore have not only more data to work with but also the processing power to handle large numbers of records with many attributes, and so achieve more predictive analytics.
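A small, deliberately contrived sketch shows what a sample can miss. The 1,000 record IDs and the rule that marks a record as a rare event are invented for illustration; the only point is that a 1% sample can contain none of the events a full scan would find.

```python
def is_rare_event(record_id: int) -> bool:
    """Hypothetical rule marking a handful of records as interesting."""
    return record_id % 250 == 7


records = range(1000)   # stand-in for the full data set
sample = records[::100]  # systematic 1% sample: every 100th record

rare_in_sample = sum(1 for r in sample if is_rare_event(r))
rare_in_full = sum(1 for r in records if is_rare_event(r))

print(f"Found in sample: {rare_in_sample}, found in full scan: {rare_in_full}")
```

With this made-up data the sample finds none of the four rare records, while scanning everything finds all of them. Having the processing power to look at every record is what makes such sparse signals usable.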
Hadoop, a framework and set of tools for processing very large data sets, was originally designed to work on clusters of physical machines. That has now changed. “Now an increasing number of technologies are available for processing data in the cloud,” says Brian Hopkins, an analyst at Forrester Research. Examples include Google's BigQuery data analytics service, IBM's Bluemix, Amazon's Redshift, Amazon's Kinesis data processing service, and other cloud-based services. The future of big data is expected to be a mixture of cloud and on-premises storage and processing.
The combination of big data and computing muscle allows analysts to explore new behavioral data, such as websites visited or location. This can be called sparse data, because to find what interests you, you must go through an immense quantity of other data. In the recent past, using traditional algorithms and engines on this type of data was computationally impossible. Now it is a true game changer, as cheap computational power is being developed further to solve this issue. The key to the new computing model is enabling real-time analysis and predictive modeling from the same core. With the Hadoop core, it still takes too long to get questions answered from big data sets. That is where Apache Spark came in. Spark is a large-scale data processing engine, and together with its associated SQL query tool, Spark SQL, it offers fast interactive queries as well as graph services and streaming capabilities. It keeps the data within Hadoop while delivering enough performance to speed up query results.
12) Deep learning. Deep learning enables computers to recognize items of interest in large quantities of unstructured and binary data, and to deduce relationships without needing specific models or programming instructions. It is a set of machine-learning techniques based on neural networks, which are still embryonic but show vast potential for business analysis and queries. A deep learning algorithm doesn't have to be modeled to understand concepts; for example, when examining data from Wikipedia, a deep learning algorithm learned by itself the nature of, and the difference between, the words Texas and California and their meanings. All in all, big data will reach conclusions from immense and diverse unstructured data by using advanced analytic techniques like deep learning, benefiting us in ways we weren't expecting, such as face recognition or the recognition of many different kinds of data, such as the shapes, colors, and objects in a video. Cognitive engagement, advanced analytics, and the conclusions they imply will be the most valuable future leads.
13) Security and protection of data privacy. To protect your reputation and your company's, your platform must encompass rigorous procedures and practices that fulfill privacy and data protection requirements. The importance of safeguarding all of the data and insights on which your business relies cannot be overstated. Data security becomes more and more important as your organization performs analytics and generates an information value advantage. Safeguarding this infrastructure against internal and external threats is vital in today's industries.
With so many emerging trends and technologies being developed and improved around big data and analytics, IT organizations need to create the best conditions for analysts and data scientists to experiment, evaluate, and try prototypes, and eventually to integrate these inventions into our day-to-day business.
by @alfredosahagun - economist, @englishxspanish translator, Writer, Creator. @institutoidiomas owner teacher @3nglishOnline @SpanishOnline.