By Frank Kane

Key Features

  • Understand how Spark can be distributed across computing clusters
  • Develop and run Spark jobs efficiently using Python
  • A hands-on tutorial by Frank Kane with over 15 real-world examples teaching you Big Data processing with Spark

Book Description

Frank Kane's Taming Big Data with Apache Spark and Python is your companion to learning Apache Spark in a hands-on manner. Frank will start you off by teaching you how to set up Spark on a single system or on a cluster, and you'll soon move on to analyzing large data sets using Spark RDD, and developing and running effective Spark jobs quickly using Python.

Apache Spark has emerged as the next big thing in the Big Data domain, quickly rising from an ascending technology to an established superstar in just a matter of years. Spark allows you to quickly extract actionable insights from large amounts of data, on a real-time basis, making it an essential tool in many modern businesses.

Frank has packed this book with over 15 interactive, fun-filled examples relevant to the real world, and he will empower you to understand the Spark ecosystem and implement production-grade real-time Spark projects with ease.

What you will learn

  • Find out how you can identify Big Data problems as Spark problems
  • Install and run Apache Spark on your computer or on a cluster
  • Analyze large data sets across many CPUs using Spark's Resilient Distributed Datasets
  • Implement machine learning on Spark using the MLlib library
  • Process continuous streams of data in real time using the Spark Streaming module
  • Perform complex network analysis using Spark's GraphX library
  • Use Amazon's Elastic MapReduce service to run your Spark jobs on a cluster

About the Author

My name is Frank Kane. I spent 9 years at Amazon and IMDb, wrangling millions of customer ratings and customer transactions to produce things such as personalized recommendations for movies and products and "people who bought this also bought." I tell you, I wish we had Apache Spark back then, when I spent years trying to solve these problems there. I hold 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, I left to start my own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about Big Data analysis.

Table of Contents

  1. Getting Started with Spark
  2. Spark Basics and Simple Examples
  3. Advanced Examples of Spark Programs
  4. Running Spark on a Cluster
  5. SparkSQL, Dataframes and Datasets
  6. Other Spark Technologies and Libraries
  7. Where to Go From Here? - Learning More About Spark and Data Science

Best data mining books

Robust Data Mining (SpringerBriefs in Optimization)

Data uncertainty is a concept closely related to most real-life applications that involve data collection and interpretation. Examples can be found in data acquired with biomedical instruments or other experimental techniques. Integrating robust optimization into existing data mining techniques aims to create new algorithms that are resilient to error and noise.

Data Mining Mobile Devices

With today’s consumers spending more time on their mobiles than on their PCs, new methods of empirical stochastic modeling have emerged that can provide marketers with detailed information about the products, content, and services their customers desire. Data Mining Mobile Devices defines the collection of machine-sensed environmental data pertaining to human social behavior.

Information Security Analytics: Finding Security Insights, Patterns, and Anomalies in Big Data

Information Security Analytics gives you insights into the practice of analytics and, more importantly, how you can use analytic techniques to identify trends and outliers that may not be possible to identify using traditional security analysis techniques. Information Security Analytics dispels the myth that analytics within the information security domain is limited to just security incident and event management systems and basic network analysis.

Big Data Analytics Using Multiple Criteria Decision-Making Models (Operations Research Series)

Multiple Criteria Decision Making (MCDM) is a subfield of Operations Research, dealing with decision-making problems. A decision-making problem is characterized by the need to choose one or a few among a number of alternatives. The field of MCDM assumes special importance in this era of Big Data and Business Analytics.