
Category: Computers - Other
Year: 2022
Publisher: O'Reilly Media, Inc.
Language: English
Pages: 275
The amount of data being generated today is staggering--and growing.
Apache Spark has emerged as the de facto tool to analyze big data and is
now a critical part of the data science toolbox. Updated for Spark 3.0,
this practical guide brings together Spark, statistical methods, and
real-world datasets to teach you how to approach analytics problems
using PySpark, Spark's Python API, and other best practices in Spark
programming.
Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and
Josh Wills offer an introduction to the Spark ecosystem, then dive into
patterns that apply common techniques--including classification,
clustering, collaborative filtering, and anomaly detection--to fields
such as genomics, security, and finance. This updated edition also
covers NLP and image processing.
If you have a basic understanding of machine learning and statistics and
you program in Python, this book will get you started with large-scale
data analysis.
Familiarize yourself with Spark's programming model and ecosystem
Learn general approaches in data science
Examine complete implementations that analyze large public datasets
Discover which machine learning tools make sense for particular problems
Explore code that can be adapted to many uses