Streaming Systems

Publisher: Southeast University Press
Publication date: 2019-06-01
Format: 24 cm
Pages: 18, 329
Zhongtu price: ¥96.0 (25% off) List price: ¥128.0

Streaming Systems: Book Highlights

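In the traditional data processing workflow, data is first collected and then stored in a database (DB). When people need it, they query the DB to get answers or run related processing. This seems entirely reasonable, but in practice it is quite restrictive, especially for specific problems in real-time search environments, where offline processing in the style of MapReduce does not solve the problem well. This gave rise to a new computing model: stream computing. It analyzes large-scale data in motion in real time as the data keeps changing, captures potentially useful information, and sends the results on to the next compute node. This book explains the principles of stream computing.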
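
The contrast between store-then-query processing and stream computing can be illustrated with a minimal Python sketch. It is not taken from the book: the `(key, value)` event shape and the function names `batch_totals` and `streaming_totals` are hypothetical, chosen only for illustration. The batch function needs the whole data set before it can answer, while the streaming function updates its state per event and hands each new result to whatever consumes it next.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape for this sketch: (key, value).
Event = Tuple[str, int]


def batch_totals(stored: List[Event]) -> Dict[str, int]:
    """Store-then-query: all data is collected first, then scanned once."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in stored:
        totals[key] += value
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Stream style: update running state per event and emit it downstream."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in events:
        totals[key] += value
        # The updated total is available as soon as the event arrives.
        yield key, totals[key]


events = [("a", 1), ("b", 2), ("a", 3)]
batch_result = batch_totals(events)
stream_result = list(streaming_totals(iter(events)))
```

Here `batch_result` holds only the final totals, whereas `stream_result` records one updated total per incoming event, which is what a downstream node would receive in real time.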

Streaming Systems: Synopsis

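Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that span the globe, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way. Expanded from Tyler Akidau's popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You will also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.

You will learn:
How streaming and batch data processing patterns compare
The core principles and concepts behind robust out-of-order data processing
How watermarks track progress and completeness in unbounded data sets
How exactly-once data processing techniques ensure correctness
How the concepts of streams and tables form the foundations of both batch and streaming data processing
The practical motivations behind a powerful persistent state mechanism, demonstrated with a real-world example
How time-varying relations link stream processing to the familiar world of SQL and relational algebra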

Streaming Systems: Table of Contents

Preface Or: What Are You Getting Yourself Into Here?

Part I. The Beam Model

1. Streaming 101
  Terminology: What Is Streaming?
  On the Greatly Exaggerated Limitations of Streaming
  Event Time Versus Processing Time
  Data Processing Patterns
  Bounded Data
  Unbounded Data: Batch
  Unbounded Data: Streaming
  Summary

2. The What, Where, When, and How of Data Processing
  Roadmap
  Batch Foundations: What and Where
  What: Transformations
  Where: Windowing
  Going Streaming: When and How
  When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!
  When: Watermarks
  When: Early/On-Time/Late Triggers FTW!
  When: Allowed Lateness (i.e., Garbage Collection)
  How: Accumulation
  Summary

3. Watermarks
  Definition
  Source Watermark Creation
  Perfect Watermark Creation
  Heuristic Watermark Creation
  Watermark Propagation
  Understanding Watermark Propagation
  Watermark Propagation and Output Timestamps
  The Tricky Case of Overlapping Windows
  Percentile Watermarks
  Processing-Time Watermarks
  Case Studies
  Case Study: Watermarks in Google Cloud Dataflow
  Case Study: Watermarks in Apache Flink
  Case Study: Source Watermarks for Google Cloud Pub/Sub
  Summary

4. Advanced Windowing
  When/Where: Processing-Time Windows
  Event-Time Windowing
  Processing-Time Windowing via Triggers
  Processing-Time Windowing via Ingress Time
  Where: Session Windows
  Where: Custom Windowing
  Variations on Fixed Windows
  Variations on Session Windows
  One Size Does Not Fit All
  Summary

5. Exactly-Once and Side Effects
  Why Exactly Once Matters
  Accuracy Versus Completeness
  Side Effects
  Problem Definition
  Ensuring Exactly Once in Shuffle
  Addressing Determinism
  Performance
  Graph Optimization
  Bloom Filters
  Garbage Collection
  Exactly Once in Sources
  Exactly Once in Sinks
  Use Cases
  Example Source: Cloud Pub/Sub
  Example Sink: Files
  Example Sink: Google BigQuery
  Other Systems
  Apache Spark Streaming
  Apache Flink
  Summary

Part II. Streams and Tables

6. Streams and Tables
  Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity
  Toward a General Theory of Stream and Table Relativity
  Batch Processing Versus Streams and Tables
  A Streams and Tables Analysis of MapReduce
  Reconciling with Batch Processing
  What, Where, When, and How in a Streams and Tables World
  What: Transformations
  Where: Windowing
  When: Triggers
  How: Accumulation
  A Holistic View of Streams and Tables in the Beam Model
  A General Theory of Stream and Table Relativity
  Summary

7. The Practicalities of Persistent State
  Motivation
  The Inevitability of Failure
  Correctness and Efficiency
  Implicit State
  Raw Grouping
  Incremental Combining
  Generalized State
  Case Study: Conversion Attribution
  Conversion Attribution with Apache Beam
  Summary

8. Streaming SQL
  What Is Streaming SQL?
  Relational Algebra
  Time-Varying Relations
  Streams and Tables
  Looking Backward: Stream and Table Biases
  The Beam Model: A Stream-Biased Approach
  The SQL Model: A Table-Biased Approach
  Looking Forward: Toward Robust Streaming SQL
  Stream and Table Selection
  Temporal Operators
  Summary

9. Streaming Joins
  All Your Joins Are Belong to Streaming
  Unwindowed Joins
  FULL OUTER
  LEFT OUTER
  RIGHT OUTER
  INNER
  ANTI
  SEMI
  Windowed Joins
  Fixed Windows
  Temporal Validity
  Summary

10. The Evolution of Large-Scale Data Processing
  MapReduce
  Hadoop
  Flume
  Storm
  Spark
  MillWheel
  Kafka
  Cloud Dataflow
  Flink
  Beam
  Summary

Index

Streaming Systems: About the Authors

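Tyler Akidau is a senior software engineer at Google, where he serves as technical lead for the Data Processing Languages & Systems group. He is also a founding member of the Apache Beam PMC.
Slava Chernyak is a senior software engineer at Google. He spent six years working on Google's internal large-scale streaming data processing systems.
Reuven Lax is a senior software engineer at Google who has spent the past ten years helping to shape Google's data processing and analytics strategy. He is also a member of the Apache Beam PMC.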
