Cassandra High Availability by Robbie Strickland

By Robbie Strickland

Apache Cassandra is a massively scalable, peer-to-peer database designed for 100 percent uptime, with deployments in the tens of thousands of nodes supporting petabytes of data. This book offers readers practical insight into building highly available, real-world applications using Apache Cassandra.

The book starts with the fundamentals, helping you understand how the architecture of Apache Cassandra allows it to achieve 100 percent uptime when other systems struggle to do so. You'll gain a solid understanding of data distribution, replication, and Cassandra's highly tunable consistency model. This is followed by an in-depth look at Cassandra's robust support for multiple data centers and at how to scale out a cluster. Next, the book explores the domain of application design, with chapters discussing the native driver and data modeling. Finally, you'll learn how to avoid common antipatterns and take advantage of Cassandra's ability to fail gracefully.

What you'll learn:

  • Understand how the core architecture of Cassandra enables highly available applications
  • Use replication and tunable consistency levels to balance consistency, availability, and performance
  • Set up multiple data centers to enable failover, load balancing, and geographic distribution
  • Add capacity to your cluster with zero downtime
  • Take advantage of high availability features in the native driver
  • Create data models that scale well and maximize availability
  • Understand common anti-patterns so you can avoid them
  • Keep your system working well even during failure scenarios



Similar data mining books

Data Structures and Algorithms (Software Engineering and Knowledge Engineering, 13)

This is an excellent, up-to-date, and easy-to-use text on data structures and algorithms, intended for undergraduates in computer science and information science. The thirteen chapters, written by an international group of experienced teachers, cover the fundamental concepts of algorithms and most of the important data structures, as well as the concept of interface design.

A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases

Recent achievements in hardware and software development, such as multi-core CPUs and DRAM capacities of several terabytes per server, enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of enterprise data.

Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part I (Lecture Notes in Computer Science)

This three-volume set, LNAI 8724, 8725, and 8726, constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014, held in Nancy, France, in September 2014. The 115 revised research papers presented, together with 13 demo track papers, 10 nectar track papers, 8 PhD track papers, and 9 invited talks, were carefully reviewed and selected from 550 submissions.

Learning to Love Data Science: Explorations of Emerging Technologies and Platforms for Predictive Analytics, Machine Learning, Digital Manufacturing and Supply Chain Optimization

Until recently, many people thought big data was a passing fad. "Data science" was an enigmatic term. Today, big data is taken seriously, and data science is considered downright sexy. With this anthology of reports from award-winning journalist Mike Barlow, you'll appreciate how data science is fundamentally altering our world, for better and for worse.

Additional info for Cassandra High Availability

Example text

As you learned in Chapter 1, Cassandra’s Approach to High Availability, Cassandra employs a sophisticated replication system that allows fine-grained control over replica placement and consistency guarantees. You’ll be able to intelligently choose options that will provide the fault tolerance and consistency guarantees that are appropriate for your application. We’ll begin the discussion with a feature that you’ll encounter the very first time you create a keyspace: the replication factor. Let’s start with the basic mechanics of setting the replication factor.
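To make those mechanics concrete, here is a minimal Python sketch, assuming a single data center and Cassandra's `SimpleStrategy`. It builds the CQL statement that sets the replication factor when a keyspace is created, then models which nodes would hold a row: the node owning the row's token plus the next nodes clockwise around the ring. The function names, node names, and token values are hypothetical, and the ring model is simplified (it ignores virtual nodes and the rack and data center awareness of `NetworkTopologyStrategy`).

```python
import bisect


def create_keyspace_cql(keyspace: str, replication_factor: int) -> str:
    """Build a CQL CREATE KEYSPACE statement that uses SimpleStrategy."""
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


def replicas_for_token(token: int, ring: list[tuple[int, str]], replication_factor: int) -> list[str]:
    """Return the nodes that would store a row with the given token.

    `ring` is a list of hypothetical (node_token, node_name) pairs. Each node owns
    the range (previous node's token, its own token], so the first replica is the
    first node whose token is >= the row's token, wrapping around the ring. The
    remaining replicas are the next nodes clockwise, as SimpleStrategy places them.
    """
    if not 1 <= replication_factor <= len(ring):
        raise ValueError("replication factor must be between 1 and the number of nodes")
    ring = sorted(ring)
    tokens = [node_token for node_token, _ in ring]
    # bisect_left finds the first node token >= the row token; past the end wraps to 0.
    start = bisect.bisect_left(tokens, token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


if __name__ == "__main__":
    print(create_keyspace_cql("demo", 3))
    ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
    print(replicas_for_token(30, ring, 3))  # ['C', 'D', 'A']
    print(replicas_for_token(80, ring, 3))  # wraps around: ['A', 'B', 'C']
```

With a replication factor of 3, every row lives on three distinct nodes, so losing any single node still leaves two copies of each row.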

Chapter 3, Replication, and Chapter 4, Data Centers, will provide a complete discussion of Cassandra's extensive replication features. However, as previously discussed, consistency should be thought of as a continuum, not as an absolute. This gives the application architect ultimate control over the trade-offs between consistency, availability, and performance at the call level, rather than forcing a one-size-fits-all strategy onto every use case.

The CAP acronym refers to three desirable properties in a replicated system:

  • Consistency: the data should appear identical across all nodes in the cluster
  • Availability: the system should always be available to receive requests
  • Partition tolerance: the system should continue to function in the event of a partial failure

In 2000, computer scientist Eric Brewer of the University of California, Berkeley, posited that a replicated service can choose only two of the three properties for any given operation.
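The idea of consistency as a per-call continuum can be made concrete with a little arithmetic. The following Python sketch is illustrative only and covers just a subset of Cassandra's consistency levels (ONE, TWO, THREE, QUORUM, ALL); the function names are hypothetical. It relies on the common rule of thumb that if the number of replicas a read must contact (R) plus the number a write must reach (W) exceeds the replication factor (RF), every read overlaps at least one replica that acknowledged the latest write. Raising R or W buys consistency but lowers the number of replica failures a request can tolerate.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a request at this (subset of) consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = quorum(replication_factor)
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"unsupported consistency level in this sketch: {level}")
    if needed > replication_factor:
        raise ValueError(f"{level} needs more replicas than RF={replication_factor}")
    return needed


def tolerated_failures(level: str, replication_factor: int) -> int:
    """How many replicas can be down while requests at this level still succeed."""
    return replication_factor - replicas_required(level, replication_factor)


def read_sees_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when any read replica set must overlap any write replica set (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, read_sees_latest_write(read, write, rf))
```

With RF = 3, QUORUM reads and writes each need two replicas (2 + 2 > 3) and still tolerate one replica failure, while ONE/ONE is faster and more available but gives no overlap guarantee.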

However, this decision must be carefully weighed, as there is a high likelihood that you'll end up with hotspots. If we presume that both reads and writes follow the same distribution as the data itself (which is a logical assumption in this specific case), the nodes holding more data will also have to handle more operations than the nodes holding less. In fact, two of the nodes own almost no data at all. Thus it's a common mistake to build a time-series model using time as a key and rely on ordering from the ByteOrderedPartitioner to perform range queries.
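The hotspot problem can be illustrated with a small simulation. This Python sketch is a simplified model, not Cassandra's implementation: it assumes a hypothetical four-node ring whose tokens are spread evenly over the first key byte, and compares byte-ordered partitioning (the token is the raw key, as with the ByteOrderedPartitioner) against hash partitioning, approximated here with MD5, the hash used by Cassandra's RandomPartitioner. Because every ISO-8601 timestamp key starts with the same bytes, byte ordering sends all of them to one node, while hashing spreads them roughly evenly.

```python
import bisect
import hashlib
from datetime import datetime, timedelta

# Hypothetical 4-node ring. Each node owns (previous token, its own token];
# tokens above the last node's token wrap around to the first node.
NODE_TOKENS = [b"\x40", b"\x80", b"\xc0", b"\xff"]


def owner(token: bytes) -> int:
    """Index of the node that owns a token on the ring."""
    return bisect.bisect_left(NODE_TOKENS, token) % len(NODE_TOKENS)


def ordered_token(key: bytes) -> bytes:
    """Byte-ordered partitioning: the token is simply the raw key bytes."""
    return key


def hashed_token(key: bytes) -> bytes:
    """Hash partitioning, approximated with an MD5 digest of the key."""
    return hashlib.md5(key).digest()


def distribution(keys, to_token) -> list[int]:
    """Count how many keys each node owns under a given token function."""
    counts = [0] * len(NODE_TOKENS)
    for key in keys:
        counts[owner(to_token(key))] += 1
    return counts


# Time-series keys: one row per minute, keyed by an ISO-8601 timestamp.
start = datetime(2015, 1, 1)
keys = [(start + timedelta(minutes=i)).isoformat().encode() for i in range(1000)]

if __name__ == "__main__":
    print("byte-ordered:", distribution(keys, ordered_token))  # [1000, 0, 0, 0]
    print("hashed:      ", distribution(keys, hashed_token))
```

Under the same assumption as the text, that reads and writes follow the data, the first node in the byte-ordered case would also absorb all of the traffic for these keys.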

