> help
Available commands are:
who
get list of organizers and sponsors
agenda
get the schedule of the conference
register
get registration details & form
tail
get post-conference recap
NoSQL East 2009 has come and gone, and we could not be happier with the results. An excellent crew of speakers dropped some serious knowledge on an attentive and eager audience. There were two entire days of talks, social events, mingling, and networking, bringing what we hope was immense value to all involved.
We were overwhelmed by the largely positive feedback we have received. And yet it is the sponsors, speakers, and attendees who we feel deserve the praise. The sponsoring companies stepped up in a big way, and were all quite relevant to the NoSQL space, adding a ton to the proceedings. We can't say enough about the speakers. While most speakers honored the "more use cases" request, all of them provided compelling, thought-provoking talks. Finally, the attendees shocked us as they turned out in droves. The final total was 120 people, and this really made for great questions, interactions, and side conversations at every turn during our time in Atlanta.
For the curious, yes, we will post slides and videos of the talks on each speaker's page. Get to each speaker's page via the 'agenda' or 'speaker' commands. We will post them as they come online, and tweet updates from
@nosqleast
We would like to hear feedback on every part of the conference. Use our twitter handle or email address to send it along, so we can make next year even better.
Blog Posts
Photos
NoSQL East, a conference of non-relational data stores, aims
to present the experiences of different companies using so-called
“nosql” solutions in production.
The reasons for using these new
systems are varied, but often involve scale that relational
solutions cannot achieve. We have a new tool in our toolbelts,
and as you will see, some people are using them quite effectively.
Join us in Atlanta as we discuss “Big Data” and the systems that
are completely transforming how people look at data.
Wednesday, October 28th
5:00pm
7:00pm
NoSQL East - Cocktail Welcome
sponsored by
Basho Technologies
at
Tap, a gastropub -
Registration is closed for 2009.
>
location
Conference Venue
Georgia Tech Research Institute - Conference Center
250 14th St. NW
Atlanta, GA 30318
Wednesday Cocktail Welcome - 7:00pm
sponsored by
Basho Technologies
Tap, a gastropub
1180 W. Peachtree Street NE
Atlanta, GA 30309
Arin Sarkissian is currently the lead of Digg's Core Infrastructure Team working on the site's scalability, performance & next generation architecture. Previously Arin created Blip.fm & served as their lead developer.
Abstract
Deploying Cassandra in production: I'll be discussing what led Digg to look into NoSQL solutions, why we picked Cassandra, what's currently in production & how we're re-architecting Digg to use Cassandra as our primary data store. Along the way I'll share some lessons learned, metrics, our solutions to random roadblocks as well as various tips 'n tricks.
Video
Slides
Kevin Smith has, at various times, been a network administrator, DBA, developer, team lead and trainer over his 14 year career. He first learned about Erlang in 2006 via Joe Armstrong's excellent book and has never looked back. Kevin is the founder of Hypothetical Labs, a consultancy focused on helping software teams learn about and adopt Erlang for maximum success. He is also the author of the popular screencast series Erlang In Practice.
Abstract
Sporting a simple API and a command syntax which closely resembles HTTP, Redis is the Swiss army knife of storage engines. In this presentation I'll give a short introduction to Redis, and discuss ways Redis can be used today in your application's infrastructure.
Video
Slides
Kevin Weil leads the analytics team at Twitter, building distributed infrastructure and leveraging data analysis at a massive scale to help grow the popular micro-blogging service. With millions of monthly site visitors and many more interacting through API-based third party applications, Twitter has one of the world's most varied and interesting datasets. Prior to joining Twitter, Kevin led the analytics team at the Kleiner Perkins-backed web media startup Cooliris. Kevin earned his bachelor's degree in Mathematics and Physics from Harvard University, and has a master's degree in Physics from Stanford University.
Abstract
Massive growth in the size of business datasets leads many companies to Hadoop, an emerging architecture for parallel data processing. However, the migration path can be challenging, in part because MapReduce analyses use programming languages like Java and Python rather than SQL. Apache Pig is a high-level framework built on top of Hadoop that offers a powerful yet vastly simplified way to analyze data in Hadoop. It allows businesses to leverage the power of Hadoop in a simple language readily learnable by anyone that understands SQL. In this presentation, I will introduce Pig and show how it's been used at Twitter to solve numerous analytics challenges that became intractable with our former MySQL-based architecture.
Video
Slides
Chris Curtin is the CTO of Silverpop. Chris Curtin brings
more than 19 years of experience leading the design and
development of large-scale, mission-critical, distributed
business applications. Previously, he was the director of
engineering for Bradley Ward Systems Inc., where he
developed and implemented manufacturing applications for
the food industry and was lead architect at Manhattan
Associates. Curtin graduated from the University of
Massachusetts with a Bachelor of Science degree in
computer science.
Abstract
Map/Reduce, as provided in Apache’s Hadoop, is simple in
concept, but difficult in practice. Many real world tasks
require multiple map/reduce tasks and sometimes figuring
out which step to perform an operation in can be a
challenge.
Cascading
is an open source
abstraction layer on top of Hadoop that provides an easy
to use, but powerful API that figures out what problems
you are trying to solve and builds the appropriate map,
reduce and dependency logic. In this presentation I will
be explaining the concepts of Cascading and showing how it
is used to solve real world problems.
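As a rough illustration of why real-world jobs often need more than one map/reduce pass, here is a minimal sketch in plain Python. It uses neither Hadoop nor the Cascading API; the `map_reduce` helper and the sample lines are hypothetical, made up for this example. The first pass counts words, and the second pass takes the first pass's output and groups words by their count.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map/reduce pass (hypothetical helper, not Hadoop)."""
    groups = defaultdict(list)
    for record in records:
        # The mapper emits zero or more (key, value) pairs per record.
        for key, value in mapper(record):
            groups[key].append(value)
    # The reducer sees each key once, with all values emitted for it.
    return [reducer(key, values) for key, values in groups.items()]


lines = ["riak couchdb riak", "hbase riak couchdb"]

# Pass 1: count how often each word appears.
counts = map_reduce(
    lines,
    lambda line: [(word, 1) for word in line.split()],
    lambda word, ones: (word, sum(ones)),
)

# Pass 2: feed the counts back in and group words by their count.
by_count = map_reduce(
    counts,
    lambda pair: [(pair[1], pair[0])],
    lambda count, words: (count, sorted(words)),
)

print(sorted(by_count))
```

The second pass cannot start until the first has finished, which is the kind of dependency between steps that the abstract describes Cascading as working out on the developer's behalf.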
Video
Slides
John Willis has worked in the IT management industry for 30 years. He started as a tape operator on an IBM mainframe while working for his high school computer club, and began his professional career at Exxon as an IT infrastructure analyst. He is the founder of four successful startups over the past 20 years and is currently the CEO of his self-funded Zabovo Corp. Willis is known internationally for his IT Management and Cloud blog. He also has two podcast series on clouds called 'Cloud Cafe' and 'Cloud Droplets'. Willis is also the co-host of Redmonk's 'IT Management Guys' podcast series.
Abstract
Cambrian Explosion
Video
Slides
Matt Arrott is the e-Science Program Manager for the California Institute of Telecommunication and Information Technology at University of California San Diego and CTO for TWIST Process Innovations. Mr. Arrott has over 20 years of experience in project management, design leadership, and engineering management for software and network systems focused on information design and dissemination. His current focus is on the standardization of service interaction patterns as the basis for engineering scalable community articulated service clouds. Recently, much of his time has been spent developing the cyber infrastructure strategy for the Ocean Observatories Initiative, a global multi-scale instrumentation of the world's oceans for in situ scientific exploration. Past positions include Currenex, where as VP of Product Development he delivered the industry's first multi-bank "Executable Streaming Price" product, distributing millions of real-time pricing events daily across financial institutions globally; Dreamworks SKG, as Head of Software R&D; Autodesk, as Graphics System Development Manager; and the National Center for Supercomputing Applications, as scientific visualization architect.
Abstract
Oceans of Data. A haiku.
John Day has been involved in research and development of computer networks since 1970 (his original network address was 12) when he was involved in the design of transport and upper layer protocols for the ARPANet as well as the Internet. Mr. Day has developed and designed protocols for everything from the data link layer to the application layer. Recently Mr. Day has been turning his attention to radically new network architectures that scale indefinitely as described in his recent book Patterns in Network Architecture: A Return to Fundamentals.
Abstract
Oceans of Data. A haiku.
Tony Garnock-Jones
Abstract
Oceans of Data. A haiku.
Kyle Banker works at 10gen, where he maintains the MongoDB Ruby Driver and supports the Ruby developer community. Previously, Kyle built e-commerce and social networking applications at Alexander Interactive, and wishes he could have modeled his data as documents at the time. In a past life, Kyle thrived as a languages nerd and taught English literature; he hopes you'll find his presentation pedagogically sound.
Abstract
MongoDB is a high-performance, schema-free, document-oriented database built for scalability. Kyle will briefly highlight MongoDB's features, focusing on document indexing, dynamic querying, and large-object storage. Following that, Kyle will address document-oriented data modeling in MongoDB, discuss how the database is used in production at Business Insider, and suggest ideal use cases for MongoDB.
Video
Slides
Mike Miller is a co-founder of Cloudant, a company commercially developing Apache CouchDB and a sponsor of NoSQL East. Mike enjoys putting algorithms in the wild, solving problems for the first time, and building things. He honed his computing repertoire while studying the universe’s fundamental particles and interactions as a physicist, most recently at the Large Hadron Collider where he cut his teeth on Petabyte per second problems.
Abstract
Apache CouchDB is a schema-free document database with unique characteristics that make it an excellent fit for a new wave of applications. It is a rapidly maturing product and one of the leaders of the NoSQL movement. I will briefly highlight the defining characteristics of CouchDB (REST API, JSON key/value store, MapReduce views, replication/conflict-resolution), focusing on solving problems for both application development and application deployment. I will then highlight lessons learned from running CouchDB at scale in production systems with an eye towards future development of the CouchDB project.
Video
Slides
Cliff is a builder of and frequent speaker on high-performance, scalable web applications. He is the lead engineer for front-end systems at Powerset where he was instrumental in the design, implementation, launch, and operation of many of the company's production services. Cliff is an active contributor to open source projects, and is a highly-regarded member of the Erlang community. He was inspired by the publication of the Amazon Dynamo paper to implement, and release as open source, Dynomite, a robust, distributed key/value store written in Erlang, currently in production use at Powerset.
Abstract
Dynomite and the future of Distributed Databases
So what's the deal with Dynomite? Patches are not being accepted and development has crawled to a stop. Cliff will give everyone the inside scoop on what is happening with the Dynomite project and its current status vis-à-vis open source. He will also discuss some ideas for the future of distributed databases.
Video
note: Audio levels are low until 1:40
Slides
Justin Sheehy is the CTO of Basho Technologies, the company behind the creation of Webmachine and Riak. Before Basho, Justin worked on distributed systems, resilient networks, and high assurance at MITRE and Akamai.
Abstract
Justin will introduce Riak, a decentralized data storage system. Riak has a convenient HTTP/JSON interface and map/reduce programming atop a networking model that is focused on write-availability and ease of operations. The talk will go through some difficult real-world situations that have occurred on applications using Riak, and how Riak helped make those situations manageable.
Video
Slides
Mark has been a software developer for over 11 years, working in diverse industries and technologies. He holds an undergraduate degree from Emory University and has done graduate work towards a Masters of Business Administration with a specialization in a Masters of Decision Sciences. Mark is a founding member of Catamorphic Labs, LLC, which specializes in mining massively large data-sets, semantic web technologies, and software development using emerging technologies.
Abstract
Don't have the time to scale or shard MySQL? Don't have the GDP of a small nation to spend on Oracle? Riding on top of the Hadoop Distributed File System and providing means to easily interact with Hadoop's MapReduce framework, HBase provides an inexpensive, simple, and scalable solution to persisting and processing massively large sets of data. This talk will introduce HBase's data model and API and then explore several cases where its use solved problems quickly and easily.
Tim Anglade is CTO of GemKitty LLC, a Portland-based jewelry start-up. He spends the rest of his days as a gun-for-hire extraordinaire or teaching Project Management at a university near Paris, France. He was previously Department Director for a Franco-American Web Agency and a consultant in charge of the Market Replay project at the Nasdaq Stock Exchange. He holds an MSc from the National Institute of Applied Sciences in Lyon and is an Invited Expert at the W3C.
Abstract
tin is a database engine so tiny, its name had to be shortened. More specifically, tin focuses on storage, retrieval, subscription and transformation of sequential data. Like many other "NoSQL" solutions, it is not meant as a general-use RDBMS replacement but finds all its strength when handling data such as Facebook walls, Twitter feeds or any multi-dimensional data that is primarily served or ordered along a single dimension (or single set of dimensions). The approach at the core of tin was first tested on financial data, in the back-end delivery service behind Nasdaq's Market Replay --- where it was able to serve petabytes worth of stock prices daily without breaking a sweat (or the piggy bank).
This talk will cover:
- the reasons that led to the conception of such a tool;
- how tin and other "text & filesystem" DBMS can be architected;
- how to reduce some data conundrums to 1-dimensional problems;
- performance gains of such approaches (the usual charts & graphs pr0n);
- and as a bonus (if time allows): "lens-based" transformation systems like the one used inside tin.
Video
Slides
Founder of the Neo4j graph database project and CEO of Neo Technology. Programmer by passion the first 15 years on this planet and by passion & profession the remaining 15. First free software project at age 16. Now mainly focused on spreading the word about the powers of graphs and preaching the demise of tabular solutions everywhere. Presents regularly at conferences such as JAOO, Oredev, QCon, and OSCON.
Abstract
Many applications today handle data that is deeply associative, i.e. structured as graphs (networks). The most obvious example of this is social networking sites, but even tagging systems, content management systems and wikis deal with inherently hierarchical or graph-shaped data.
This turns out to be a problem because it is difficult to deal with recursive data structures in traditional relational databases and many NoSQL stores alike. For example, in an RDBMS each traversal along a link in a graph is a join, and joins are known to be very expensive.
A graph database uses nodes, relationships between nodes and key-value properties instead of tables to represent information. This model is typically substantially faster for associative data sets and uses a schema-less, bottom-up model that is ideal for capturing ad-hoc and rapidly changing data.
This session will introduce an open source, high-performance, transactional and disk-based graph database called "Neo4j" (http://neo4j.org), which frequently outperforms relational backends by more than 1000x for graph-shaped data.
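To make the node / relationship / property model a little more concrete, here is a minimal in-memory sketch in Python. It is not Neo4j's API: the `Node` class, the `traverse` function and the "KNOWS" relationship type are all hypothetical names chosen for illustration, and nothing here is transactional or disk-based. The point is only that following a relationship is a direct hop from one node to another rather than a join between tables.

```python
from collections import deque


class Node:
    """A graph node with key-value properties and outgoing relationships."""

    def __init__(self, **properties):
        self.properties = properties
        self.relationships = []  # list of (relationship_type, target_node)

    def relate(self, rel_type, other):
        """Add a directed, typed relationship from this node to another."""
        self.relationships.append((rel_type, other))


def traverse(start, rel_type, max_depth):
    """Breadth-first walk along one relationship type, up to max_depth hops."""
    seen = {id(start)}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand nodes that are already max_depth away
        for kind, target in node.relationships:
            if kind == rel_type and id(target) not in seen:
                seen.add(id(target))
                found.append(target)
                queue.append((target, depth + 1))
    return found


alice = Node(name="Alice")
bob = Node(name="Bob")
carol = Node(name="Carol")
alice.relate("KNOWS", bob)
bob.relate("KNOWS", carol)

# Everyone reachable from Alice within two "KNOWS" hops.
print([n.properties["name"] for n in traverse(alice, "KNOWS", 2)])
```

In a relational schema the same friends-of-friends question would typically need one self-join per hop, which is the cost the abstract refers to.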
Video
Slides
Geir Magnusson Jr, VP, Platform and Architecture, Gilt Groupe - Geir brings to Gilt his interest in scalable systems, software engineering and craftsmanship, and open source software. He's served as a technical executive and leader for companies such as 10gen, Joost, Adeptra, Bloomberg and Intel, and has built systems and solutions for industries ranging from financial markets to fraud contact to digital audio. He also has broad experience in open source, having founded several significant open source projects, such as Apache Geronimo, Apache Harmony and Apache Velocity. A member of the Apache Software Foundation, he's represented the Foundation as a member of the Executive Committee of the Java Community Process and is a past and current member of the Board of Directors. He's also an international speaker on open source and software technology.
Abstract
Project Voldemort is an open-source data persistence technology that is accurately described as a "big, distributed, persistent, fault-tolerant hash table". It's not a relational database. It doesn't do ACID. It doesn't care about your objects or your documents. But it will store data and return it to you even when parts of it go away, and it will do it rather fast. This talk covers the basic theory of operation of Project Voldemort and describes how Gilt Groupe is using Project Voldemort at the heart of their e-commerce transaction processing system.
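The "hash table that keeps answering when parts of it go away" idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not Project Voldemort's design or API: the `ReplicatedTable` class, its node count and its replica count are hypothetical, storage is in memory only, and there is no versioning or repair of stale replicas. Each key is hashed to a starting node and written to the next few nodes as well, so a read can fall back to a surviving replica.

```python
import hashlib


class ReplicatedTable:
    """Toy replicated hash table (hypothetical; not Voldemort's API)."""

    def __init__(self, node_count=4, replicas=2):
        self.nodes = [dict() for _ in range(node_count)]  # one dict per "server"
        self.alive = [True] * node_count
        self.replicas = replicas

    def _preference_list(self, key):
        """Return the node indexes responsible for a key, in order."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        """Write the value to every live node in the key's preference list."""
        for index in self._preference_list(key):
            if self.alive[index]:
                self.nodes[index][key] = value

    def get(self, key):
        """Read from the first live replica that holds the key."""
        for index in self._preference_list(key):
            if self.alive[index] and key in self.nodes[index]:
                return self.nodes[index][key]
        return None


table = ReplicatedTable()
table.put("cart:42", "3 items")

# Simulate losing the first node responsible for the key.
primary = table._preference_list("cart:42")[0]
table.alive[primary] = False

print(table.get("cart:42"))  # still served by the second replica
```

A real system also has to decide what happens when a failed node comes back with stale data, which this sketch deliberately leaves out.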
Yuan Yu is a senior researcher at Microsoft Research Silicon Valley lab, where he currently works on systems and programming models for large-scale parallel and distributed computing. He is the project leader and primary developer of DryadLINQ. He joined Microsoft Research in 2002. Previously, he was a senior member of technical staff at the DEC/Compaq Systems Research Center. He has a PhD in Applied Mathematics from University of Texas at Austin.
Abstract
A DryadLINQ program is a sequential program (written in C#, VB, or F#) composed of LINQ queries performing arbitrary side effect-free transformations on datasets, and can be written and debugged using standard .NET development tools. The DryadLINQ system automatically and transparently translates the LINQ queries in the program into distributed execution plans and executes them on large compute clusters using the Dryad execution engine. In this talk, I will describe the programming model, the design and implementation, and applications of the system.
Video
Slides
John Corwin is a senior engineer in Yahoo!'s Cloud Computing group in Sunnyvale, CA, where he works on the Sherpa distributed database platform. Before joining Yahoo!, John attended graduate school at Yale where he did research on database systems. Prior to that, John was a member of IBM's Advanced Internet Technology group in Cambridge, MA.
Abstract
Sherpa is Yahoo!'s next-generation structured-record distributed storage service that addresses growing scalability needs of Yahoo!'s properties. Important features of Sherpa include high scalability, elastic growth, a global footprint, and low-latency access from anywhere. One key difference between Sherpa and other distributed database systems is Sherpa's support for different consistency models, allowing the application developer to manage the trade-off between consistency, availability, and performance. In this talk, I will give an overview of the Sherpa architecture and describe how various web serving applications can use Sherpa to solve their data storage problems.