SEA Class and Hands-on Workshop: Spark and TensorFlow

Neal McBurnett, Independent Consultant in Data Science and Election Integrity

Thursday June 22 2017

Friday June 23 2017

Shared folder for SEA Class and Hands-on Workshop: Spark and TensorFlow

Exploring further on your own

To keep exploring Spark on your own, you have many options. You can use the Cheyenne or Yellowstone environments you set up during the class. You can also run Spark on your own laptop or desktop computer, either by downloading Apache Spark or by installing PySpark from Anaconda Cloud; both are pretty easy. And of course starting up Jupyter notebooks is a lot easier in a local environment.

The 2015 edX MOOC courses that I talked about are no longer available via edX, but the videos and notebooks from the 2015 "Introduction to Big Data" and "Scalable Machine Learning" courses can still be found online. They are of course dated, but high quality, and include a fun but challenging PCA analysis of neural activity in a larval zebrafish brain.

Another option, which also gets you access to more good-quality training materials, is Databricks Community Edition. The Community Edition provides a convenient online web service to start up a free, but very small, cluster (6 GB). It has a nice intro to Spark, and an "Apache Spark on Databricks for Data Scientists" notebook that covers some machine learning basics. See also their "Analyzing 1000 Genomes with Spark and Hail" notebook, for more science on Spark. You can also run the MOOC class notebooks for free there.

You can import notebooks into Databricks, like the Spark + TensorFlow example from "Introducing Deep Learning Pipelines for Apache Spark" that I briefly showed you.

If you want to try bigger clusters, you can get a free 14-day trial of the full Databricks environment, which you hook up to an Amazon Web Services (AWS) account.

Thanks again for coming to learn about Spark and TensorFlow. Happy computing! --Neal McBurnett