Parsing Reddit Data with Dask

This post is a step-by-step exploration of one month of Reddit posts. It covers setting up an AWS server, analyzing the dataset with Pandas, setting up a castra file, running NLP with Dask, and performing a sentiment analysis of the comments using the LabMT wordlist. It ends with a WordCloud setup. Everything is done in Python. Enjoy!

Parsing data using the UNIX terminal

This is an example of simple data parsing in the Unix terminal. We will use tools such as chmod, find, grep, sed, apt-get, nano, cut, sort, uniq, head, tail, less, cat, ssh, wc, echo, man, awk and more.

Visualization of the Titanic Disaster using D3

To experiment with visualization tools and concepts (Dimple.js, D3.js, visualization design principles, visual encodings, HTML, CSS, SVG), I created a polished data visualization that tells a story about survival rates in the Titanic disaster and lets the reader explore trends and patterns.


