This post is a step-by-step data exploration of a month of Reddit posts. It covers an AWS server setup, a Pandas analysis of the dataset, a castra file setup, NLP using Dask, and a sentiment analysis of the comments using the LabMT word list. It ends with a word cloud setup. Everything is done in Python. Enjoy!
In this post I describe how to implement the DBSCAN clustering algorithm with Jaccard distance as its metric. The implementation should be able to handle sparse data.
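To give a flavour of the idea, here is a minimal sketch, not the implementation from the post itself. It assumes each sparse row is stored as a Python set of its non-zero feature labels, and the names `jaccard_distance`, `region_query`, `dbscan`, `eps` and `min_samples` are my own choices for illustration.

```python
NOISE = -1  # label given to points that belong to no cluster


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; two empty sets count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def region_query(data, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, other in enumerate(data)
            if jaccard_distance(data[i], other) <= eps]


def dbscan(data, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(data)  # None means "not visited yet"
    cluster = -1
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbors = region_query(data, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(data, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Toy data: each row is the set of features that are "on" for that item.
rows = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "b", "c", "d"},
    {"x", "y"},
    {"x", "y", "z"},
    {"q"},
]
print(dbscan(rows, eps=0.5, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```

Storing rows as sets keeps memory proportional to the number of non-zero entries, which is why this representation suits sparse data. The neighbour search here compares every pair of points, so it is only practical for small datasets.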
This is an example of how to do simple data parsing in the Unix terminal. We will use commands such as chmod, find, grep, sed, apt-get, nano, cut, sort, uniq, head, tail, less, cat, ssh, wc, echo, man, awk and more.
To experiment with visualization tools and techniques (Dimple.js, D3.js, visualization design principles, visual encodings, HTML, CSS, SVG), I created a polished data visualization that tells a story about survival rates in the Titanic disaster and lets the reader explore trends and patterns.