Monthly Live Codings
9/1/2020: Intro to Topic Modeling, Elizabeth Piette
In this session we will learn some of the basics of topic modeling to uncover meaning in unstructured texts. We will use Python to process text data, construct features, train a model, visualize the results, and assess model performance. For continuity, we will use the same data set that Christine used in her earlier introduction to natural language processing. Members of the Harvard community can access this data set on Harvard's GitHub here.
We are working on making the captioned recording of this session more widely accessible here. In the meantime, members of the HBS community can access the captioned video at the following link: https://courseware.hbs.edu/video/?v=1_454euf22
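As a rough illustration of the topic modeling workflow described for this session (process text, construct features, train a model, inspect the topics), here is a minimal sketch using only the Python standard library. It is not the code from the session. The mini-corpus, the function names `tokenize`, `fit_lda`, and `top_words`, and all parameter values are assumptions made for illustration. The model is a small collapsed Gibbs sampler for latent Dirichlet allocation (LDA), one common topic model; the session may have used a different model or library.

```python
import random
import re

# Hypothetical mini-corpus standing in for the session's data set.
docs = [
    "battery charger battery power cable",
    "power cable charger adapter battery",
    "novel author chapter story plot",
    "story plot author novel character",
]


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_lda(token_docs, n_topics=2, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return vocabulary and count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in token_docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * n_topics for _ in token_docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(token_docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(token_docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to the conditional weights.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in ranked[:n]])
    return result


token_docs = [tokenize(d) for d in docs]
vocab, doc_topic, topic_word = fit_lda(token_docs)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {', '.join(words)}")
```

On real data, a library implementation would usually be a better choice than a hand-written sampler; the sketch is only meant to make the individual steps visible.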
8/4/2020: Intro to Natural Language Processing, Christine Rivera
Welcome to Natural Language Processing! In this beginner's demo, we will use Python to walk through some basic NLP steps and demonstrate common techniques for gaining insight into text data. Using Amazon product reviews as our sample data, we will begin with some basic data cleaning, followed by tokenization and the creation of some simple graphs by counting words and tokens. We will then generate a simple word cloud. Finally, we will conduct sentiment analysis using VADER. Members of the Harvard community can access this data set on Harvard's GitHub here.
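The first steps named in this description (basic cleaning, tokenization, and counting words) can be sketched with the Python standard library alone. This is not the session's code. The sample reviews, the stopword list, and the helper names `clean` and `tokenize` are assumptions made for illustration, and the word cloud and VADER sentiment steps are not covered here.

```python
import re
from collections import Counter

# Hypothetical sample reviews; the session used Amazon product reviews.
reviews = [
    "Great product, works great!",
    "Terrible battery. Would not buy again.",
    "Works as described; great value.",
]

# A tiny illustrative stopword list, not a standard one.
STOPWORDS = {"a", "as", "the", "and", "would"}


def clean(text):
    """Lowercase the text and replace non-letter, non-space characters with spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text on whitespace and drop stopwords."""
    return [tok for tok in clean(text).split() if tok not in STOPWORDS]


# Count tokens across all reviews.
counts = Counter(tok for review in reviews for tok in tokenize(review))
for word, n in counts.most_common(3):
    print(word, n)
```

The resulting counts are the kind of table that a simple bar chart or word cloud would be drawn from.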
7/7/2020: Stata Reproducible Reporting (DynDoc) Demo, Leo Hsia
Stata has recently introduced reproducible reporting, which allows users to generate output in Word, Excel, PDF, and HTML that includes formatted text, summary statistics, regression results, and graphs produced by Stata. This is a particularly useful feature if you are doing collaborative research with non-Stata users or need to generate clean and readable reports for a wider audience. This live coding session is an introduction to the basic commands involved in creating reports from Stata.
5/5/2020: Stata Data Frames Demo, Leo Hsia
Stata 16 introduced frames, which make it possible to hold more than one dataset in memory at a time. We briefly review this feature and show how it can be used to merge datasets in memory instead of on disk, which can save significant time.
7/21/2020: Software Modules on the HBSGrid, Bob Freeman
Software modules have arrived on the HBSGrid! The modules give users flexible access to multiple versions of software (e.g., Python 2.7 and Python 3.6), to software in home folders and project folders, and to default settings for specific projects. Users are no longer limited to the single software version previously installed on the HBSGrid. Bob Freeman presented this special live demo on how to start using these modules, how they work in NoMachine, and best practices for their use.
VIDEO COMING SOON