The Emotional Rollercoaster of Scraping: My First Ruby CLI Gem

Andrew Smoker
3 min read · Nov 12, 2020

It’s hard to believe I am already in my 4th week of the Flatiron School’s Software Engineering Program, but I have arrived at my first project week!

Going into this week, I was extremely nervous. I felt confident in my knowledge up to this point, but starting from a blank slate terrified me. The project requirements included using data from an outside source, so the first thing I needed to do was decide on a topic. I combed through many API options but wasn’t able to find anything that really got me excited. So I decided (for better or worse) to attempt scraping a website for data.

I tested a few different websites and ultimately felt comfortable with scraping the Playbill website for content about Broadway Shows. I had big dreams of what this app would do and set forth on my mission from there.

Once I had all of my files set up and organized for my project (which was quite a process), I decided scraping the data would be my first priority. I wanted to make sure that I was able to get actual data and pull it into my app. From there, I felt confident that I would be able to use the data in a way to get my program working.

I went to the initial page I wanted to scrape and attempted to pull in the titles of the shows. This was what I had tested earlier, so very quickly I had a list of 40 Broadway shows in my terminal. Woot! Little did I know that would be the only easy part about scraping the Playbill website. Unfortunately, as I got further into the data on the main page, I realized there were a lot of inconsistencies in how the site classified and grouped information, which made it almost impossible to get a clean scrape. Some musicals were showing up as plays, addresses were formatted differently across the site, synopsis data was tied to the COVID-19 response, and so on.

I freaked out and went into panic mode. Will I have to start my project all over? Will I not have any information to display in this app if I continue? Why did I choose scraping when I could have imported information from an API? I thought it might be the end, but I really didn’t want to start over. So I took a deep breath and kept digging.

I looked into the individual show pages and found information there that I thought I’d be able to access. So I created two scraping methods, one for the total list of shows and one for each individual show page. The major downside of this was that it would take longer for my app to load, but I decided having the right information was more important.
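To make the two-method idea concrete, here is a rough sketch of how it could look. This is not the code from my gem: the `Show` and `FakeFetcher` names, the URLs, and the page data are all made up for illustration, and the fake fetcher stands in for a real request-and-parse step so the example runs without a network connection. One way to soften the load-time cost is to scrape a show's page only when its details are first requested, then cache the result.

```ruby
# Hypothetical sketch: scrape the list once, scrape each show page only on demand.
class Show
  attr_reader :title, :url

  def initialize(title:, url:, fetcher:)
    @title = title
    @url = url
    @fetcher = fetcher # any object that responds to #call(url) and returns a Hash
    @details = nil
  end

  # First scrape: the list page only gives us titles and links.
  def self.from_list(rows, fetcher)
    rows.map { |row| new(title: row[:title], url: row[:url], fetcher: fetcher) }
  end

  # Second scrape: runs the first time details are needed, then reuses the result.
  def details
    @details ||= @fetcher.call(url)
  end

  def venue
    details[:venue]
  end
end

# Stand-in for "request the page and parse the HTML" (assumption: invented data).
class FakeFetcher
  PAGES = {
    "/shows/example-musical" => { venue: "Example Theatre", type: "Musical" },
    "/shows/example-play" => { venue: "Sample Playhouse", type: "Play" }
  }.freeze

  attr_reader :calls

  def initialize
    @calls = 0
  end

  def call(url)
    @calls += 1
    PAGES.fetch(url)
  end
end

fetcher = FakeFetcher.new
shows = Show.from_list(
  [
    { title: "Example Musical", url: "/shows/example-musical" },
    { title: "Example Play", url: "/shows/example-play" }
  ],
  fetcher
)

puts shows.map(&:title).join(", ") # no show pages fetched yet
puts shows.first.venue             # fetches one show page
puts fetcher.calls                 # prints 1
```

With this shape, listing the shows stays fast, and the slower second scrape is only paid for the shows a user actually picks.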

Scraping is hard, but I found a repl.it that was extremely helpful and basically provided a playground to scrape data and see the results immediately: https://repl.it/@TheGingertonic/ScraperChecker#main.rb.
I highly recommend using it to test scraping different websites. I still had to poke around on Playbill for a long time to get what I actually wanted, and even once I had it, I needed to find creative ways to format all of the data in a consistent and usable manner.
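As an example of the kind of cleanup I mean, here is a small sketch. It is not the exact code from my project: the `Tidy` module and the sample strings are invented, and it only assumes that scraped text tends to arrive with stray whitespace and that the same address can show up on one line or split across several.

```ruby
# Hypothetical helpers for tidying scraped strings; the sample inputs are invented.
module Tidy
  # Collapse runs of whitespace (newlines, tabs, non-breaking spaces) into
  # single spaces and trim both ends.
  def self.text(raw)
    raw.to_s.gsub(/[[:space:]]+/, " ").strip
  end

  # Put an address on one line, with its pieces separated by ", ",
  # whether the source split them with commas or with line breaks.
  def self.address(raw)
    raw.to_s.split(/\n|,/).map { |part| text(part) }.reject(&:empty?).join(", ")
  end
end

puts Tidy.text("  Example\n   Musical ")
# prints: Example Musical

puts Tidy.address("123 Example St\n  New York,   NY 10001")
# prints: 123 Example St, New York, NY 10001

puts Tidy.address("123 Example St, New York, NY 10001")
# prints: 123 Example St, New York, NY 10001
```

Running every scraped value through the same small helpers means the rest of the app can assume one format, no matter which page the value came from.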

Once I had the data, the rest of the setup was fairly straightforward. I was able to use what I learned in the labs and build upon that. I’m very happy with how my project turned out and I think it’s something I would actually use when Broadway finally opens again (post COVID)! Overall, I’m glad I chose to scrape (even though it had its roadblocks). I was challenged in a new way that pushed me to problem solve. And isn’t that what we as developers do? 😎

Written by Andrew Smoker

I am 34 years old and making a huge career change by attending Flatiron School’s Software Engineering Bootcamp. Excited to learn!
