Sentiment Analysis and Web Charts with Python

I’m mostly retired, but still consult a bit in my field: Intelligent Transportation Systems (the use of advanced sensors, communications and information processing systems to improve ground transportation). The field includes connected and automated vehicles. And out of intellectual curiosity, I’ve also been learning about sentiment analysis. So I decided to combine these interests by tracking the change in sentiment about automated vehicles over time, using tweets as the source material. And while I was at it, I figured I’d do the same thing for electric vehicles, and publicly display the results on two web pages.

One of the common, basic approaches to sentiment analysis is to determine whether the general tone of a sample of text is positive, negative, or neutral. That is the approach that I’ve taken. To determine this, one can use a rules-based approach, where certain words and word combinations are assigned tones and intensity (e.g., “hate” is very negative, “cool” is positive, “not cool” is negative). The scores for each sentiment word are combined to get an overall sentiment for the text sample. The other approach is to use machine learning algorithms to determine the sentiment. This approach can be more flexible, but requires a lot of training data, and the training data must be tagged by humans. In addition, multiple humans should review each sample, as studies have shown that humans can disagree with one another 20% of the time when assigning sentiment to a text sample.

I decided to use tweets posted on Twitter as the source material, since there are a large number of new entries every day and Twitter provides a free API that allows one to sample thousands of tweets per day, filtered by keywords and other parameters.

Once each tweet is assigned a sentiment value, I look at several metrics based on the total numbers of positive, negative, and neutral tweets. In addition, I extract and save the 100 most common one-word and two-word combinations found in the set of tweets for the day. The results are then used to generate several metrics, plots, and displays for the web pages.

The overall system for data collection, analysis, and display is shown in the swim diagram below.

Swim diagram for this project analyzing the sentiment of tweets about AVs and EVs and publishing the results on a web page

Getting and Filtering Tweets

The first step is to get the relevant tweets and clean and filter them. Two almost identical programs are used, one for AV tweets and one for EV tweets. The reason these are not combined into a single program is that doing so would exceed the 15-minute access limit of Twitter’s API. Instead, each is run once, about an hour apart, in the middle of the night Eastern Standard Time.

Twitter’s free “Essentials” access level lets one search through a sample of tweets posted in the last 7 days. In order to access the Twitter API, one must first register. I then used Tweepy, a free Python library for accessing the Twitter API. The search query API provides a large number of query parameters. For this project, I searched for tweets containing any of a list of keywords and that were not retweets, replies, or quote tweets (i.e., only original tweets are included).
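As an illustration, here is a minimal sketch of that collection step using Tweepy’s v2 client; the bearer token, the exact query string, and the page limit are placeholders rather than my production values.

    import tweepy

    BEARER_TOKEN = "YOUR-BEARER-TOKEN"   # placeholder
    client = tweepy.Client(bearer_token=BEARER_TOKEN, wait_on_rate_limit=True)

    # Match any of the AV keywords; exclude retweets, replies, and quote tweets
    query = (
        '("self driving" OR "automated vehicles" OR "automated shuttles") '
        "-is:retweet -is:reply -is:quote lang:en"
    )

    tweets = []
    for page in tweepy.Paginator(client.search_recent_tweets, query=query,
                                 tweet_fields=["created_at"],
                                 max_results=100, limit=20):
        tweets.extend(page.data or [])

    print(f"Collected {len(tweets)} tweets")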

The search terms were selected to broadly collect relevant tweets while avoiding a significant number of irrelevant ones. For example, the AV search terms include “self driving,” “automated vehicles,” and “automated shuttles,” but do not include “Tesla,” as many tweets with the word Tesla are about the stock price or other aspects of the company, not about autonomous vehicles.

The next steps filter out empty tweets and remove URLs from the tweet contents. During development, manual inspection revealed a large number of identical or nearly identical tweets, mostly originating from bots that pick up and tweet news stories. Therefore, a filter was set up to keep only one sample from each set of duplicates and near-duplicates. To remove exact duplicates, I simply used the drop_duplicates method in pandas. Removing near-duplicates proved more problematic. The basic approach is to assess the similarity between pairs of tweets and, if they are sufficiently similar, remove one of them. Unfortunately, this involves comparing every tweet with every other tweet, and there are thousands of tweets. I found that efficiently iterating through the tweets required pandas’ itertuples method, which was at least an order of magnitude faster than the iterrows or items methods. Even so, the first similarity library I tried, SequenceMatcher, took over 10 minutes to perform all the comparisons! In the end, I used the Levenshtein.normalized_similarity method from the RapidFuzz library, which brought the runtime down to seconds.
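A sketch of the near-duplicate filter is below. It assumes the tweets are in a pandas DataFrame with a text column, and the 0.8 similarity threshold is illustrative, not the exact value I use.

    import pandas as pd
    from rapidfuzz.distance import Levenshtein

    df = pd.DataFrame({"text": [
        "Automated shuttles arrive downtown next month.",
        "Automated shuttles arrive downtown next month!",  # near duplicate
        "I rode in a self driving car today and loved it.",
    ]})

    df = df.drop_duplicates(subset="text")   # exact duplicates

    kept_rows, kept_texts = [], []
    for row in df.itertuples(index=False):   # itertuples: much faster than iterrows
        near_dup = any(
            Levenshtein.normalized_similarity(row.text, prev) > 0.8
            for prev in kept_texts
        )
        if not near_dup:
            kept_rows.append(row)
            kept_texts.append(row.text)

    deduped = pd.DataFrame(kept_rows)
    print(deduped)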

Sentiment Analysis

Each of the filtered and cleaned-up tweets is then analyzed to determine its sentiment. I experimented with a number of approaches and settled on VADER, an open-source, rule-based sentiment analysis tool written in Python. VADER was specifically developed to analyze short social media posts, such as tweets. In addition, it runs very quickly, which is useful since I analyze thousands of tweets at a time on each of the two subjects.
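For anyone who hasn’t used it, scoring a tweet with VADER takes only a few lines; the ±0.05 compound-score cutoffs below are the thresholds suggested in VADER’s own documentation.

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    scores = analyzer.polarity_scores("Self driving shuttles are a great idea!")

    # Classify using the compound score
    if scores["compound"] >= 0.05:
        label = "positive"
    elif scores["compound"] <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    print(label, scores)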

I made two very minor changes to the VADER code. First, as reported in the “issues” on GitHub, some words are listed more than once in its dictionary, with differing sentiment values. For these, I used my best judgment on which entry to keep and discarded the duplicates. Second, VADER lets the user add their own words to its dictionary of sentiments. Based on a manual review of a large number of tweets, some additional words tailored to the subject matter were added and assigned sentiments (a sketch of the lexicon update follows the list):

  • “advances”: 1.2
  • “woot”: 1.8
  • “dystopia”: -2.5
  • “dystopian”: -2.5
  • “against”: -0.9
  • “disaster”: -2.5
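Here is a minimal sketch of how those additions are made, using VADER’s lexicon attribute (the analyzer is the same SentimentIntensityAnalyzer shown above):

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    analyzer.lexicon.update({
        "advances": 1.2,
        "woot": 1.8,
        "dystopia": -2.5,
        "dystopian": -2.5,
        "against": -0.9,
        "disaster": -2.5,
    })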

Sentiment Metrics

Donut chart showing percentages of positive, negative, and neutral tweets
Donut Chart showing the current day’s sentiments by percentage.

Once the sentiment of each tweet is determined, the results are aggregated. The total numbers of positive, negative, and neutral tweets are calculated and stored in an AWS S3 storage bucket. Using matplotlib’s pyplot, a donut chart is created showing the percentages for each type of sentiment, and the chart is also stored in an S3 bucket for display on the web pages.
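A sketch of that step is below; the counts are made-up example values, and the bucket and file names are placeholders.

    import boto3
    import matplotlib.pyplot as plt

    counts = {"positive": 412, "neutral": 655, "negative": 178}   # example values

    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.0f%%",
           wedgeprops={"width": 0.4})   # width < 1 turns the pie into a donut
    ax.set_title("Today's tweet sentiment")
    fig.savefig("sentiment_donut.png", bbox_inches="tight")

    s3 = boto3.client("s3")
    s3.upload_file("sentiment_donut.png", "my-sentiment-bucket",
                   "charts/sentiment_donut.png")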

Two indices are also calculated: the ratio of the number of positive to negative tweets (ignoring neutral tweets), and a net sentiment index, which is the average value across all tweets, with each negative tweet counting as -1, each neutral tweet as 0, and each positive tweet as +1. These scores are put into an HTML fragment and transferred to the web server for incorporation into the two sentiment index web pages.
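In code form the two indices are simple to compute; the counts here are the same placeholder values used in the donut-chart sketch.

    pos, neg, neu = 412, 178, 655       # placeholder counts, not real data
    total = pos + neg + neu

    ratio = pos / neg                   # positive-to-negative ratio (neutrals ignored)
    net_index = (pos - neg) / total     # average of +1 / 0 / -1 across all tweets

    print(f"ratio = {ratio:.2f}, net index = {net_index:+.3f}")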

Word Cloud

Word Cloud showing the 100 most common 1 and 2 word combinations

In addition to calculating the sentiment indices, the 100 most common one-word and two-word combinations are determined for the current day’s tweets, and a word cloud image is generated. Both are produced using the WordCloud library, and both the word list and the word cloud are written out to one of the S3 buckets.
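A sketch of this step is below, assuming the day’s cleaned tweets are available as a list of strings; the sample tweets and output file name are placeholders.

    from wordcloud import WordCloud

    tweet_texts = [                          # assumed: the day's cleaned tweets
        "Automated shuttles arrive downtown next month",
        "I rode in a self driving car today and loved it",
    ]
    all_text = " ".join(tweet_texts)

    wc = WordCloud(width=800, height=400, max_words=100,
                   collocations=True)        # collocations=True keeps two-word pairs
    wc.generate(all_text)

    top_words = list(wc.words_.keys())       # up to 100 most common 1- and 2-word terms
    wc.to_file("wordcloud.png")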

Time Series Charts with Bokeh

After both of the sentiment analysis programs have run each day, a single additional program is run. This program has two functions, each of which operates separately on the AV and EV data. First, it produces two time series plots of the sentiment indices, using the data stored in one of the S3 buckets. Second, it compares the most common words stored in the S3 bucket with the list that was stored there 7 days earlier. Words that appear in the 100 most common for the current day but not a week ago are identified (“hot” words), as are those that were on the list 7 days ago but not today (“not” words). A subset of the list is then formatted into an HTML table for display on the web pages.
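The hot/not comparison itself is just a couple of list operations; today_words and week_ago_words below stand in for the two top-100 lists read back from the S3 bucket.

    today_words = ["self driving", "robotaxi", "new route", "pilot program"]   # placeholders
    week_ago_words = ["self driving", "robotaxi", "lidar", "recall"]           # placeholders

    hot_words = [w for w in today_words if w not in week_ago_words]   # new this week
    not_words = [w for w in week_ago_words if w not in today_words]   # dropped off

    print("hot:", hot_words)
    print("not:", not_words)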

Bokeh is used to generate the two time series plots. I used Bokeh rather than matplotlib because I wanted some interactivity on the web page, and Bokeh is a Python library for creating interactive visualizations for web browsers. The actual web scripts that Bokeh generates are in JavaScript.

This was my first time using Bokeh. It wasn’t hard to figure out how to generate the basic time series plots that I wanted, or to add the “hover” tool that lets one see the value of particular data points. However, I had some trouble figuring out how to generate the results for use on a web page and then incorporate them into the page. Bokeh plots can require a server, for complex interactivity involving changing data or plots, or they can be embedded as standalone plots. For this application, I used standalone mode. There are four methods that can be used for this, and as a beginner, I freely confess I don’t understand them well.

The file_html method returns a complete HTML document that embeds the Bokeh documents. That wouldn’t be appropriate for this application, as the Bokeh plots are embedded within other web pages. The json_item method “returns a JSON block that can be used to embed standalone Bokeh content.” I played around with this a bit, but I don’t really understand it. The components method returns “HTML components to embed a Bokeh plot. The data for the plot is stored directly in the returned HTML.” I didn’t explore this one at all. Finally, the autoload_static method returns “JavaScript code and a script tag that can be used to embed Bokeh Plots. The data for the plot is stored directly in the returned JavaScript code.” This is what I used. The method returns a tuple consisting of JavaScript code to be saved and a <script> tag to load it. The <script> tag is placed at the appropriate location in your HTML.
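Here is a rough sketch of how I understand autoload_static is used: build a figure with a hover tool, then write out the JavaScript file and the <script> tag. The data values and the AV_ratio.js file name are placeholders; AV_ratio.tag is the fragment pulled in by the server-side include shown below.

    from datetime import date

    from bokeh.embed import autoload_static
    from bokeh.models import HoverTool
    from bokeh.plotting import figure
    from bokeh.resources import CDN

    dates = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3)]
    ratios = [2.1, 1.8, 2.4]                 # placeholder values

    p = figure(x_axis_type="datetime", height=300, width=500,
               title="Positive / negative ratio")
    p.line(dates, ratios)
    p.scatter(dates, ratios, size=6)
    p.add_tools(HoverTool(tooltips=[("ratio", "@y")]))

    js, tag = autoload_static(p, CDN, "AV_ratio.js")
    with open("AV_ratio.js", "w") as f:      # JavaScript served alongside the page
        f.write(js)
    with open("AV_ratio.tag", "w") as f:     # the <script> tag the include pulls in
        f.write(tag)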

There are several ways to embed the plots in your HTML. I used a server side include, and the section of code looks like this:

            <div class="bokeh">
            <!--#include file="AV_ratio.tag" -->
            </div>

You can see the two plots in the left column of the AV Sentiment Index web page. Static screenshots are shown below:

Time series plots generated using Bokeh

Initial Observations

Figure from Observing the Effect of a Crash on Twitter Sentiment: Early Results from Time Series Data, showing the correlation between news of a crash and a sharp rise in negative tweets regarding AVs

Shortly after I began to capture data, the sentiment regarding automated vehicles dipped sharply for about a week, with the number of negative tweets rising and the positive-to-negative ratio dropping 2 standard deviations below the mean. When I checked what was happening, I found that news had just broken of a major multi-vehicle crash involving a vehicle in self-driving mode. This news was clearly reflected in the negative tweets, with “hot” words that occurred frequently but had been missing the previous week, such as “Bay Bridge,” “Tesla full,” “eight car,” “eight vehicle,” and “vehicle crash.”

Update, 4/23/2024: I’ve now completed this project by publishing two reports analyzing the sentiment data. For those interested, the reports are Seasonality and Variance of Twitter Sentiment Regarding Electric and Automated Vehicles, which looks at the seasonality, variance, long term trends, and skew in the sentiment data, and The Best of Days, the Worst of Days: Twitter Sentiment Regarding Automated Vehicles, which looks at a dozen outlier days, the causes of the extreme sentiment on those days, and the long term impact of these events.

Wireless Microphone Using Two Raspberry Pi’s (Updated 5/4/2021)

I’m in the process of integrating wireless audio input and transmission into my Yorick the Mimic project. This involves adding microphone input and wireless transmission to the sensor cap, and then integrating the movement controller with a modified version of Chatter Pi that takes the transmitted audio as an input.

To get started, I first put together bare-bones transmitter and receiver programs. I’m using Python, along with PyAudio (which I also used in Chatter Pi), to process the audio on both ends. I’m using UDP to send the data packets containing the audio. I saw some examples using TCP, but it seemed to me that UDP was better suited to real-time audio. If anyone knows more about which is the better approach, please post a comment.
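For context, a bare-bones sketch of the transmit side looks something like this; the receiver’s IP address and the audio parameters are placeholders, and my actual programs differ in the details.

    import socket

    import pyaudio

    CHUNK = 1024                         # frames per packet
    RATE = 44100                         # sample rate (Hz)
    RECEIVER = ("192.168.1.50", 12345)   # placeholder address and port

    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    try:
        while True:
            data = stream.read(CHUNK, exception_on_overflow=False)
            sock.sendto(data, RECEIVER)   # one UDP datagram per audio chunk
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()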

PyAudio

The code runs fine, but generates a continuing stream of

ALSA lib pcm.c:8424:(snd_pcm_recover) underrun occurred

warning messages. This doesn’t interfere with the program’s operations, but if anyone knows why I’m getting them and/or how to eliminate the warnings, I’d appreciate your letting me know.

PyAudio has two modes: a blocking mode, where each call to pyaudio.Stream.write() or pyaudio.Stream.read() blocks until all the given/requested frames have been played/recorded, and a non-blocking mode, where a callback function is launched in a separate thread so that processing can continue in the calling program, with the thread ending when the current chunk of audio is processed. The gist with my code uses the non-blocking callback mode, and includes two different versions of the receiver, one using blocking mode and one using non-blocking mode. When using the non-blocking mode, you need to be careful that the callback function does not include anything really time consuming, like file reading and writing. If it does, it can’t finish before the next chunk of audio is ready, and you get clipping or worse problems.
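As an illustration of the callback pattern, here is a sketch of a receiver in non-blocking mode: the UDP loop fills a queue, and the callback only hands the next chunk to PyAudio, so it never does anything slow. The port and audio parameters match the transmitter sketch above and are placeholders.

    import queue
    import socket

    import pyaudio

    CHUNK = 1024
    RATE = 44100
    PORT = 12345                             # placeholder port

    audio_q = queue.Queue()

    def callback(in_data, frame_count, time_info, status):
        # Keep this fast: just hand the next queued chunk to PyAudio
        try:
            data = audio_q.get_nowait()
        except queue.Empty:
            data = b"\x00" * frame_count * 2   # play silence if nothing has arrived
        return (data, pyaudio.paContinue)

    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, output=True,
                    frames_per_buffer=CHUNK, stream_callback=callback)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))

    while stream.is_active():
        packet, _ = sock.recvfrom(CHUNK * 2)   # one chunk of 16-bit mono samples
        audio_q.put(packet)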

As is well known, the audio jack output on a Pi produces low-volume, poor-quality audio. A USB speaker works much better. However, at least for the speaker I’m using, I need to use ALSAMixer to control the volume; if I use the speaker icon in the GUI, it produces no sound when set to anything other than the maximum volume. Again, if anyone knows why and how to fix this, please add a comment.

By design, both the xmit and rcv programs run forever once started. There’s one other feature beyond the bare bones basics. The receive program has a Boolean variable named EFFECTS. If set to True, the Sox library is used to deepen the pitch and add a bit of reverb before sending the audio to the speaker.

Hopefully this project will help others with similar needs.

Yorick Becomes a Mimic: Wireless 3-Axis Skull Control via Motion Capture

Introduction

Picture showing me wearing the sensor hat, with the 3-axis skull next to me.

The sensor cap and the skull it wirelessly controls.

A few years back I presented my Yorick Alexa project at a local Hack & Tell meeting. At the meeting, someone asked if I could hook up a microphone to provide the voice, rather than Alexa. That is a pretty trivial change (much easier than getting Alexa interfaced in the first place), but from that I had the idea of capturing a person’s head motions for the skull in real-time, rather than using pre-canned, repeated routines. This Mimic project was the result.

This project uses two Raspberry Pi’s, communicating over WiFi using XMLRPC, along with an Inertial Measurement Unit (IMU) sensor and a Maestro Servo Controller to have a 3-axis skull wirelessly mimic the head movements of the operator, who is wearing a baseball cap with the sensor Pi and IMU unit mounted on the brim. (A shoutout to Greg G on Haunt Forum for the ball cap mount idea).

Side note: My first job out of college was as a systems engineer on the Shuttle Mission Simulator, including the simulation code for the rate gyro assemblies and the on-board accelerometers. There, I first heard of this weird thing called a “quaternion.” Now, over 40 years later, here I am using them in a hobby project!

Overview

Sensor unit mounted on a baseball cap

The project has two units: the sensor unit and the controller unit. The Sensor unit consists of a Raspberry Pi Zero W and an Adafruit BNO055 9-Degrees of Freedom IMU board. The controller unit consists of another Pi Zero W and a Pololu Maestro Servo Controller. The two communicate with each other using XMLRPC running over WiFi. The Sensor unit acts as the server and the Controller unit acts as the client.

I mounted the sensor Pi and IMU board on the brim of a baseball cap, so it will track my head movements.

The code for the project is open source, of course, and posted on GitHub.

How it Works

The IMU board has a triaxial accelerometer, a triaxial gyroscope, and a triaxial magnetometer. The magnetometer provides orientation relative to magnetic north and is not used in the IMU-only mode used in this project, since only orientation relative to the starting position (which should be looking straight forward and level) is needed; that comes from the accelerometer and gyroscope. The secret sauce in this board is that it includes a high-speed ARM Cortex-M0 based processor that takes in the raw data from all the sensors, fuses it, and outputs the calculated orientation in real-time.

The orientation is provided as Euler angles (think roll, pitch, and yaw rotations about the board’s x, y, and z axes), as the x, y, and z components of the gravitational force vector, or as a quaternion (which defines the direction of a single rotation axis and the amount of rotation about that axis). Unfortunately, the chip has some flaws in calculating Euler angles, so it’s safer to use the quaternion output. For this project, the aviation convention is used for angles (x axis pointing forward, y axis pointing right, and z axis down). Note, however, that some of the servos move in the opposite direction, so you’ll see some sign changes in the function that converts the angles to actual servo commands.

For moving the skull, though, we want the Euler angles, in order to command the tilt, nod, and turn servos. Fortunately, the math to convert the quaternion into the proper Euler angles is easy to find and implement.
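For reference, a sketch of that conversion is below, using the standard aerospace formulas; the component order (w, x, y, z) should be double-checked against what the BNO055 library actually returns.

    import math

    def quaternion_to_euler(w, x, y, z):
        """Convert a unit quaternion to (roll, pitch, yaw) in radians."""
        roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
        pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - x * z))))
        yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
        return roll, pitch, yaw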

In this project, the sensor Pi is set up to be an XMLRPC server which, when receiving the appropriate XMLRPC request, queries the board for the current orientation, provided as a quaternion, and wirelessly sends this to the controller Pi that made the request.
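A minimal sketch of the server side (not the actual imu_rpc_server.py) might look like this, using Python’s built-in xmlrpc.server module and the adafruit_bno055 library over UART; the serial device, port number, and the handling of incomplete readings are assumptions.

    import serial
    from xmlrpc.server import SimpleXMLRPCServer

    import adafruit_bno055

    uart = serial.Serial("/dev/serial0", 115200)     # placeholder serial device
    sensor = adafruit_bno055.BNO055_UART(uart)

    def read_sensor():
        quat = sensor.quaternion
        # Guard against an occasional incomplete reading from the sensor
        if quat is None or None in quat:
            return [1.0, 0.0, 0.0, 0.0]              # identity quaternion
        return list(quat)

    server = SimpleXMLRPCServer(("0.0.0.0", 8000))   # placeholder port
    server.register_function(read_sensor, "read_sensor")
    server.serve_forever()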

The controller Pi makes this request 50 times per second. Once it has the quaternion, it converts it to the three Euler angles, and then converts each angle into the appropriate servo command to pass on to the maestro.py module that communicates with the Maestro servo controller. In addition, there is a subroutine that sends commands to move the eyes servo in a pre-determined, but relatively random, fashion.
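And a matching sketch of the controller loop: angle_to_target() is a hypothetical mapping helper, the hostname is a placeholder, and the maestro.Controller / setTarget calls assume the commonly used maestro.py module with quarter-microsecond servo targets.

    import math
    import time
    import xmlrpc.client

    import maestro                      # the maestro.py module mentioned above

    def angle_to_target(angle_rad, center=6000, span=2000):
        # Hypothetical mapping from radians to a Maestro target in quarter-microseconds,
        # centered on 1500 us; the real code also flips signs for some servos
        return int(center + span * angle_rad / (math.pi / 2))

    sensor = xmlrpc.client.ServerProxy("http://sensor-pi.local:8000")   # placeholder host
    servos = maestro.Controller()

    while True:
        w, x, y, z = sensor.read_sensor()
        roll, pitch, yaw = quaternion_to_euler(w, x, y, z)   # from the sketch above
        servos.setTarget(0, angle_to_target(roll))    # tilt
        servos.setTarget(1, angle_to_target(pitch))   # nod
        servos.setTarget(2, angle_to_target(yaw))     # turn
        time.sleep(0.02)                              # roughly 50 requests per second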

This video [insert] shows the system in action. The goal is not to get perfectly synchronized motion between the operator and the skull, as the servos will always introduce significant delay. Rather, the intent is to generate spontaneous and realistic movements by capturing the motion of the operator, who normally will be out of sight.

Software Prerequisites

The sensor software uses CircuitPython, and Adafruit’s Blinka library must be installed to support CircuitPython on a Pi. In addition, the adafruit_bno055 library is needed to interface with the board. Be sure to use the CircuitPython version rather than the earlier version.

Hardware

Picture of BNO055 IMU board connected to Pi Zero W

Closeup of the BNO055 sensor and Pi Zero sensor unit

This project requires two Raspberry Pi’s with WiFi (I used Pi Zero W’s), an Adafruit BNO055 IMU board, and a Pololu Maestro Servo Controller to control the motions of a 3-axis skull. You also need some jumper wires and a USB OTG cable. The OTG cable is used to connect the controller Pi to the Maestro Servo Controller.

Sensor

The sensor unit captures the operator’s head movements. The sensor software is just one program, imu_rpc_server.py. It is configured to use a UART interface to the BNO055 board. One can use an i2c interface instead by changing a few lines at the top of the program; however, all Raspberry Pi’s have a hardware issue with i2c clock stretching, and the sensor board uses clock stretching. This gave extremely unreliable results when I was developing the software, and I put the entire project aside for months until I came back and found out about the hardware issue. There is a workaround you can use if you want to use i2c, which consists of slowing the clock speed down so that clock stretching will rarely (hopefully never) be needed. You can read more about the issue here: https://www.mcgurrin.info/robots/723/.

Normally, when the Pi kernel boots up, it will put a login terminal on the serial port. You’ll need to turn this off if using the UART interface. To do so, run the raspi-config tool, go to Interface Options, then Serial Port, and disable the login shell over the serial connection. Then reboot the Pi. If you later need to re-enable it, just follow the same procedure. To wire up the sensor board and the Pi using the UART interface:

  • Connect BNO055 Vin to Raspberry Pi 3.3V power
  • Connect BNO055 GND to Raspberry Pi ground
  • Connect BNO055 SDA (now UART TX) to Raspberry Pi RXD pin
  • Connect BNO055 SCL (now UART RX) to Raspberry Pi TXD pin
  • Connect BNO055 PS1 to BNO055 Vin / Raspberry Pi 3.3V power

You can test that everything is working by running the simpletestuart.py program. It will print out the temperature of the board, the individual sensor readings, and the integrated Euler angle and quaternion values, as well as the calibration status and the axis map, updating every 5 seconds. Don’t worry that the magnetometer readings will be None: this project uses relative orientation, so the software turns off the magnetometer. For more information on the board and its various outputs and settings, see the datasheet.

The imu_rpc_server.py program is similar to the test program, but it only reads the quaternion values (the Euler angles are unreliable on this sensor). It also starts an XMLRPC server that responds to read_sensor requests by returning the quaternion value for the current orientation. Once started, the program runs continually until forcibly closed.

Controller

The controller software runs on a different Pi, and consists of two programs: main_controller.py and maestro.py. Main_controller.py initializes the servo controller and sets limits on the speed and acceleration for the servos. I found this cuts down on the noise of the servos, provides smoother motion, and keeps the head from whipping around when turning.

The controller queries the sensor Pi via XMLRPC ten times per second to get the position as a quaternion. It converts the quaternion into Euler angles, converts the Euler angles into servo commands, and sends the commands to the Maestro board. It also generates a sequence of eye movements and sends those to the Maestro as well.

ManServoTest can be used as you set things up to make sure your controller Pi is talking to the Maestro correctly.

Use

Set up the Pi’s to connect to your local WiFi network and install CircuitPython on the sensor Pi (it’s not needed for the controller). Then install the appropriate modules on each one. Connect the sensor Pi to the BNO055 and connect the controller Pi to the Maestro Servo Controller using a USB OTG cable. And, of course, connect the appropriate pins on the servo controller to the servos controlling your skull. Roll or tilt is channel 0, pitch or nod is channel 1, yaw or pan is channel 2, and random eye movements are sent out on channel 3.

Once everything is hooked up, start imu_rpc_server.py on the sensor Pi first, so that its XMLRPC server is running. Then launch main_controller.py on the controller Pi and you’re off and running.

Opportunities for Expansion

It would be very straightforward to take the sensor software and use it to capture and record head motions to a file. Then a different program could read back the file and send out the previously captured motion commands.

A somewhat more complex undertaking would be to capture the motions as described above, but then translate them into the scripting language used on the Maestro servo controller, so that they could be played back without a Pi or other computer. Sending out 50 commands per second, however, would quickly fill the limited memory of the Maestro. Instead, one would want to pre-process the motion file so that only key commands are kept (analogous to keyframe animation) and only those key commands translated into the Maestro’s scripting language.