Dream Come True: Building websites by thinking about them

From the mind to the computer, make websites using your imagination!



By Sreeram Ajay, Anjali Agarwal & Vatsala Nema

Brain-computer interfaces (BCIs) have become one of the most promising technologies for helping people who have lost motor control due to stroke, spinal cord injury, paralysis, or ALS to communicate and perform actions that were not feasible before. Using this technology, they can now type, walk, and grasp objects just by imagining doing so with their own hands. Building on this technology, we present our Neural Website project, created as part of AllenNLP Hacks 2021. In this project, we combined BCI and NLP techniques to let people who have lost motor control build websites on their own simply by imagining writing the instructions.

The project has three components:

  1. Decoding imagined handwriting
  2. Natural language processing
  3. Website Builder

 

Let’s dive further into each part to understand the project.

 

Decoding Imagined Handwriting

 
The first part of the project converts the neural signals associated with imagined handwriting into text in real time. Researchers at Stanford University achieved this using neural signals recorded by intracortical electrode arrays implanted in the hand "knob" area of the participant's left-hemisphere (dominant) precentral gyrus.

Figure
Source: [1] Stroke Chameleons Manifesting as Distinct Radial Neuropathies: Expertise Can Hasten the Diagnosis; [2] Wikipedia

After preprocessing the neural signals, multiunit threshold-crossing rates (the average number of times per second that the neural signal crosses a set threshold) were used as neural features. These features were fed into a recurrent neural network (RNN) model that converts the neural time series into a probability time series. The probabilities indicate when a new character begins and which character is most likely (as shown in the figure below). In the online mode, the raw output emits each character when its probability crosses a threshold. In the offline mode, the character probabilities are combined with a large-vocabulary language model to decode the most likely sentence.
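To make the pipeline concrete, here is a minimal sketch of the decoding idea in PyTorch. It is not the study's actual architecture: the layer sizes, the toy character set, and the 0.5 emit threshold are all illustrative assumptions.

```python
# Minimal sketch (not the study's actual architecture): a GRU maps binned
# threshold-crossing rates to (a) a probability that a new character is
# starting and (b) a distribution over characters; the online mode emits a
# character whenever the "new character" probability crosses a threshold.
import torch
import torch.nn as nn

N_ELECTRODES = 192                        # channels of threshold-crossing rates
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # toy character set (illustrative)

class HandwritingDecoder(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(N_ELECTRODES, hidden, num_layers=2, batch_first=True)
        self.char_head = nn.Linear(hidden, len(ALPHABET))  # which character
        self.start_head = nn.Linear(hidden, 1)             # "new character begins"

    def forward(self, x):                 # x: (batch, time, N_ELECTRODES)
        h, _ = self.rnn(x)
        char_probs = torch.softmax(self.char_head(h), dim=-1)
        start_probs = torch.sigmoid(self.start_head(h)).squeeze(-1)
        return char_probs, start_probs

def online_decode(char_probs, start_probs, threshold=0.5):
    """Emit a character each time the start probability rises above threshold.
    char_probs: (time, len(ALPHABET)); start_probs: (time,), one sequence."""
    out, armed = [], True
    for t in range(start_probs.shape[0]):
        if armed and start_probs[t] > threshold:
            out.append(ALPHABET[int(char_probs[t].argmax())])
            armed = False                 # wait for the signal to fall again
        elif start_probs[t] < threshold:
            armed = True
    return "".join(out)

decoder = HandwritingDecoder()
fake_trial = torch.randn(1, 200, N_ELECTRODES)   # untrained demo input
cp, sp = decoder(fake_trial)
print(online_decode(cp[0], sp[0]))
```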

Figure
Source: https://www.nature.com/articles/s41586-021-03506-2

This BCI communication system achieved a typing speed of 90 characters per minute, surpassing the previous record for typing with cursor movements. The word error rate was 5.4% in the online mode, and applying language models in the offline mode reduced it to 3.4%.

Because we did not have the resources to collect real neural data for the website, we generated synthetic neural signal data to demonstrate our project, using the same technique the original study used for data augmentation during training.
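As a toy illustration of what such synthetic data can look like (the generation procedure in our demo and in the original study is more involved), one can concatenate a noisy per-character "template" of firing rates for each character in a sentence. The shapes and noise level below are illustrative assumptions.

```python
# Toy sketch of synthetic neural features: a random firing-rate "template"
# per character, concatenated for a sentence and perturbed with noise.
import numpy as np

rng = np.random.default_rng(0)
N_ELECTRODES = 192
BINS_PER_CHAR = 20   # assumed time bins per imagined character

alphabet = "abcdefghijklmnopqrstuvwxyz "
templates = {c: rng.gamma(2.0, 1.0, size=(BINS_PER_CHAR, N_ELECTRODES))
             for c in alphabet}

def synthesize(sentence, noise_std=0.3):
    """Return a (time, electrodes) array of synthetic threshold-crossing rates."""
    chunks = [templates[c] for c in sentence.lower() if c in templates]
    signal = np.concatenate(chunks, axis=0)
    return np.clip(signal + rng.normal(0, noise_std, signal.shape), 0, None)

x = synthesize("add a title")
print(x.shape)   # (timesteps, 192)
```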

 

Natural Language Processing

 
After obtaining the decoded text from the neural signals, we used natural language processing to clean and process it and decipher the participant's instructions for the website. We cleaned the decoded text with autocorrect techniques similar to those used in speech recognition: a bigram model to generate candidate sentences and a language model to rescore them. We then applied NLP techniques to extract the relevant information from the cleaned-up text by classifying the intent and the entities in what the participant wrote.
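A minimal sketch of this cleanup step, with a toy vocabulary and toy bigram counts standing in for the real candidate generator and language model:

```python
# Toy autocorrect: propose candidate corrections for each decoded word from a
# small vocabulary, then keep the sentence the bigram model scores highest.
import itertools
from difflib import get_close_matches

VOCAB = ["change", "background", "to", "add", "a", "title", "image"]
BIGRAM = {("change", "background"): 5, ("background", "to"): 4,
          ("add", "a"): 6, ("a", "title"): 3}   # toy bigram counts

def candidates(word, n=3):
    return get_close_matches(word, VOCAB, n=n, cutoff=0.6) or [word]

def bigram_score(words):
    return sum(BIGRAM.get(pair, 0) for pair in zip(words, words[1:]))

def autocorrect(decoded):
    options = [candidates(w) for w in decoded.split()]
    return " ".join(max(itertools.product(*options), key=bigram_score))

print(autocorrect("chanqe backgrond to elephants"))
# -> "change background to elephants"
```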

 

Intent Recognition

 
To find the intent, i.e., the action the user wants to take, we first define a set of known intents, each illustrated by example sentences, which can then be matched against a new sentence. We converted these raw text examples into sentence embeddings using a BERT model to obtain a vector representation of each example sentence. For a new sentence, we used the same model to compute its embedding. To find the stored sentence most similar to the entered text, we used the FAISS (Facebook AI Similarity Search) library's IndexFlatIP index, which uses the inner product between vectors to measure their similarity.
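A hedged sketch of this matching step using the sentence-transformers library and FAISS; the specific embedding model name and the example intents below are assumptions, since the post only states that BERT-based sentence embeddings and IndexFlatIP were used. Vectors are L2-normalized so the inner product behaves like cosine similarity.

```python
# Intent matching via sentence embeddings + FAISS IndexFlatIP (assumed setup).
import faiss
from sentence_transformers import SentenceTransformer

INTENT_EXAMPLES = {                      # toy example sentences per intent
    "change": ["change the background", "set the title to something else"],
    "add":    ["add an image", "add a new section"],
    "remove": ["remove the image", "delete the title"],
}

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model
texts = [t for examples in INTENT_EXAMPLES.values() for t in examples]
labels = [intent for intent, ex in INTENT_EXAMPLES.items() for _ in ex]

emb = model.encode(texts, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])  # inner-product index
index.add(emb)

def predict_intent(sentence):
    q = model.encode([sentence], normalize_embeddings=True)
    _, idx = index.search(q, 1)          # nearest stored example
    return labels[idx[0][0]]

print(predict_intent("change background to elephants near the pond"))
# expected: "change"
```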

 

Entity Mapping

 
Having obtained the intent, we need to find the entities, or slots, in the sentence. Entities can be fields, data, or text descriptions (in our case, HTML elements and their properties). We used the ELMo model to obtain contextual word embeddings, which capture both the individual word and the contextual meaning of its usage in the sentence. For example, in the sentence "I left my phone on the left side of the room", the word "left" has different meanings depending on where it occurs.
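As a rough illustration (not our project code), the ElmoEmbedder helper available in older AllenNLP releases (0.9 and earlier) can be used to check that the two occurrences of "left" receive different contextual vectors:

```python
# Contextual embeddings with ELMo via the older AllenNLP ElmoEmbedder helper;
# pretrained weights are downloaded on first use. Illustrative sketch only.
import numpy as np
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()
tokens = "I left my phone on the left side of the room".split()
layers = elmo.embed_sentence(tokens)   # shape: (3 layers, n_tokens, 1024)
vectors = layers[2]                    # top layer, most contextual

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two "left" tokens (indices 1 and 6) get noticeably different vectors.
print(cosine(vectors[1], vectors[6]))
```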

We also defined a set of entities to look for in each sentence, such as an image entity and a position entity. The entities were identified with the same similarity-search approach, and the model also returns the value of each decoded entity. To capture multi-word entities, we used BIO (Beginning, Inside, Outside) tagging, a common format for tagging tokens (see the sketch after the example below).

For example, for a new sentence, "Change background to elephants near the pond", the model outputs:

Intent: Change
Entity: Background Image
Value: elephants near the pond
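A minimal sketch of how BIO tags are turned into multi-word entity values for the example above; the tag names and the tags themselves are illustrative assumptions rather than the model's actual output:

```python
# Collect (entity_type, text) spans from BIO-tagged tokens.
def decode_bio(tokens, tags):
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_type:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = "Change background to elephants near the pond".split()
tags   = ["O", "B-BackgroundImage", "O",
          "B-Value", "I-Value", "I-Value", "I-Value"]   # assumed tags
print(decode_bio(tokens, tags))
# [('BackgroundImage', 'background'), ('Value', 'elephants near the pond')]
```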
 
 

Website Builder

 
We now reach the final phase of the project, where the main website is built from the relevant information collected and processed from the user's raw input.

The extracted intent and entities are then used to update the user interface. An intent represents an action to take (add, remove, change, increase). Entities represent elements (title, image, border) and properties (size, color, style, position, image description). Continuing the example above, the website builder searches an image search API (in this case, the Pixabay API) for images of elephants and adds the top result as the background image of the current view. If the user doesn't like it, they can say "next image" or "change image", and the background is replaced with the next result. Users can continue with further commands such as "next section" or "add a slide" to keep adding content and customize the page to their liking. Finally, they can save, host, and share their creation as a link.
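A rough sketch of how an intent/entity/value triple could drive this step, using the Pixabay search API mentioned above. The API key, the page dictionary, and the entity names are placeholders; the real builder's data structures are not shown here.

```python
# Apply a decoded command to a toy "page" by querying Pixabay for images.
import requests

PIXABAY_KEY = "YOUR_API_KEY"   # placeholder

def search_images(query):
    resp = requests.get("https://pixabay.com/api/",
                        params={"key": PIXABAY_KEY, "q": query, "per_page": 5})
    resp.raise_for_status()
    return [hit["webformatURL"] for hit in resp.json().get("hits", [])]

def apply_command(page, intent, entity, value, result_index=0):
    if intent == "change" and entity == "background_image":
        urls = search_images(value)
        if urls:
            # "next image" would bump result_index to try the next result
            page["background_image"] = urls[result_index]
    return page

page = {"title": "My Page", "background_image": None}
page = apply_command(page, "change", "background_image",
                     "elephants near the pond")
```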

Below is an example of a user view while using the tool:

We plan to extend our project further to:

  • Interpret natural thoughts and connect with more natural-language interfaces, building better and more robust design and developer tools.
  • Connect with the Internet of Things, particularly voice assistants like Alexa.
  • Generate more specific images for the website from textual descriptions, with the help of text-to-image models like DALL·E.

Through this project, we hope to empower differently-abled people living with conditions such as (but not limited to) ALS and stroke to create their own websites, presentations, or anything they need to showcase their skills and activities, or to market their entrepreneurial ventures.

You can view the demo and presentation of our project here:


Neural Website - AllenNLP Hacks 2021 - YouTube.

 

References

  • Willett, F.R., Avansino, D.T., Hochberg, L.R. et al. High-performance brain-to-text communication via handwriting. Nature 593, 249–254 (2021). https://doi.org/10.1038/s41586-021-03506-2
  • Reimers, Nils and Iryna Gurevych. “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.” arXiv:1908.10084 (2019).
  • Peters, Matthew E., Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. “Deep Contextualized Word Representations.” NAACL (2018).

Sreeram Ajay, Anjali Agarwal & Vatsala Nema
Team Dream Come True

 
Sreeram Ajay is working as Senior NLP Engineer at Target.

Anjali Agarwal is working as a Researcher at the Tata Research, Development and Design Center.

Vatsala Nema is an EECS Junior at Indian Institute of Science Education and Research, Bhopal.
