PROJECTS


  1. Proposed a novel technique for detecting and analyzing humor in the comedy television show FRIENDS. This research was published at EMNLP, one of the top international conferences in Natural Language Processing, held in Sydney, Australia in 2006. The paper analyzed dialog transcripts and audio recordings from the show for automatic humor detection.

  2. Implemented Text Mining and Information Extraction algorithms to automatically build a large-scale relational database of movie actors and pop singers by mining web pages and biographies on Wikipedia. This project was carried out at SONY Corporation, Japan during a summer 2008 internship.


  3. Explored data mining techniques to automatically categorize product items based on the similarity of their features and product descriptions. This project was done at Amazon.com, Seattle during a summer 2005 internship. It used unsupervised clustering algorithms to build a product taxonomy automatically by grouping similar products.

  4. Developed classification algorithms for Automatic Email Prioritization. The project predicted the order in which a user would reply to emails by combining features extracted from the emails with an analysis of user behavior and interpersonal relationships among users.

  5. Proposed a machine learning framework for analyzing coherence in spoken conversations. The algorithm distinguishes random incoherent conversations from natural coherent dialogs with over 85% accuracy. This research was carried out at the University of Pittsburgh and published at the FLAIRS 2008 conference in Florida.

  6. Implemented a language semantics demo for analyzing similarities and relations between entities in natural language texts. This project was done at the Information Sciences Institute (ISI) of the University of Southern California (USC) during a summer 2007 internship.

  7. Worked with collaborators at Mayo Clinic in Rochester, Minnesota on a bioinformatics project to develop unsupervised learning methods for resolving ambiguities in biomedical texts.

  8. Developed SenseClusters, an open source software package for the Unsupervised Word Sense Discrimination task. This project was funded by a National Science Foundation (NSF) research grant and was completed as part of a Master’s Thesis at the University of Minnesota.

  9. Co-organized the SENSEVAL-3 English-Hindi Lexical Sample Task by collecting transliterations for Hindi words and their dictionary definitions. This shared task evaluates multilingual word sense disambiguation systems.

  10. Won 2nd Prize for Best Paper in Artificial Intelligence & Fuzzy Logic in a technical paper presentation competition organized by Pune Institute of Computer Technology (PICT), India in association with IEEE. The paper presented a prototype model for a language understanding system using parsing and logical inference.

  11. Worked on a Contextual Advertising project for a start-up company, Social Extract. The project analyzes text content on Twitter to identify Twitter users and relevant tweets for advertising and marketing campaigns.

  12. Automatically generated skill profiles for technical support engineers by mining unstructured text content from email communications. The algorithm retrieves a ranked list of experts for a given skill area.

  13. Built a demo prototype for a client in the medical and health-care domain by analyzing content posted on social media. The project identifies trending topics, user activities, and the sentiments in user posts to study their effects on product sales and revenue.

  14. Performed predictive analysis for a client that offers a bike rental service in the Bay Area by computing correlations between the number of bikes rented on a given day and weather parameters such as temperature, wind speed, and humidity.

  15. Developed forecasting models for predicting stock prices of Fortune 500 companies from the past 10 years of historical data using Time-Series Analysis.

  16. Implemented a brand-clustering algorithm to identify companies that work in the same industry sectors or offer similar products and services, by extracting features (related words) from online news text using the word2vec utility.

  17. Mentored a team of college interns from College of Engineering, Pune (COEP) on a “Box-Office Predictions” project to estimate the box-office revenue for movies based on parameters such as reviews and ratings, production budget, studio, and genres.

  18. Mentored a team of college students for the Digital Pune Hackathon. The students built dashboards to visualize electricity consumption for load balancing, and to forecast demand based on weather conditions such as temperature and humidity.

  19. Built a destination-search engine by mining articles on Wikitravel to suggest top domestic and international cities for a selected activity (e.g. scuba diving, ice skating, cross-country skiing, trekking, horse riding) or theme (e.g. wildlife safari, hill station, beach resort, amusement parks, world heritage sites).

  20. Performed sentiment analysis on text snippets (phrases) extracted from hotel reviews to score and rank hotels on dimensions such as location, quality of food, service, cleanliness, and amenities.

  21. Built a hotel-search engine by mining reviews from TripAdvisor. Hotels can be searched near specific points of interest (POIs) such as metro stations, airports, popular landmarks, local attractions, and neighborhoods, or by specialty services such as Italian Dining, Ayurvedic Spa, Infinity Pool, or Private Beach.

  22. Trained a word2vec model to automatically discover concepts from text by identifying semantically similar and related terms. Created visualizations by projecting word vectors into 3D space using the Embedding Projector visualization tool from Google.

  23. Created a culinary-search engine in AWS CloudSearch to search thousands of online recipes by ingredient (e.g. potato, spinach, mushroom) and/or category (e.g. breakfast, barbecue, vegetarian). Sample queries include: “apple pie”, “carrot cake”, “potato salad”, “mushroom soup”.