Sat.Oct 04, 2014 - Fri.Oct 10, 2014


Moving Beyond CTR: Better Recommendations Through Human Evaluation

Edwin Chen

Imagine you're building a recommendation algorithm for your new online site. How do you measure its quality, to make sure that it's sending users relevant and personalized content? Click-through rate may be your initial hope… but after a bit of thought, it's not clear that it's the best metric after all. Take Google's search engine. In many cases, improving the quality of search results will decrease CTR!
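The excerpt's point can be made concrete with a small sketch. CTR is simply clicks divided by impressions, and the numbers below are hypothetical: if a better ranking answers more queries directly on the results page, users click less, so CTR falls even as quality rises.

```python
def ctr(clicks, impressions):
    """Click-through rate: fraction of impressions that led to a click."""
    return clicks / impressions if impressions else 0.0

# Hypothetical numbers for illustration only: the "improved" ranking
# answers more queries on the results page itself, so it earns fewer
# clicks per impression despite being the better experience.
baseline = ctr(clicks=320, impressions=1000)  # 0.32
improved = ctr(clicks=250, impressions=1000)  # 0.25
```

This is exactly the failure mode the article describes: the metric moves in the opposite direction from user satisfaction.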


Greek Media Monitoring Kaggle competition: My approach

Data Science and Beyond

A few months ago I participated in the Kaggle Greek Media Monitoring competition. The goal was multilabel classification of texts scanned from Greek print media. Despite not having much time due to travelling and other commitments, I managed to finish 6th (out of 120 teams). This post describes my approach to the problem. Data & evaluation: the data consists of articles scanned from Greek print media in May-September 2013.
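To clarify the task itself (this is a generic binary-relevance sketch of multilabel classification, not the author's actual competition approach): each document can carry several labels at once, so one common baseline trains an independent binary decision per label. The toy keyword "model" below stands in for a real classifier.

```python
def train_binary_relevance(docs, label_sets, all_labels):
    """For each label, collect words seen in documents carrying that label.
    In a real system each label would get a trained binary classifier."""
    models = {}
    for label in all_labels:
        positive_words = set()
        for doc, labels in zip(docs, label_sets):
            if label in labels:
                positive_words.update(doc.lower().split())
        models[label] = positive_words
    return models

def predict(models, doc, threshold=2):
    """Assign every label whose keyword set overlaps the document enough."""
    words = set(doc.lower().split())
    return {label for label, vocab in models.items()
            if len(words & vocab) >= threshold}

# Tiny made-up corpus; each document may have multiple labels.
docs = ["election results parliament",
        "football match score",
        "parliament votes budget"]
label_sets = [{"politics"}, {"sports"}, {"politics", "economy"}]
models = train_binary_relevance(docs, label_sets,
                                {"politics", "sports", "economy"})
print(predict(models, "parliament election"))  # → {'politics'}
```

The key property shown is that labels are predicted independently, so a document can receive zero, one, or several labels.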



Announcing Nutanix Metro Availability

Nutanix

There's no question that enterprise application usability has for the most part been left behind.


Announcing .NEXT 2015, the Web-Scale Conference

Nutanix



Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It's no surprise given the non-deterministic nature of LLMs. To create reliable LLM-based applications (often with RAG), extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.