We’ve got a great way to save you money on music testing: Test your music with one listener. Just one. OK, maybe two.
Concerned about the reliability of a one-person music test? Then have that one person rate each song twice, or better yet, three times.
Or maybe one hundred times.
Why spend the money testing music with a hundred people when you can end up with the same sample size by having a handful of people rate each song dozens of times?
Sounds pretty silly, doesn’t it?
It’s obvious that having one hundred different people rate your music is very different from having the same people rate your music over and over.
Yet, that is what some radio stations are doing.
Stations are letting just a handful of listeners determine what songs they play.
It works like this: Using minute-by-minute PPM data synced with the station’s music log, you can watch how your meter count changes as different songs come on.
The belief is that if the number of meters goes up, you may have a hit on your hands.
If the number of meters goes down, then maybe you’ve got a stiff.
And if you see a pattern repeated over time, like a decline in meters nearly every time you play a song, you can bet the song’s a stiff. Might as well dump it.
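To make the method concrete, here is a minimal sketch of that kind of analysis in Python. The data, names, and numbers below are invented for illustration; they are not Nielsen’s actual formats, just a picture of what “watching meter flow by song” amounts to.

```python
# A sketch of the meter-flow idea: line up a music log with minute-by-minute
# meter counts and score each song by how the count changed while it played.
# All data below is invented for illustration.

from collections import defaultdict

# hypothetical minute-by-minute meter counts for the station
meter_counts = {0: 41, 1: 43, 2: 42, 3: 40, 4: 44, 5: 45}

# hypothetical music log: the minute each song started
music_log = [(0, "Song A"), (3, "Song B")]

# change in meter count over each song's airtime
song_deltas = defaultdict(list)
for i, (start, title) in enumerate(music_log):
    # the song runs until the next log entry (or the end of the data)
    end = music_log[i + 1][0] if i + 1 < len(music_log) else max(meter_counts) + 1
    song_deltas[title].append(meter_counts[end - 1] - meter_counts[start])

# average meter change per spin -- the "flow" score stations watch
for title, deltas in song_deltas.items():
    print(title, sum(deltas) / len(deltas))
```

In practice stations run this against weeks of data, but the logic is the same: line up the log with the meter counts and score each song by how the count moved while it played.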
Sounds pretty powerful, doesn’t it? After all, these are actual PPM panelists essentially participating in a real-time music test.
What could go wrong?
Let’s look at the potential problems of using PPM meter flow to pick the hits.
First, PPM does not measure listening. It measures exposure.
The difference may seem trivial, but it raises profound questions about the meter’s ability to measure listener intent.
The meter can’t tell you whether the panelist is listening to the station or is simply in the vicinity of someone who is.
More troubling is the fact that repeated Arbitron analyses have shown that when panelists leave a station, they rarely tune to another. They simply disappear.
Maybe the radio station was switched off, or maybe the meter simply lost the signal because of noise and interference, or the panelist tucked the meter away.
So we can’t assume that what the station was doing at the time had anything to do with a change in the meter count, up or down.
There’s also research showing that listening spans for virtually every format are nearly identical. Format doesn’t seem to matter: Rock, CHR, and Country listeners all listen for about ten minutes at a time.
That suggests that meter flows are driven more by the meter’s technical quirks than by listener behavior.
These issues are enough to raise serious questions about whether PPM is capable of measuring listener likes and dislikes.
We also can’t forget that PPM meter counts for most stations are abysmally low.
It might seem that just trending meter counts over time could overcome the inadequacy of sample, but that defense overlooks one important point about PPM.
In the diary world, participants rotate out every week. New people replace last week's participants, so the independent sample grows over time.
That's not the case with PPM. Nielsen minimizes turnover, keeping the same panelists carrying meters week after week for years.
Over time the apparent in-tab for a song’s score may grow, but the majority of panelists are the same people.
We’re back to our one-person music test, with the same panelists rating a song over and over.
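A rough back-of-the-envelope calculation shows why those repeated ratings don’t add up the way fresh listeners would. The design-effect formula below is a standard statistical adjustment for correlated observations, and the correlation value is an assumption picked purely for illustration.

```python
# Why repeated ratings from the same people are not independent observations.
# The design-effect formula is a textbook adjustment for clustered data; the
# within-person correlation of 0.5 is an assumption chosen for illustration.

def effective_sample_size(panelists, ratings_per_panelist, within_person_corr):
    """Approximate number of independent observations the ratings are worth."""
    total = panelists * ratings_per_panelist
    design_effect = 1 + (ratings_per_panelist - 1) * within_person_corr
    return total / design_effect

# 10 panelists each rating a song 20 times looks like 200 data points...
print(round(effective_sample_size(10, 20, 0.5)))   # ~19
# ...while 200 different listeners rating it once really is 200.
print(round(effective_sample_size(200, 1, 0.5)))   # 200
```

Under those assumptions, 200 “observations” from ten repeat raters carry the statistical weight of roughly 19 independent listeners, which is the one-person music test problem in numeric form.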
If that’s not enough to make you skeptical, let’s look at the meters themselves.
The 1980s technology behind the meter relies on an analog system to separate the encoded signal from the broadcast audio, ambient noise, and interference. Research has shown that PPM meters miss at least 30% of the listening they ought to recognize.
So there’s drop-out.
Because of the drop-out, Nielsen’s computers apply a series of editing rules to fill in the gaps. For example, you can get up to three minutes of credit for a stretch when the meter can’t identify the station, as long as the meter finds your station again afterward.
That’s just one of several editing rules that fill in the gaps created when the meters get confused.
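Here is a toy sketch of what that sort of gap-bridging edit looks like. The rule shown, crediting an undetected stretch of up to three minutes when the same station is detected on both sides of it, is a simplified reading of the description above, not Nielsen’s actual editing logic.

```python
# A toy version of a gap-bridging edit: credit an undetected stretch of up to
# three minutes as listening when the same station is detected on both sides.
# This is a simplification for illustration, not Nielsen's actual rule set.

def bridge_gaps(detections, max_gap=3):
    """detections: per-minute station code, or None where the meter heard nothing."""
    credited = list(detections)
    i = 0
    while i < len(detections):
        if detections[i] is None:
            # measure the run of undetected minutes starting here
            j = i
            while j < len(detections) and detections[j] is None:
                j += 1
            before = detections[i - 1] if i > 0 else None
            after = detections[j] if j < len(detections) else None
            # credit the gap only if it's short and the same station bookends it
            if before is not None and before == after and (j - i) <= max_gap:
                credited[i:j] = [before] * (j - i)
            i = j
        else:
            i += 1
    return credited

# three silent minutes between two confirmed minutes of the same station get credited
print(bridge_gaps(["WXYZ", None, None, None, "WXYZ"]))
# ['WXYZ', 'WXYZ', 'WXYZ', 'WXYZ', 'WXYZ']
```

The point is that some of the minutes in the “minute-by-minute” record are inferred, not measured.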
As a result of the gaps and the editing rules that fill them in, minute-by-minute data is not nearly as precise as it might appear.
Maybe your meter count increased because of editing, and the panelist was listening the whole time. Maybe the count declined because the meter could no longer confirm that the panelist was still listening. Maybe the panelist wasn’t even listening to the station.
All are possible, and while these sorts of problems tend to average out over weeks and months, they make drawing conclusions about what the meters are doing minute to minute questionable.
Any one of these problems ought to disqualify PPM as a tool for testing content. Taken together, they suggest that randomly choosing which songs to play would be about as accurate as relying on meter flows.
If you really care about playing songs that your listeners want to hear, the only reliable way to do it is to ask them.