Archive for November, 2007
Description of the Data:
A comprehensive analysis of what types of products are being shipped to locations all over the United States. The data could be grouped by any kind of location: state, county, or zip code, to name just a few.
Why this data is interesting:
Seeing what types of products are going where in the country would be a great guide for businesses. For instance, if there were a trend of home theatre electronics being delivered to a certain zip code in Ohio, one might infer that someone could make a killing by opening a home theatre store there. That would benefit both the consumers, who would have a nearby place to buy such things, and the business, which would reap the rewards.
How we can obtain the data:
Realistically, we can’t. However, if some of the larger sites were willing to sell information about the type of product sold and the zip code it was shipped to, it would be possible to analyze the data in many meaningful ways.
Some might find it a breach of their privacy to have large corporations selling private data about what they buy. I’m certain these people will soon be quieted once they witness the fantastic savings and convenience that will come of it.
I wouldn’t think you’d have to worry about getting many companies to sign on. If they agreed, sites like buy.com and amazon.com would cover a good portion of the online shopping world. Meanwhile, digital versions of real stores (target.com, bestbuy.com, etc.) would probably be happy to lend their data if it could be compiled with data from other sites, since the result would help them plan where to put their next store.
What specific questions can the data answer:
- What types of things are popular where?
- What part of the country buys the most of what?
- Where would the best places be for new specialty stores?

Description of the Data
Founded in February 2005, YouTube is the leader in online video, and the premier destination to watch and share original videos worldwide through a Web experience. YouTube allows people to easily upload and share video clips on www.YouTube.com and across the Internet through websites, mobile devices, blogs, and email.
Why the data is interesting
Everyone can watch videos on YouTube. People can see first-hand accounts of current events, find videos about their hobbies and interests, and discover the quirky and unusual. As more people capture special moments on video, YouTube is empowering them to become the broadcasters of tomorrow.
YouTube has struck numerous partnership deals with content providers such as CBS, BBC, Universal Music Group, Sony Music Group, Warner Music Group, NBA, The Sundance Channel and many more.
How we can obtain the data
The data can be obtained by typing what you want to find into the search box at the top of the page, or by clicking on the tabs, also at the top of the page. You can search anything from the “most played video” to the “news & politics” category to channels from popular music artists, or even from everyday people who want their shot at stardom through the World Wide Web. You can also look at the most recent and most popular videos on the site.
We can also look at a user’s popularity on YouTube using vidstats.com, and find statistics about the site at http://www.micropersuasion.com/2006/08/youtube_by_the_.html and http://www.churchcommunicationspro.com/2007/03/20/some-enlightening-internet-video-statistics-youtube/.
What specific questions the data will/can answer
- How many registered users are there?
- How many people are on this site right now?
- Which part of the world has the most YouTube users?
- Can I watch some of my favorite TV shows? Music videos from my favorite artists? Movies?
- What’s the most subscribed channel?
- What’s the most watched video of all time?
- What are people watching right now?
- Is there a history of the videos I’ve watched on YouTube?
- Can I make my own personal channel and upload videos of myself and my everyday so-called teen angst life? If so, are there going to be a lot of people watching my senseless acts? And can I get famous for doing so?
-Georgia Cruz
Mike Beideman & Emily Capkanis
The Target Audience of the Most Popular Websites
The internet is a very popular tool, especially for youth in our nation. People use it for shopping, entertainment, and research. There is a lot of potential for advertising on the sites as well as in search results.
- Why the data is interesting
Website operators want to know how to improve the user interfaces of their sites. Knowing which groups make up the greatest number of users could help to improve content as well as advertising. Advertisers and website operators can improve business based on this data.
- How we can obtain the data
The first step is to identify the most popular domains on the web. This can be easily achieved by looking at data already gathered on the popularity of websites: http://seekingalpha.com/article/25309-the-20-most-popular-websites
http://blog.compete.com/2007/10/30/top-50-websites-domains-digg-youtube-flickr-facebook/
The next step is to analyze the audience for each of these top sites and note specific information such as age, location, and possibly other interests. http://www.cbc.ca/technology/story/2006/10/06/tech-myspace.html - this link contains data regarding the ages of myspace.com users. This information is useful in determining the audience for the site. Information regarding the users of other sites may be found using simple searches; however, additional surveys of users may be needed to find all of the data we’re looking for.
- What specific questions the data will/can answer?
- Which websites are preferred by users under the age of 18?
- Which websites have a fan base of 21+? (for alcohol advertising)
- What region are users from? (for local advertising)
- Do the actual users match the site’s intended target audience?

1. Description of the data:
- TicketMaster (www.ticketmaster.com)
- A nationwide ticketing agency where you can purchase tickets to see artists, teams, and more at various venues. It also operates in certain other countries around the world.
2. Why the data is interesting:
- Americans love to go to concerts and sporting events, which is why Ticketmaster is so interesting: it gives you full access to information about events and venues that matter to Americans and to others worldwide.
- Ticketmaster.com is also known for having decent ticket prices compared to what the event itself or other ticketing sites would charge.
- The data is also interesting because you can search for an artist, for example, and see the different locations where they will be playing. Once you select a location, you can view a map of the seating and pick specific seats that you know you will be happy with. Not all ticketing sites let you see the seating chart/map.
3. How we can obtain the data:
- The data can be obtained by going to ticketmaster.com. As soon as the site loads, there is a box in the top left corner of the page that lets you search for an artist, team, band, or specific event. Now you have easy access to where your favorite artists and others are performing!
- Overall, one can find and buy tickets at ticketmaster.com for the following: concerts, sports, arts, comedy, theater, Broadway shows, and family events.
- Here is an example: I went to the site and typed “dave matthews band” in the search box. It gave me 3 results; one, for example, was “Izod Center, East Rutherford, NJ, Dave Matthews Band - Tuesday 11/13/07 @ 7:00PM.” So now I know when one of my favorite bands is playing next, and it happens to be this month here in New Jersey. I am a pleased ticketmaster.com customer, and the prices for the seats weren’t bad at all!
4. What specific questions the data will/can answer: The data can answer several questions; here are some common ones I thought of:
- Is my favorite band playing in New Jersey this month?
- Can I sell tickets to other fans via ticketmaster.com?
- Does ticketmaster.com provide me with entertainment guides, such as museum guides or Broadway guides?
- Does ticketmaster.com offer sports packages?
- Can I create a personal account with ticketmaster?
- What do I do if I can’t attend the event I purchased tickets for? Are they returnable, can I sell them ahead of time, or can I switch them for a different event of equal value?
- What’s the lowest ticket price for a Bon Jovi concert? Will the seats still be good even though the tickets are cheap? Is there a seating chart of the location I can look at?

Instant-win prize data for NJ Lottery games, specifically the numbers of large prizes remaining and originally offered. Other available data includes the date each contest began, the total number of tickets initially produced, and the cost of each ticket. Available at this link: scratchoff.xls, a sample Excel file showing how the data should be entered. It shouldn’t take more than a couple of hours to collect all this data.
- Why the data is interesting
The lottery is popular, and people probably want to maximize their chances of winning large prizes. We can offer more than other lottery-related sites offer. There are no easy-to-find web sites comparing the odds of winning the different instant-win games.
- How we can obtain the data
From the New Jersey Lottery web site page on instant-win games (http://www.state.nj.us/lottery/instant/2-1_unclaimed_prizes.htm). Linked from that page, there is a paragraph for each instant-win game that looks like this:
In the “SUPER CROSSWORD” Instant Game,
New Jersey allocates 65% of the gross receipts to prizes. On the average, better than 1 ticket in 5 wins a prize. In a game of 3,900,000 tickets there are 390,000 prizes of $5; 273,000 prizes of $7; 92,300 prizes of $10; 65,325 prizes of $12; 18,200 prizes of $15; 19,500 prizes of $50; 26,000 prizes of $100; 19,500 prizes of $150; 6 prizes of $750; 4 prizes of $7,500 and 6 prizes of $50,000. Odds and number of winners may vary based on sales, distribution and claims.
We can retrieve this paragraph for each game automatically, because the URLs are predictable (http://www.state.nj.us/lottery/instant/ig738.htm, http://www.state.nj.us/lottery/instant/ig739.htm, etc.). We can then combine it with the count of unclaimed large prizes on the main instant-win page, which contains information like this:
$50,000 - 6
$7,500 - 8
$750 - 11
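As a rough sketch of the retrieval step, the per-game paragraph could be parsed with a couple of regular expressions. The function names (`game_url`, `parse_game`) are made up for illustration, the URL pattern is only inferred from the two example links above, and the parser assumes every game page uses the same wording as the SUPER CROSSWORD sample, which may not hold. Downloading the pages is left out.

```python
import re

# Sample paragraph copied from the post; other game pages may be worded differently.
PARAGRAPH = (
    "New Jersey allocates 65% of the gross receipts to prizes. "
    "On the average, better than 1 ticket in 5 wins a prize. "
    "In a game of 3,900,000 tickets there are 390,000 prizes of $5; "
    "273,000 prizes of $7; 92,300 prizes of $10; 65,325 prizes of $12; "
    "18,200 prizes of $15; 19,500 prizes of $50; 26,000 prizes of $100; "
    "19,500 prizes of $150; 6 prizes of $750; 4 prizes of $7,500 "
    "and 6 prizes of $50,000. Odds and number of winners may vary "
    "based on sales, distribution and claims."
)


def game_url(game_number):
    """Build a game page URL (pattern assumed from the ig738/ig739 examples)."""
    return f"http://www.state.nj.us/lottery/instant/ig{game_number}.htm"


def to_int(text):
    """Convert a number written with thousands separators, e.g. '3,900,000'."""
    return int(text.replace(",", ""))


def parse_game(paragraph):
    """Extract the ticket count and a {prize value: prize count} table."""
    tickets = re.search(r"In a game of ([\d,]+) tickets", paragraph)
    prizes = re.findall(r"([\d,]+) prizes of \$([\d,]+)", paragraph)
    return {
        "tickets": to_int(tickets.group(1)) if tickets else None,
        "prizes": {to_int(value): to_int(count) for count, value in prizes},
    }


if __name__ == "__main__":
    game = parse_game(PARAGRAPH)
    print(game_url(738))
    print(game["tickets"], game["prizes"])
```

Each parsed table could then be written out as one row per prize level, matching whatever layout scratchoff.xls uses.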
- What specific questions the data will/can answer. Some of these need some thought about how they should be calculated. It would be good to start with two or three proposals for that.
- Which games have the best odds of winning a large prize? (How do we calculate the odds for one game?)
- Provide all the information about instant-win games (particularly large prizes remaining as a fraction of large prizes initially offered) on one page. (This shouldn’t be hard using the Excel spreadsheet.)
- Provide a service to cell phone users to send a query from a lottery retailer to say which of the available games has the best odds of winning a large prize. If possible, do this via picture-messaging, so the cell phone user can just send a picture of the scratch-off display case to our service and have it analyzed. (We probably have no hope of doing this for various reasons - insufficient technical expertise, it involves image recognition, too hard to test, etc., etc. Maybe we could make a static web page updated every couple of days with the information instead - ideally, one that would display nicely on a cell phone or other small device.)
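One possible proposal for the calculations, offered only as a starting point: treat any prize of $750 or more as "large" (that cutoff is an assumption), compute the odds for a full print run as large prizes divided by tickets, and compute the fraction remaining as unclaimed large prizes divided by large prizes originally offered. The original counts below come from the SUPER CROSSWORD paragraph quoted in the post; the remaining counts are hypothetical, since the post does not give them for that game. This ignores how many tickets have already been sold, which we don't know.

```python
from fractions import Fraction

LARGE_PRIZE_THRESHOLD = 750  # assumed cutoff for a "large" prize


def large_prize_odds(tickets, prizes, threshold=LARGE_PRIZE_THRESHOLD):
    """Chance that one ticket from a full print run wins a large prize."""
    large = sum(count for value, count in prizes.items() if value >= threshold)
    return Fraction(large, tickets)


def fraction_remaining(original, remaining, threshold=LARGE_PRIZE_THRESHOLD):
    """Unclaimed large prizes as a fraction of those originally offered."""
    offered = sum(c for v, c in original.items() if v >= threshold)
    left = sum(c for v, c in remaining.items() if v >= threshold)
    return Fraction(left, offered)


if __name__ == "__main__":
    tickets = 3_900_000
    # Large prizes originally offered, from the SUPER CROSSWORD paragraph.
    original = {750: 6, 7500: 4, 50000: 6}
    # Hypothetical unclaimed counts, for illustration only.
    remaining = {750: 3, 7500: 2, 50000: 3}

    odds = large_prize_odds(tickets, original)
    print(f"Large-prize odds: 1 in {odds.denominator // odds.numerator:,}")
    print(f"Large prizes remaining: {float(fraction_remaining(original, remaining)):.0%}")
```

Comparing games would then come down to sorting them by one of these two numbers, or by some combination once we agree on which matters more.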
1. For Wednesday, November 7 (but some submissions by Monday would be welcome), post (individually or in groups of 2) a data project description containing these parts:
- Description of the data
- Why the data is interesting
- How we can obtain the data
- What specific questions the data will/can answer
Between Wednesday and Friday, revise as desired
2. [Evaluation] (process to happen after Friday, Nov. 9, in groups of at least three, each group evaluating five or six projects*)
Requirement for evaluation teams: For each project description you evaluate, write about a page with about a paragraph for each of the categories below. If you want, give feedback for the proposers also, or ask questions as you evaluate these and invite proposers to revise their projects. Then write a summary page comparing the projects to one another, ranking them if possible on 1) the C’s criteria and (separately) 2) feasibility.
- Clarity
- Goal: the data should be described sufficiently well so that someone in this class could go with that data description and get the data in a “hands-on” form (in an Excel file, written on paper, or other electronic or written format that is organized and ready for processing to answer the questions).
- Correctness
- Is the data available from sources as described?
- Can the data answer the questions listed?
- Coolness
- Is the description creative, catchy, compelling?
- Is there some value to this project in terms of originality, larger purpose, building on existing knowledge, etc.? (innovative, interesting, practical, reliable)
- [not for grade, but to be evaluated] Feasibility
- What kind of time frame and resources would this project take to complete?
- Do we have the skills and access to information and tools needed?
- Would you want to complete this project yourself (with some help)?
*See November 9 post for evaluation groups and projects to evaluate.
3. Even later, we will execute at least a couple of these projects.