Big Data Challenge

Competition Date:
March 12-13, 2020

Big Data Challenge Rules:
Big Data Challenge Rules *Revised 1/27/20*

Sample Tweets:
Download *Updated* Sample Tweets Here

Competition Tweets:
Submission procedures and tweet information will be distributed at check in.
Competition Dataset


Q: In the list of sample tweets provided, there are no coordinates, only text, disruption_type, and disruption_status. Will we be given a CSV file that more closely resembles the competition’s CSV file as described in the competition guidelines?
A: Coordinates weren’t provided for the sample tweets. At the SESC, teams will be given a CSV file containing the text of each tweet along with its longitude and latitude. Each team must use the data in the text column to produce the disruption type and disruption status of each tweet. The coordinates give your team the opportunity to earn bonus points towards its score by completing the visualization portion described in Section 6 of the competition rules.
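
As a rough illustration of the workflow described in this answer, the Python sketch below reads a CSV of tweets and appends the two predicted columns. It is only a sketch under stated assumptions: the column names "longitude" and "latitude", the made-up rows, and the keyword-based toy_classify function are all hypothetical and are not the official format or method.

```python
import csv
import io

def toy_classify(text):
    """Toy stand-in for a real classifier (assumption, not the official method).

    Returns (disruption_type, disruption_status). Only type 1 (power) is
    handled here; an empty string means "no type assigned".
    """
    lowered = text.lower()
    if "power" not in lowered:
        return "", 0
    # A "power is back" tweet is still power related but needs no follow-up.
    status = 0 if "back" in lowered else 1
    return 1, status

def label_tweets(in_file, out_file, classify=toy_classify):
    """Copy every input row and append the two predicted columns."""
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames) + ["disruption_type", "disruption_status"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["disruption_type"], row["disruption_status"] = classify(row["text"])
        writer.writerow(row)

# Made-up rows for illustration; "longitude" and "latitude" are assumed names.
sample = io.StringIO(
    "text,longitude,latitude\n"
    "Power is back and the AC is on,-80.19,25.76\n"
    "No power on our street since last night,-81.38,28.54\n"
)
result = io.StringIO()
label_tweets(sample, result)
print(result.getvalue())
```

Passing the classifier in as a parameter keeps the file handling separate from whatever method a team chooses for the actual labeling.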

Q: When creating a map based on the coordinates provided with the tweets, should this be algorithmically generated, or does it need to be human made? What is the scale of the map: should it be a map of the United States, the continent, or the world?
A: When creating the map for the visualization portion, it is up to your team to decide the best way to generate the disruption map. The map should be large enough to show the locations of ALL the disruptions. 
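
One possible way to make sure the map covers every disruption is to compute a bounding box from the coordinates before drawing anything. The Python sketch below shows the idea; the function name bounding_box, the margin parameter, and the sample coordinates are hypothetical and chosen only for illustration.

```python
def bounding_box(points, margin=0.05):
    """Return (min_lon, min_lat, max_lon, max_lat) covering all points.

    points: iterable of (longitude, latitude) pairs.
    margin: fraction of the span added on each side so markers on the
            edge are not clipped (an arbitrary choice for illustration).
    """
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    pad_lon = (max(lons) - min(lons)) * margin
    pad_lat = (max(lats) - min(lats)) * margin
    return (min(lons) - pad_lon, min(lats) - pad_lat,
            max(lons) + pad_lon, max(lats) + pad_lat)

# Made-up coordinates, not taken from the competition data.
coords = [(-80.0, 25.0), (-82.0, 29.0), (-81.0, 27.0)]
print(bounding_box(coords, margin=0.1))
```

This simple version assumes the points do not straddle the 180° meridian; a dataset that does would need extra handling.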

Q: In the competition’s procedure, it states that if a disruption is found, then the type of disruption needs to be determined. However, in the sample csv file given, there are times when the disruption_status is labeled as 0, yet there is a disruption_type given. Do we still need to label the type of disruption it would be, even if it’s not one?
A: Yes. For example, the sample tweet on line 15 states “Power is back, wifi is working, AC on blast….back to being lazy and staying indoors ON MY OWN TERMS”. This tweet is labeled as 1 (power), 0 (not a disruption). This is because the tweet relates to the Hurricane Irma event, BUT it does not qualify as a disruption that would be important for the disaster management agencies, described in Section 2 of the competition rules, to use when planning disruption response and recovery actions.

Q: For tweets such as the ones found on lines 95, 81, and 91 of the sample CSV file, what would be the best way to determine that these are in fact storm related? For example, using the tweet found on line 95, what if this user simply forgot to pay their light bill for the month and finally caught up on a late payment? Should we assume that negative tweets like these are inherently storm related?
A: One hint that may help your team when evaluating these tweets is to ask, “Does this information require any follow-up action from a disaster management agency (e.g., an energy provider)?” If the answer is no, the disruption status should be labeled as 0.

Q: Can any program be used to write the algorithm? 
A: Any method/program can be used to write the algorithm for classifying the tweets.

Q: Will the list of tweets in the competition be given to us digitally? 
A: The tweets will be given to teams digitally. It hasn’t been determined yet whether they will be sent via email to team captains or made accessible through a link on the competition page.

Q: What will be the time limit of the competition?
A: The competition tweets will become available at the beginning of the conference (March 12, 2020 at noon). Teams will have from that time until the end of the day on March 13, 2020 (the exact time is to be determined at a later date).

Q: For the Geo-Locations, will they count as 20 points per correct map? Or 20 points overall for having the functionality of creating maps for the locations?
A: For the visualization bonus, the team needs to produce a map with the locations of ALL the disruptions. Teams can also show the disruptions by category. The most innovative visualization will be given the full 20 bonus points. 

Q: In the new guideline, the water and communications categories have swapped values (water is now 2 and communication is now 3). Was that deliberate?
A: We will amend the disruption type section. I apologize that we didn’t catch that mistake when our faculty staff sent us the updated version of the rules. Type 2 should be communication related and type 3 should be water related.
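
For teams that keep the type codes in their own code, a small lookup table is one way to avoid mixing up the values. The Python sketch below lists only the codes confirmed on this page (1 for power, 2 for communication, 3 for water); the names DISRUPTION_TYPES and type_name are hypothetical, and the competition rules remain the authority on the full list of types.

```python
# Codes confirmed on this page; the competition rules may define more.
DISRUPTION_TYPES = {
    1: "power",
    2: "communication",
    3: "water",
}

def type_name(code):
    """Return the category name for a type code, or None if not listed here."""
    return DISRUPTION_TYPES.get(code)

print(type_name(2))
print(type_name(3))
```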