| Content Provider | ACM Digital Library |
|---|---|
| Author | Parameswaran, Aditya; Wang, Shaowen; Soltani, Kiumars |
| Abstract | Since its launch in 2006, Twitter has evolved into a multi-purpose social media platform that attracts hundreds of millions of users who share their activities and ideas daily. The ability to capture fine-grained activity logs of users, combined with the ever-increasing geographical information derived from GPS-enabled devices, has made Twitter data a valuable source for spatiotemporal analysis of human activities. One of Twitter's early innovations is the hashtag, a tagging mechanism that provides additional information about a user's post. Since their emergence in late 2007, hashtags have been used extensively to express ideas, group tweets and report events among Twitter users. The increasing popularity of hashtags, together with their simple and concise structure, has inspired multiple recent studies to propose the hashtag as a medium for assessing the diffusion of ideas in a virtual world. Studying the collective effort of users in making a hashtag go viral can shed light on the complex process of idea diffusion, which involves psychological, sociological and geographical elements. Although most previous research on idea diffusion in the virtual world focuses purely on the users' social graph, recent studies have confirmed that the spatial relationships among users and regions also play a crucial role in adoption patterns [1]. This goes back to the First Law of Geography, formulated by Waldo Tobler more than 40 years ago: "everything is related to everything else, but near things are more related than distant things". However, existing interactive visual analytics frameworks for hashtag diffusion (http://keyhole.co/, http://hashtracking.com/, https://tagboard.com/) lack in-depth spatial analysis capabilities and are therefore not well suited to studying diffusion patterns. This research aims to fill this gap by providing an interactive framework that offers visual analytics on the geographical diffusion of hashtags over time. 
Our framework, called GeoHashViz, provides both textual and visual analytics on the role of location in the adoption of hashtags and offers insights into diffusion patterns among different hashtags. GeoHashViz processes a large stream of incoming tweets using a Hadoop-based approach and calculates multiple measures that are used to generate visual analytics for the user. Furthermore, it integrates online maps with a live animation tool to visualize the spatial and temporal diffusion of hashtags at the same time. Data Collection: We gather our data using the Twitter Streaming API (details in [3]). Since we are only interested in common hashtags with a certain level of popularity, we keep only the hashtags with more than 1000 appearances. Our unit of spatial resolution is cities in the United States with a population larger than 60000, which gives us 645 unique locations. These locations form our reference grid, and every geographical point is assigned to its nearest neighbor in the reference grid. Analytics: To formulate the problem of spatiotemporal analysis of hashtag diffusion, we recognize two main categories: hashtag-based and location-based analytics. Hashtag-based analytics focus on specific hashtags and their associated diffusion patterns. Location-based analytics, on the other hand, study the similarity and closeness of locations in terms of their hashtag adoption. To evaluate the usability of the framework, we identify five core analytical features that cover a wide range of research questions. However, our framework can be easily extended to include more analytical features. The five visual analytical capabilities are listed in Table 1. Spread and focus points (locations with the highest occurrence of the hashtag [1]) provide users with a visual estimate of how the hashtag diffuses over time. 
However, we also provide four metrics that give the user a more concrete sense of the diffusion patterns: a) Entropy: measures the randomness of the hashtag distribution [1]; b) KL-divergence: compares the geographical distribution of the hashtag in consecutive time windows; c) Spatial Dispersion: measures how scattered the hashtag is around its geographical midpoint; d) Count: plots the cumulative count of the hashtag over time. For location-based analytics we include two functions. Top-k hashtags calculates the most popular hashtags in a region and visualizes them using a word cloud. However, by simply looking at the counts, we may miss some locally significant hashtags because of their relatively low counts. To reduce the dominance of globally popular hashtags, we introduce another analytic that visualizes the top-k locally significant hashtags. This analytic uses a tf-idf-like metric [5] to measure the local popularity of a hashtag in a specific region, assigning a lower rank to hashtags that are popular in other places as well. In addition, we provide two metrics for comparing two regions in terms of hashtag adoption: a) Jaccard Similarity: compares the sets of hashtags used in two regions, with a higher value assigned to more similar regions; b) Adoption Lag: measures how long it takes for a hashtag to travel between two regions, by averaging the time difference between the first appearances of hashtags in the two regions. Architecture: The GeoHashViz framework follows a two-layer architecture: an offline-processing module and an interactive module. The offline-processing module, implemented entirely in Apache Hadoop and run periodically, processes the raw data and pre-computes measures related to the spatiotemporal diffusion of hashtags. The interactive module, on the other hand, is invoked on demand in response to user requests. The two modules are connected through a distributed MongoDB database. 
The two-layer architecture keeps the final framework fast and interactive by reducing the data processing that the interactive module is required to do. In the offline-processing module, significant hashtags are extracted and the points are laid on the geographical mesh defined above. Then two MapReduce jobs are executed: one pre-computes measures for hashtag-based analytics and the other for location-based analytics. All the Hadoop experiments were conducted on the XSEDE Gordon Hadoop cluster. The data-intensive nature of our problem, which requires aggregating a large number of tweets by both hashtag and location, makes Hadoop an ideal choice for the offline-processing module. Using Hadoop, we distribute the tweets across multiple nodes and then take advantage of the MapReduce model to aggregate them based on their associated location on the mesh and the hashtags they include. In the reduce step, having access to all the tweets for a given location or hashtag, we can generate the analytics for different timestamps. In addition, since the nodes on the Gordon Hadoop cluster have relatively high memory, we are able to store the geographical mesh in memory and quickly map the location of each user to the closest point on the mesh (using a kd-tree). The same technique is employed in the interactive module to find the set of mesh points that lie within the user-defined bounding box. The interactive module includes a web application and a Java Servlet. The web application is integrated into the CyberGIS Gateway [2] to increase the usability of the application and to ease integration with other CyberGIS applications. Figure 1 shows a view of the application visualizing the top 20 hashtags in the southern California region in September 2014. |
| Starting Page | 1 |
| Ending Page | 2 |
| Page Count | 2 |
| File Format | |
| ISBN | 9781450337205 |
| DOI | 10.1145/2792745.2792782 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publisher Date | 2015-07-26 |
| Publisher Place | New York |
| Access Restriction | Subscribed |
| Subject Keyword | Interactive visualization; Social media; Hadoop; CyberGIS; GeoHashViz |
| Content Type | Text |
| Resource Type | Article |
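
The abstract names several metrics (entropy, KL-divergence, Jaccard similarity, adoption lag) without giving formulas. The following is a minimal Python sketch of how such metrics are commonly computed. It is not the authors' Hadoop implementation. All function names, the `eps` smoothing in the KL-divergence, the choice of base-2 logarithms, and the use of the absolute time difference in the adoption lag are assumptions made for illustration.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a hashtag's counts per location.

    `counts` maps a location name to the number of tweets with the hashtag.
    """
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in bits between the location distributions of two time windows.

    Additive smoothing with `eps` (an assumption, not from the abstract)
    avoids division by zero when a location is missing from one window.
    """
    locations = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + eps * len(locations)
    q_total = sum(q_counts.values()) + eps * len(locations)
    kl = 0.0
    for loc in locations:
        p = (p_counts.get(loc, 0) + eps) / p_total
        q = (q_counts.get(loc, 0) + eps) / q_total
        kl += p * math.log2(p / q)
    return kl


def jaccard_similarity(tags_a, tags_b):
    """Jaccard similarity of the hashtag sets used in two regions."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty regions
    return len(a & b) / len(a | b)


def adoption_lag(first_seen_a, first_seen_b):
    """Mean absolute gap between first appearances of shared hashtags.

    Each argument maps a hashtag to the time of its first appearance in a
    region (any numeric unit). Returns None if no hashtag is shared.
    """
    shared = set(first_seen_a) & set(first_seen_b)
    if not shared:
        return None
    return sum(abs(first_seen_a[t] - first_seen_b[t]) for t in shared) / len(shared)
```

For example, a hashtag spread evenly over four cities has an entropy of 2 bits, while a hashtag concentrated in a single city has an entropy of 0. A KL-divergence near 0 between consecutive time windows suggests that the geographical distribution of the hashtag has stabilized.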