Every other month we have a proposal deadline for our Azure for Research program. On the 15th of the month, a small committee reviews the proposals, and we receive a lot of them. We were very excited by the quality of this batch, and we selected 38 proposals for awards in this round. The authors, titles, and abstracts are listed below (ordered alphabetically by first name). Remember that our next deadline is April 15, 2014.
Also, please note that we have a special request for proposals for projects that are interested in building Linux virtual machine images for research that can be contributed to our Opentech VMDepot. We have postponed that deadline to April 15. More information is available at the project home page.
The February 15 Proposal Awardees
Alvaro Soto Arriaza, Pontificia Universidad Catolica de Chile, Chile
Title: Cloud-based Visual Ontology for Contextual Visual Recognition.
Abstract: The goal of this proposal is to build and test a visual ontology that is able to use visual information extracted from an image by state-of-the-art inductive detectors and refine it using common sense semantic networks in order to obtain meaningful descriptions of scenes and deduce contextual information. The visual ontology will work by mixing information from big visual databases and common sense semantic networks.
C. Titus Brown, Michigan State University, United States
Title: Open assembly and analysis of large sequencing data sets.
Abstract: We propose to execute existing cloud computing pipelines for de novo sequence assembly in the Azure cloud on a substantial number of data sets. We have three goals: first, improve the breadth of our knowledge about the natural world by making useful summary analyses available for existing and new data sets; second, execute well-defined and open protocols on the data, and retain detailed provenance information; and third, drive open biological science forward by analyzing people's data in exchange for making it open.
Chao Wang, Tsinghua University, China
Title: Intelligent Sustainable Navigation Services (ISUNS)
Abstract: The objective of this research is to improve the eco-efficiency of overall urban transport decision-making by maximizing the access gained and information support from a given level of vehicular travel, while providing citizens with a safe, healthy, pleasant, and well-informed urban travelling experience. The researchers will examine factors such as high urban residential density, travel styles, alternative options, costs, time consumption, and travel experiences to develop a navigation system based on behavior patterns. This proposal might also be associated with various policy changes.
Chaowei Yang, GMU, United States
Title: Spatial Cloud Computing: A Practical Approach
Abstract: Cloud computing is redefining the possibilities of many geoscience disciplines. We wrote a book to introduce cloud computing to the geoscience communities. To serve as a text, the book includes slides and hands-on examples for deploying, optimizing, and operating applications in the cloud. This proposal is to develop a Microsoft Azure version of the examples so that the geoscience communities can not only learn how to use other cloud services, but also stand up their own geoscience applications in Azure after finishing the book.
Chih-Yuan Yang, University of California, Merced, United States
Title: Single-Image Super-Resolution: A Benchmark
Abstract: A large-scale experiment is proposed to test state-of-the-art single-image super-resolution algorithms in order to build a systematic performance benchmark of existing methods. Due to the tremendous computational load, the experiments are best executed on a scalable computing platform such as Windows Azure. We have implemented several algorithms to run the experiments. The generated results will be the content of a paper submitted to a top-tier computer vision conference.
Conghui Zhu, Harbin Institute of Technology, China
Title: Chinese Minority Ethnic Languages Translation.
Abstract: We plan to develop a statistical translation system for Chinese minority ethnic languages that supports at least three languages, in cooperation with Microsoft Research Asia. The first step is resource gathering, including seed gathering and crawling, extracting, and evaluating parallel sentences from the internet. The next step is training and tuning the statistical translation system, which requires a huge amount of computation. Finally, a translation API will be provided to help people exchange information equally.
Dariusz Mrozek, Silesian University of Technology, Gliwice, Poland
Title: Cloud4Psi. Cloud Computing in the Service of 3D Protein Structure Similarity Searching
Abstract: 3D protein structures exhibit high conservation in the evolution of organisms, and even if protein sequences have diverged significantly, finding structural similarities allows us to draw conclusions on the functional similarity of proteins in various, sometimes evolutionarily distant, organisms. However, popular methods for searching for protein structure similarities are still very time-consuming. Similarity searching against large repositories of structural data requires increased computational resources that are not available to everyone. Our project addresses this problem. We are going to develop a cloud-based system that will be a highly scalable and high-performance solution for protein similarity searching and for protein function identification.
Derrick Crook, University of Oxford, UK
Title: Modernising Medical Microbiology
Abstract: The Modernising Medical Microbiology Consortium, which is at the forefront of translating pathogen whole genome sequencing into clinical practice, is seeking to develop new rapid methods for analysing genomic sequence linked to clinical record data on a very large scale. The opportunities offered by access to the Microsoft Azure technologies will enable first-in-class experiments to be successfully completed. These will involve processing more than 10,000 pathogen genomic sequences to unravel the evolution and spread of organisms fast enough to use the data clinically. New methods will also be developed to enable routine use in hospitals and medical services.
Didier Donsez, Université Joseph Fourier - Grenoble 1, France
Title: CIRUS: A Cloud Infrastructure for Real-time Ubilytics
Abstract: The Internet of Things (IoT) has become a reality with the availability of chatty embedded devices. The huge amount of data generated by things must be analyzed with models and technologies of the “Big Data Analytics”, deployed on cloud platforms. The CIRUS project aims to deliver a self-adaptive cloud-based infrastructure for real-time ubilytics (ubiquitous big data analytics). The CIRUS infrastructure collects and analyzes IoT data for M2M services using COST such as M2M gateways (OpenHAB, PeerGreen, …), Message brokers (Mosquitto, RabbitMQ, JORAM, …) or Message-as-a-Service providers and Analytics frameworks (Hadoop, Storm, S4, Samza) deployed and reconfigured dynamically with RoboConf.
Eduardo Alves do Valle Jr., School of Electrical and Computer Engineering - FEEC, UNICAMP, Brazil
Title: Medical Image Classification for Computer Aided Diagnosis with Deep Learning and Jumbo Vectors
Abstract: Information retrieval and content-based image classification have been studied by the scientific community in many different ways. A key application of this technology is Computer-Aided Diagnosis (CAD), improving doctors’ abilities to detect or prevent several diseases. Our aim is to advance the state of the art in CAD systems for the screening of pathologies based upon medical images, focusing on the early screening of melanoma. The techniques covered by this project involve Deep Learning Architectures and Bag of Visual Words models, which show complementary advantages.
Eoin O'Grady, Marine Institute, Ireland
Title: Irish Digital Ocean - SMART Marine Research Platform
Abstract: The Irish Digital Ocean - SMART Marine Research Platform is a cloud environment tailored for data-intensive collaborative marine research and innovation. The platform has the potential to significantly improve the effectiveness of marine research at team, organizational, national, and international levels, and to lead to the development of a vibrant marine research and innovation ecosystem. The platform will focus on collaborative research, the reuse of marine digital assets, and the translation of research outputs to new products and services. The platform will be a key component in the research stream of the wider Irish Digital Ocean (IDO) framework.
Hans J Johnson, The University of Iowa, United States
Title: Azure Cloud Testing and Algorithm Reproducibility in Medical Image Analysis.
Abstract: We propose the application of 200,000 CPU hours and 30 terabytes of storage to run CDash builds of proposed patches to ITK, reproducibility tests of Insight Journal submissions, and reproducible analysis for ITK community members.
Huy T. Vo, New York University, United States
Title: Building a 3D Model for New York City using LiDAR data.
Abstract: The benefits of having an accurate model of New York City (NYC) are enormous in many research areas of urban informatics, which often uses spatial correlation of multiple data sources to better understand how cities work. Unfortunately, there is no 3D model available to researchers in urban informatics. Instead, we have approximately 20 billion LiDAR points (1 TB of raw disk space). The project is to construct an actual 3D model of NYC from this massive point cloud. Given the size and complexity of the computing involved, we request the use of Windows Azure to support this computation.
Jason Slepicka, University of Southern California, United States
Title: Big Karma
Abstract: Karma is an open source information integration tool that learns how to assist users in cleaning and normalizing data, modeling its semantics, and publishing it in a variety of forms including RDF for the Linked Data Cloud. In order to handle much larger datasets in size and complexity, we are in the process of moving Karma’s machine learning algorithms and processing to the cloud. By moving to Windows Azure, Karma would gain access to a scalable Hadoop environment and free users from managing scaling infrastructure. This will make Karma more capable and available for far more users than otherwise possible.
Jiamin Xu, Shanghai Jiao Tong University, China
Title: Unsteady Aerodynamics and Aeroacoustics of Slat Morphing Trailing Edge.
Abstract: The objective of this proposal is to investigate issues related to flow control optimization based on large-scale unsteady aerodynamic simulations combined with multi-level optimization strategies using cloud resources. The research issues are representative of the challenges faced by the aerospace design community when high-fidelity unsteady simulation models are considered.
Judy Qiu, Indiana University, United States
Title: Extending Twister4Azure and Integration with Apache Big Data Stack.
Abstract: We have shown that Iterative MapReduce is a powerful programming model for data intensive applications on both cloud and HPC platforms. We have developed Twister4Azure which is an implementation of these ideas on Azure. Our recent work has focused on a generalization of MapReduce to a Map-Collective model where the “reduce” phase in MapReduce is supported by a library of powerful optimized collective communication routines covering operations like (all)reduce, scatter, gather, broadcast, regroup, combine, and merge, which cover the key primitives in MapReduce and MPI. We showed that the same collectives could be added to Hadoop with a significant performance increase. Our optimized broadcast collectives for Twister enabled clustering with millions of centers. We integrate these ideas and request Azure time to perform the cloud-based Map-Collective research.
Kui Ren, University at Buffalo, State University of New York, United States
Title: The Power of Indoor Crowd: Indoor 3D Maps from the Crowd
Abstract: In this work, we address the critical task of reconstructing large-scale indoor 3D models from crowd-sourced images. We propose, design, and try to implement IndoorCrowd, a smartphone-empowered crowdsourcing system for large-scale indoor 3D scene reconstruction. IndoorCrowd fills a gap in current cloud-based 3D reconstruction systems, as it ensures on the mobile side that the captured image set meets the quality required for indoor large-scene 3D reconstruction. On the cloud side, we deploy an automated image-based 3D reconstruction pipeline, which generates 3D models from images and sensor data.
Lawrence A. Husick, Quantum Cures Foundation, United States
Title: Cloud Based Drug Discovery for Malaria.
Abstract: Quantum Cures Foundation is a nonprofit that discovers new drugs for known disease targets and provides those new drug designs to the research community as "open source" for further development (see www.quantumcures.org). The design of new drugs is based on a cloud computing platform with high fidelity molecular modeling which is provided to Quantum Cures by TeraDiscoveries (www.teradiscoveries.com). The drug discovery platform, called Inverse Design, has been validated and runs on Azure. For this project, we propose to design a new drug to combat Malaria by inhibiting three different mutations of the known Malaria protein target pfDHFR-Ts.
Lianwen Jin, South China University of Technology, China
Title: FaceMore: An Innovative Facial Beautification Web Service based on Windows Azure.
Abstract: This project aims to build an online, elastically scalable, quickly deployed system, FaceMore, that can handle personalized face beautification on face data sets of various scales by taking advantage of special features provided by Windows Azure and integrating advanced face beautification technology. It is expected that this system will make full use of large amounts of data, support highly concurrent processing, and provide unique functions such as personalized face beautification and various special face effects based on the data-driven average face hypothesis and the region-aware, mask-based facial beautification algorithm.
Long Quan, The Hong Kong University of Science and Technology, Hong Kong
Title: Large-scale Three-dimensional Urban Reconstruction.
Abstract: In this project, we propose a fully automatic method, using Windows Azure, for large-scale 3D reconstruction of urban scenes based on input images captured both at ground level and from low altitude in the air.
Matthew Graham, California Institute of Technology, United States
Title: A study of quasar variability.
Abstract: Quasars are one of the most important classes of astronomical objects and are highly variable. However, the mechanism of their optical variability is poorly understood. We propose a definitive study to model the time series of 200,000 quasars to determine the best-fitting stochastic description of their variability and look for correlations between variability features and physical parameters, such as black hole mass, that will help distinguish between different possible variability mechanisms.
Meng Xianhai, Beihang University, China
Title: A Cloud based platform for virtual geologic Earth.
Abstract: Geologic data and models have the typical characteristics of big data. In this project, we aim to develop a cloud-based platform for a virtual geologic Earth, which can be used to collect geologic data and models, index them with a unified spatial data model, and share them through visualization. The spatial data model is designed to manage different types of raw geologic data and various models, and to define the corresponding spatial index and file format. Windows Azure cloud computing technology is used to implement distributed data conversion, file storage, and model release.
Michael Epitropakis, University of Stirling, UK
Title: Efficient Regression Test Optimization in Windows Azure
Abstract: The goal of the proposed research project is to develop novel search-based optimization methodologies that can improve the regression testing phase of a software project in an automatic and cost-effective way. To adequately cater for real-world regression testing cases, a multi-objective formulation is utilized, which enables us to test and study different properties of the system under test. The use of the Windows Azure platform allows us to tackle regression testing scenarios on large-scale open source software projects. Any developments in this direction will help the software testing community to deal with regression testing scenarios as efficiently as possible.
Nando de Freitas, University of Oxford, UK
Title: Deep Learning on the Cloud.
Abstract: We plan to capitalize on a recent breakthrough in machine learning to build efficient parallel algorithms to train massive deep neural networks. If successful, this project will enable users to build deep learning applications on the cloud, thus significantly advancing AI.
Robert Boissy, University of Nebraska Medical Center, United States
Title: Secure, timely, open, pro bono cloud-based data management and analysis services for the molecular detection and continuous international molecular surveillance of known and emerging pathogens.
Abstract: Global health and agricultural and economic development would all benefit from the cloud-based deployment of secure, timely, open, pro bono data management and analysis services that are designed to help the physicians, veterinarians, agronomists, biologists, and related scientists and public officials responsible for studying and responding to known and emerging human, animal, and plant pathogens. This proposal describes the development and deployment of a limited number of such services. More importantly, the formation of a computing industry coalition is also proposed that could ensure that these and other similar cloud-based services are further developed, enhanced, and maintained.
Role, Paris Descartes University, France
Title: Azure-based Text Mining Tools for Genome-wide Association Studies
Abstract: The goal of the project is to develop and deploy on the cloud a set of robust, easy-to-deploy text mining tools to assist researchers in the analysis and interpretation of large-scale results coming from genome-wide association studies (GWAS).
Said Kharbouche, University College London, UK
Title: Online GlobAlbedo's Data Analysis and Visualization.
Abstract: In our GlobAlbedo project for daily mapping of the Earth's surface albedo, we have developed a unique method for extracting and visualizing ROIs (Regions Of Interest) on a single server on a first come, first served basis. We would like to exploit the Microsoft Azure cloud computing environment to minimize the system's response time, so that our end users can visualize, analyze, and download the data as quickly as possible regardless of the ROI size or the number of simultaneous requests.
Sergey Chernov, New Economic School, Russia
Title: Enabling Large-Scale Social Network Analysis using VK Data
Abstract: The social media ecosystem has attracted a great deal of research attention in the past decade. Numerous research projects aim at large-scale analysis of available online networks, including resources that are popular worldwide, such as Facebook and Twitter. Still, these studies fall short of addressing countries in which the aforementioned networks are not the most widespread ones. In particular, the Russian social network VK has about 220 million active accounts, but it is mostly ignored in the research literature. We would like to collect and process VK data into a publicly available research dataset to provide interested scientists with easy access to social network data on Russia.
Srikumar Venugopal, University of New South Wales, Australia
Title: Scalable Protein Sequence Similarity Search for Metagenomics
Abstract: Metagenomics is the study of uncultured microorganisms from their habitats. In recent years, so-called next-generation sequencers have boosted the speed at which genomes can be sequenced from environmental samples. This in turn has led to a deluge of available metagenomic data. A key step in the study of metagenomic data is sequence alignment, which is computationally intensive over large datasets. Tools such as BLAST require large-scale dedicated computing infrastructure for such analysis. We introduce ScalLoPS, a new tool designed to scale protein sequence alignment across cloud resources. This project proposes evaluating ScalLoPS against BLAST for Windows Azure using metagenomic datasets.
Tai-Quan Peng, Nanyang Technological University, Singapore
Title: Tracking Social Happiness on Twitter: A Multi-level Study
Abstract: Observing and explaining individuals’ happiness is a long-standing interest among the public and researchers. The project will mine the rich time-stamped information stored on Twitter, which is an objective and real-time record of users’ subjective feelings, to fulfill three objectives: (1) developing and testing a time-variant and domain-specific Happiness Index of Twitter (HIT) at both individual and societal levels; (2) modeling the dynamics of HIT at both individual and societal levels; and (3) uncovering causal mechanisms underlying the dynamics of HIT.
Thanh N. Truong, University of Utah, United States
Title: Engaging Citizen Scientists in Computer-Aided Drug Discovery.
Abstract: Azure cloud computing can open new opportunities for citizen scientists to engage in actual scientific discovery. The project develops the e-Science Community Laboratory, a cloud-enabled web portal that allows the public to contribute to computer-aided drug design research and is accessible anytime, anywhere, and by anyone. It uses crowdsourcing technology to create a social network of citizen scientists within the existing social networks. This is done by turning the virtual drug screening process into a game in which interested individuals compete against others to see who can pick a better drug candidate for a certain disease target.
Viswanath Nandigam, University of California San Diego, United States
Title: Integration of cloud based on-demand geospatial processing services into community earth science data facilities.
Abstract: OpenTopography is an NSF-funded earth science facility that provides online access to high-resolution topography data and tools. By leveraging the underlying SOA design of the OpenTopography system, we plan to develop a pluggable services infrastructure that will allow processing routines developed by the community on external cloud resources like Azure to be plugged into the existing OpenTopography system workflow, so that the entire community of users can benefit from the new functionality, e.g., change detection, differential analysis, and time series analysis between datasets.
Xueming Qian, Xi'an Jiaotong University, China
Title: Schedule Travel Life by Exploring Spectrums of Social User and City Services.
Abstract: Making a schedule for a short- to medium-term traveler by exploring users’ social communities and the services available at locations in destination cities is very important. In this proposal, we propose to 1) mine social users’ preferences/life spectrum using community-contributed information from their travel history and their check-in history in their places of residence; 2) mine the city services/activity spectrum from crowd-sourced data contributed by worldwide users, including the check-in data of local residents (comments, geo-locations, etc.) and the travel information from users’ shared photos and travelogues; 3) recommend preferred services/activities to users according to temporal-geo-social spectrum similarities; and 4) provide an objective, overall summarization of local services to improve service quality. The problems we need to solve are as follows: 1) how to mine users’ preferences; 2) how to mine the city services spectrum; 3) how to recommend personalized services/activities to users in an unfamiliar city; and 4) how to recommend personalized events (sequential services) to users. The outputs of this project are as follows: 1) a solution for scheduling personalized travel life by exploring the spectrums of social users and city services; 2) objective feedback for local services departments to improve service quality, obtained by summarizing the comments/evaluations from worldwide users; 3) several published papers and demos at important international conferences; 4) shared datasets for researchers in this area; and 5) training for 5 MSD/PhD students in this research area.
Yang (Jon) Zhang, Cornell University, United States
Title: BioHPC Azure Integration.
Abstract: With new next-generation sequencing techniques producing ever-increasing amounts of data, demand for HPC resources for biological research has increased as well. BioHPC was created at Cornell CBSU in part to address this demand. We believe that an Azure-based cluster will further increase the flexibility of BioHPC, allowing users to access a much larger pool of computational resources, and will also help reduce the cost and simplify the process of installing future instances of BioHPC.
Yingchi Mao, Hohai University, China
Title: Safety and efficient utilization of hydropower development in Lancang river based on Windows Azure.
Abstract: Based on the dam safety analysis and evaluation results of the XiaoWan and Nuozhadu hydropower stations on the Lancang River, we will analyze the relationship between the ecological environment and the dam safety of the XiaoWan hydropower station project, and indicate the impact of the dam on the ecological environment. We will apply data mining techniques and theories to the analysis of the safety of the dam and the environment. Moreover, we will establish a library of data mining methods. The project will implement a data mining prototype system for comprehensive analysis of the safety and environment of the XiaoWan hydropower station.
Zhenlin Yang, Oregon State University, United States
Title: Forest Mortality, Economics and Climate change
Abstract: We have a 5-year project, FMEC, on modeling drought mortality and predicting vulnerability with the Community Land Model (CLM) 4.5 over western North America, and linking it with an economic model to evaluate both the carbon and economic implications of mitigation actions. We want to utilize 30-50 cores of Azure to run CLM 4.5, which is written in Fortran, for the model spinups and simulations. This will greatly help the progress of the FMEC project and our understanding of forest mortality in the Western US.
Zheping Xu, Institute of Botany, Chinese Academy of Sciences, China
Title: Dynamic Biodiversity Protect and Monitor in a Cloud Environment.
Abstract: We are experiencing the sixth mass extinction of plants and animals. We should make every effort to protect the biodiversity of our planet. Besides occurrences from specimens and observations, much temporal distribution information for many kinds of species can be extracted and processed from the scientific literature. However, we need a high-performance environment to store and process our huge data (>20 TB), namely more than 112 million records from 42 million pages, which may generate more than 500 million distribution points. Some new techniques should also be introduced: machine learning, natural language processing, GIS, etc.
Zhumin Chen, Shandong University, China
Title: Urban lifestyles Detection from Big Heterogeneous Human Behavioral Data using Windows Azure.
Abstract: Lifestyle is the typical way of life of an individual, group, or culture in different nations. Understanding lifestyles is significant. This project will study how to use Windows Azure to discover lifestyles based on heterogeneous human behavioral data in Web news, Twitter, Weibo, etc. We will use the Worker Role, HDInsight, SQL Database, Virtual Machine, and Mobile Service features of Windows Azure, among others, to identify, collect, store, and process lifestyle-related data, and then detect and evaluate lifestyles. We hope the results of this project can help people to find the historical changes in their lifestyles and help organizations to find possible crises in lifestyles.