April 28, 2023
Program synthesis is the field concerned with automatically generating programs from high-level specifications. Codex models such as GPT-S and CodeBERT, which are Large Language Models (LLMs) trained on large corpora of code, have demonstrated very promising results in code generation. This presentation will focus on state-of-the-art research on LLM Codex in program synthesis, including recent breakthroughs in generating code that is both efficient and accurate. The presentation will also identify the limitations and gaps in current research and propose a way forward for future work in this area.
November 04, 2022
Because pressure force is invisible to onlookers, it is useful for enhancing the security of authentication, especially for preventing shoulder surfing. However, it is challenging to memorize a pressure-based password. This presentation will show a pressure-based authentication system that personalizes the detection of pressure force and concretizes a pressure-based password as a decimal number to reduce the effort of memorization, improve accuracy, and enhance security. We conducted two user studies to compare the four-pin password with our pressure-based password in terms of usability and security. The results of the first study indicated that the pressure-based password is more secure, but the four-pin password is faster and has higher subjective satisfaction. In the second study, participants used the pressure-based password once a day for 10 consecutive days. The second study revealed that both the completion time and the subjective satisfaction of the pressure-based password improved significantly.
April 25, 2022
Artificial intelligence (AI) and related technologies are increasingly being applied to healthcare. These technologies have the potential to make healthcare more efficient, affordable, and personalized. In this talk, I will review our current research on using semantic AI technologies, specifically, ontologies and knowledge graphs to empower AI in health, along with opportunities, challenges, and practical implications.
October 11, 2021
High-resolution raster data from small satellites, drones, and airplanes has become increasingly common in recent years. In this presentation I show that new raster processing techniques are needed to effectively analyze such data. I describe the process of deriving topographic variables such that the resulting algorithm scales to large window sizes without reducing the resolution of the output image, introducing artefacts, or substantially reducing performance. Moreover, I show that the same ideas can be used for a vast set of derived attributes, including fractal dimensions and regression coefficients, provided that only additive measures are needed in their computation. The effectiveness is demonstrated for real and artificial landscape models.
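The abstract above does not say which algorithm the talk used. As a minimal sketch of why additive measures matter, the example below uses a summed-area table, a standard technique chosen here as an assumption, to compute a window mean whose cost does not depend on the window size. The function names and the toy grid are hypothetical.

```python
def summed_area_table(grid):
    """Build a table where sat[r][c] is the sum of grid[0..r-1][0..c-1]."""
    rows, cols = len(grid), len(grid[0])
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
    return sat


def window_sum(sat, r0, c0, r1, c1):
    """Sum over rows r0..r1-1 and columns c0..c1-1 using four table lookups."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


def window_mean(sat, r, c, half):
    """Mean of the square window centred on (r, c), clipped at the raster edge."""
    rows, cols = len(sat) - 1, len(sat[0]) - 1
    r0, r1 = max(0, r - half), min(rows, r + half + 1)
    c0, c1 = max(0, c - half), min(cols, c + half + 1)
    return window_sum(sat, r0, c0, r1, c1) / ((r1 - r0) * (c1 - c0))


# Toy 3x3 raster (illustrative values only).
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sat = summed_area_table(grid)
```

Because sums, sums of squares, and cross-products are all additive, the same lookup pattern could in principle supply the terms needed for windowed regression coefficients.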
January 25, 2021
An overview of the SolarWinds Russian hack, a security breach that went on for months. Given the widespread impact, experts believe at this time that Russia is most likely responsible. Kotala states that these attacks happened despite the investment of billions of dollars to create state-of-the-art server security protections. There are multiple factors that made these attacks possible, states Kotala, including the safeguarding of the recent election. The attack was quite sophisticated and is of a type that we have never seen before. This happened at a time when the Cybersecurity and Infrastructure Security Agency (CISA) was without a full-time director. Additionally, the assistant director was asked to resign on November 12th, during the time of the attacks. It is difficult to say whether we will ever know the details of what the hackers acquired, as they might have had access to the networks for six to nine months, which gave them ample opportunity to transfer data to their servers, manipulate programs, and potentially control the networks.
December 9, 2019
The talk will give a brief history of how the Capstone course has evolved over the past 15 years. Some examples of completed projects will be shown, as well as plans for the spring semester projects. This will be followed by an overview of the International Capstone Project Exchange (ICPE), including how it has grown over the years. Finally, the plans for making the ICPE self-sustaining will be presented.
November 6, 2019
Aligning short reads to a reference genome is an essential step in many genomic analyses using next generation sequencing (NGS) technologies. In plants that have large genomes, a large fraction of the reads can align to multiple locations of the genome with equally good alignment scores. How to map these ambiguous reads to the genome is a challenging problem with a big impact on downstream analysis. Traditionally, the default method is to assign an ambiguous read randomly to one of the many potential locations. We explored alternative methods based on the hypothesis that the probability of an ambiguous read being generated by a location is proportional to the total number of reads produced by that location. The study led to the establishment of a new alignment method, which was able to discover more SNP markers of better quality, as demonstrated in multiple mainstay genomic applications. If time permits, I will also introduce our recent work on deep learning for bioinformatics. In a traditional machine learning scheme, biological data are represented using features defined by domain experts. The success of machine learning methods relies heavily on the effectiveness of the hand-designed features. However, in many practical problems, it is difficult to decide what features are appropriate. Deep learning, a branch of machine learning, overcomes this obstacle by replacing hand-designed features with data-driven features. Despite the superior performance it has delivered, deep learning faces considerable criticism because of its low interpretability, i.e., due to the multilayer non-linear structure of a deep learning model, it is hard to explain how it arrives at a decision. Consequently, in most, if not all, bioinformatics applications, deep learning methods have been used as a black box, with no knowledge about how the predictions were made. However, this is unacceptable in many fields.
In this project, we aim to develop interpretable deep learning methods for bioinformatics. Using a convolutional neural network, we demonstrate how the deep learning structure automatically learns sequence signatures for classification.
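The read-assignment hypothesis in the abstract above (an ambiguous read is more likely to come from a location that produces more reads) can be sketched as weighted random assignment. This is an illustration under stated assumptions, not the authors' alignment method: the per-location read totals are approximated here by counts of uniquely mapped reads, and the pseudocount parameter, function names, and location labels are all hypothetical.

```python
import random


def assignment_probabilities(unique_counts, candidates, pseudocount=1.0):
    """Probability of each candidate location, proportional to its read count.

    unique_counts maps a location to its number of uniquely mapped reads
    (an assumed proxy for the total reads produced by that location).
    The pseudocount keeps locations with zero unique reads selectable.
    """
    weights = [unique_counts.get(loc, 0) + pseudocount for loc in candidates]
    total = sum(weights)
    return {loc: w / total for loc, w in zip(candidates, weights)}


def assign_ambiguous_reads(unique_counts, ambiguous_reads, pseudocount=1.0, seed=0):
    """Pick one location per ambiguous read by weighted random choice."""
    rng = random.Random(seed)
    assignments = []
    for candidates in ambiguous_reads:
        probs = assignment_probabilities(unique_counts, candidates, pseudocount)
        chosen = rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]
        assignments.append(chosen)
    return assignments


# Hypothetical locations and counts, for illustration only.
counts = {"locA": 3, "locB": 1}
reads = [["locA", "locB"], ["locB", "locC"]]
chosen = assign_ambiguous_reads(counts, reads)
```

Setting all weights equal recovers the traditional uniform random assignment, which makes the two strategies easy to compare on the same data.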
November 28, 2018
This paper evaluated the effect of the spatial location of a physical cursor relative to a digital object in tabletop-centric cross-device interaction. In the study, sixty-two participants were recruited and randomly divided into two groups. Each group used one pointing technique (i.e., either direct pointing or proximity pointing) to complete 12 trials, which were grouped into three types of tasks (i.e., selection, read-and-count, and read-and-write), under conditions of display mode (i.e., single-object display and multi-object display) and selection mode (i.e., cluster selection and random selection). The experimental results provide empirical evidence that (1) proximity pointing has an advantage in completion time; and (2) selection mode moderates the effect of pointing techniques under different scenarios. Based on the empirical results, we summarized a set of guidelines that support the future development of tabletop-centric cross-device interaction.
February 23, 2018
This project proposes a proactive diabetes self-care mobile platform based on the unique socio-economic, cultural, and geographical status of American Indian (AI) patients living in a reservation community in the upper Midwest. The mobile platform connects AI diabetic patients to their medical devices, healthcare team, and similar patients, and offers personalized prediction, recommendation, and social networking regarding diabetes care. It transforms diabetes management from traditional reactive, hospital-centered care to preventive, proactive, evidence-based, and person-centered care.
January 31, 2018
Dr. Ludwig will present results from two research projects. The first project investigates different classification algorithms applied to BCI data, and the second applies a deep neural network ensemble to intrusion detection data.
November 1, 2017
The CS Capstone project course has evolved considerably over the past few years (course structure, tools, processes, IP, etc.). This talk will go over that history and also describe some exciting things happening with international projects and the establishment of an International Capstone Project Exchange. The exchange presently includes universities in Germany, Sweden, Austria, the Netherlands, Colombia, and Australia. US universities include the University of Illinois at Urbana-Champaign, Texas A&M, the University of Pittsburgh, the University of Colorado - Colorado Springs, North Carolina State University, Oregon State University, Mississippi State University, and others. In addition to Computer Science, we are also doing projects in ME, EE, and International Marketing. Several more universities are investigating pairings in these and other disciplines.
October 20, 2017
Intel is offering a webcast on the fundamentals of machine learning, deep learning and artificial intelligence (see details below) on Friday, October 20th, from 6:30 – 8:00pm CDT (4:30 to 6pm PDT). We will be viewing the webcast in QBB 104. During the webcast there will be an opportunity to communicate with the Intel engineers via the Slack channel (computer or cell phone). Once you register, all the information will be provided to you. If you are interested, please sign up via the Eventbrite link for the Intel webcast. Should more than 20 people register, refreshments will be provided.
October 4, 2017
Data science for food, energy, and water is increasingly recognized as a topic of high global relevance. However, it may not always be clear what types of research results will ultimately improve the outlook for our planet. The prediction of agricultural yield is a natural starting point for data scientists, because it resembles data mining tasks in other areas. The increasing availability of large quantities of data indeed allows answering questions that have previously been the subject of educated guesses, for example how the temporal distribution of precipitation affects yield. In a changing climate, yield prediction and optimization can result in counterproductive recommendations that aggravate the problems that motivated the research, prompting an examination of the ethics of scientific problem statements. Prediction of soil health emerges as an important and ethically robust research topic that offers fundamentally and practically relevant data science research questions. Window-based derived attributes, in particular fractal dimensions, are shown to be especially promising for this purpose.
March 22, 2017
Dr. Walia's presentation will illustrate the ability and value of applying human error research (borrowed from Cognitive Psychology) to a software engineering problem through close interaction between software engineering experts and cognitive psychology experts. He will also discuss research results on how to staff and improve inspection performance in the Software Industry.
March 1, 2017
Dr. Nygard will address the needs for cyber education and characterize the field. Drs. Salem and Kotala will provide an overview of the Cyber Security course that they are currently teaching.
November 16, 2016
With the advances of various genomic and proteomic projects, biomedical research has entered a data-driven era, in which computational methods are urgently needed for the task of mining complicated biological data to discover knowledge that can be used for disease treatment and drug design. In this talk, Dr. Yan will present the graph methods that his group is developing for the task of discovering function-associated structural patterns in biological macromolecules. Using a succinct graph representation, the Yan group discovered structural patterns that were enriched in the functional sites. The biological significance of these patterns was evaluated using statistical tests, molecular docking, and expert knowledge.
October 19, 2016
Complex networks appear in a wide spectrum of fields, including bioinformatics and neuroscience. These systems (networks) are governed by intricate webs of interactions among constituent elements. Entities and relationships in networks are increasingly being annotated with content, thus giving rise to rich attributed graphs. In this talk, we will present graph mining algorithms for mining multiple gene expression datasets, and mining discriminative subnetworks for gene coexpression networks.
April 28, 2016
The main goal of software engineering is to improve the reliability of software. The main impediment to achieving that goal is the complexity of software development which arises in part from the complexity of software. This presentation describes a theory of software complexity including some case study and experimental results.
March 8, 2016
The history of cryptography can be likened to a reawakening of the history of mathematics and computer science. The story of cryptography goes back 4000 years, and some of the mathematics employed goes back just as far. This talk will address the history of cryptography beginning with the Enigma used by the Germans in WWII and broken by world-famous mathematician and computer scientist Alan Turing. It will continue down to today’s advanced cryptosystems such as RSA, PGP and elliptic curve cryptography. The lecture will point out the key role that cryptography plays in the future of e-commerce and the new products and ways of doing business that result when secure communications through cryptography are available.
Don Costello has had a mixed career, splitting his time between universities and business. He helped start three Computer Science departments and three university Information Technology facilities (University of Nebraska, University of Wisconsin – Oshkosh and Madison, and Colorado State University). He has taught undergraduate and graduate courses and has done research in Cryptography and Network Security, Statistical Computing, Performance Modeling, and Managing Intellectual Property. He is a 40-year member of ACM and is a fellow of the British Computer Society. He has lectured all over the United States as well as in England, Ireland, Austria, Germany, India and Sri Lanka. He also held a four-year Carnegie Foundation grant to investigate how IP is managed in universities around the world and has led new teaching efforts in Cryptography and Network Security. In his business career he has managed IT facilities, founded and sold two firms, and consulted with over 100 firms throughout the world. His recent consulting includes five years consulting on ERP systems, SAP, as well as being a Technical Consultant on .com and e-Learning projects. Don currently holds a position as an Associate Professor Emeritus at the University of Nebraska.
*ACM Distinguished Lecture*
November 18, 2015
A shared interactive display (e.g., a tabletop) provides a large space for collaborative interactions. However, a public display lacks a private space for accessing sensitive information. On the other hand, a mobile device offers a private display and a variety of modalities for personal applications, but it is limited by a small screen. We have developed a framework that supports fluid and seamless interactions among a tabletop and multiple mobile devices. This framework can continuously track each user’s actions (e.g., hand movements or gestures) on top of a tabletop and then automatically generate a unique personal interface on an associated mobile device. This type of inter-device interaction integrates a collaborative workspace (i.e., a tabletop) and a private area (i.e., a mobile device) with multimodal feedback. To support this interaction style, an event-driven architecture is applied to implement the framework on the Microsoft PixelSense tabletop. This framework hides the details of user tracking and inter-device communication. Thus, interface designers can focus on the development of domain-specific interactions by mapping a user’s actions on a tabletop to a personal interface on his/her mobile device. The results from two different studies support the usability of the proposed interaction.
April 10, 2015
It is understood that textual information is growing at an astounding pace, creating an enormous challenge for analysts trying to discover valuable information that is buried. For example, new non-trivial trends, patterns, and associations among entities of interest, such as associations between genes, proteins and diseases, and the connections between different places or the commonalities of people, are such forms of underlying knowledge. The goal of my research is to explore automated solutions for sifting through these extensive document collections to detect interesting links and hidden information that connect facts, propositions or hypotheses. This talk will present an overview of our solution including the creation of a new textual knowledge representation, integration, and mining framework that effectively integrates domain knowledge and large-scale knowledge repositories such as Wikipedia. Various mining models addressing emerging information discovery needs will also be presented.
March 13, 2015
One of the greatest challenges of an enterprise's service center is to ensure that its engineers and customers are provided with the right information in a timely fashion. For this purpose, modern organizations operate a wide range of information support systems to assist customers with critical service requests. It is often the case that relevant information is scattered over the Internet and/or maintained on disparate systems, buried in large amounts of noisy data, and stored in heterogeneous formats, thereby complicating access to reusable knowledge and extending the time needed to reach a resolution. To address these challenges, in this project, we propose an effective knowledge mining solution to improve service request resolution success rates and response times.
February 27, 2015
The management and analysis of big data has been identified as one of the most important emerging needs in recent years. This is because of the sheer volume and increasing complexity of data being created and/or collected. Current clustering algorithms cannot handle big data, and therefore scalable solutions are necessary. Since fuzzy clustering algorithms have been shown to outperform hard clustering approaches in terms of accuracy, this presentation will outline the parallelization and scalability of a common and effective fuzzy clustering algorithm, the Fuzzy C-Means (FCM) algorithm. The algorithm is parallelized using the MapReduce paradigm. A validity analysis is conducted to show that the implementation works correctly, achieving competitive purity results compared to state-of-the-art clustering algorithms. Furthermore, a scalability analysis is conducted to demonstrate the performance of the parallel FCM implementation with an increasing number of computing nodes.
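To make the MapReduce formulation of Fuzzy C-Means above more concrete, here is a minimal single-machine sketch. It follows the standard FCM update equations, with the map step emitting per-point partial sums and the reduce step adding them; it is not the presenters' implementation, and the function names, the fuzzifier default of 2.0, and the toy data are assumptions.

```python
from functools import reduce


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
             for center in centers]
    if any(d == 0.0 for d in dists):
        # A point lying exactly on a center belongs fully to that center.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in dists) for d_i in dists]


def map_point(point, centers, m=2.0):
    """Map step: per cluster, emit (u^m * point, u^m) for this one point."""
    return [([(u ** m) * x for x in point], u ** m)
            for u in memberships(point, centers, m)]


def reduce_partials(a, b):
    """Reduce step: add two lists of partial sums cluster by cluster."""
    return [([x + y for x, y in zip(num_a, num_b)], den_a + den_b)
            for (num_a, den_a), (num_b, den_b) in zip(a, b)]


def fcm_iteration(points, centers, m=2.0):
    """One FCM center update expressed as map followed by reduce."""
    totals = reduce(reduce_partials, (map_point(p, centers, m) for p in points))
    return [[x / den for x in num] for num, den in totals]


def fcm(points, centers, m=2.0, iterations=50):
    for _ in range(iterations):
        centers = fcm_iteration(points, centers, m)
    return centers


# Toy one-dimensional data with two obvious groups (illustrative only).
points = [[0.0], [1.0], [9.0], [10.0]]
final_centers = fcm(points, [[0.0], [10.0]])
```

Because the reduce step is a plain element-wise sum, partial results computed on separate nodes can be combined in any order, which is the property a MapReduce deployment would rely on.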
November 12, 2014
A newly proposed social media marketing approach addresses the limitations of existing social media marketing tools and links customers' offline experiences to their actions online. This approach will allow us to track and collect customers' offline experiences in real time and to obtain good quality data. We evaluated the proposed approach through several case studies that were applied to real marketing events and presented the results of the case studies. Through these studies, we demonstrated that our method can be more effective and efficient in increasing the online exposure effect and more useful in increasing the participation rate of customers compared to existing social media marketing methods.
October 8, 2014
At the United States Agency for International Development (USAID) and at the Department of State, data science, analytics, visualization, and mathematical modeling all play important roles in conducting and evaluating international projects and programs. To illustrate, we describe several example projects, such as understanding resilience to drought in Ethiopia, how people perceive biotechnology risk worldwide, energy development in Armenia and in Pakistan, and infrastructure development in Ghana.
March 26, 2014
Given a set of alternative designs for the same application, how do we determine which should be employed for a specific purpose? This talk starts by presenting a purpose-oriented theory of software artifacts. This theory is compared with the two major efforts to develop theories of software development. The theory is then used to develop two software complexity metrics for different purposes. Preliminary results are presented indicating the potential usefulness of these metrics for their intended purposes.
May 7, 2014
Graphs are increasingly being used to represent systems of interacting entities. In addition to the structural relationships defined by edges, these graphs can have other sources of data annotating entities and relationships. In this talk, we will present graph mining algorithms that integrate gene attribute data into the module discovery problem for protein-protein interaction networks by incorporating gene regulation across different experiments. The second part of the talk will focus on mining graphs with edge attributes. We will present two algorithms: mining coexpression patterns from multiple gene expression datasets, and mining cross-graph communities from social and collaborative networks.
April 11, 2013
Dr. Walia's research interests lie in software engineering, particularly software quality improvement and measurement, software inspections and software errors, and software engineering education. The goal of his research is to improve the quality and reliability of software through the use of Empirical Software Engineering. Empirical research is based on observation and measurement of different aspects of software development in the context of human-subject experiments. Dr. Walia's approach to empirical software engineering involves conducting empirical studies to examine how people (i.e., developers) use processes and tools in different settings and to objectively evaluate these techniques. His research is multidisciplinary, i.e., it uses approaches that have been applied successfully in other domains and adapts them for the task of improving and managing software quality. He will talk about the results from empirical studies that have been conducted at NDSU to 1) develop a deeper understanding of human errors and develop techniques to detect and prevent errors early in the software development lifecycle; 2) evaluate the use of the Capture-Recapture method in software organizations; and 3) improve the state of software engineering education.
October 10, 2012
Successful software systems evolve: they are enhanced, corrected, and ported to new platforms. To ensure the quality of modified systems, software engineers perform regression testing, but this can be expensive depending on the size and complexity of the systems, and it is responsible for a significant percentage of the costs of software. For reasons such as this, researchers have spent a great deal of time creating and empirically studying various techniques for improving the cost-effectiveness of regression testing. Despite the progress this research has achieved to date, several important aspects have not been considered, such as factors involving testing context and the system lifetime view. In this talk, I'll present research activities that can address these limitations as follows: (1) creating cost-effective regression testing techniques that address the testing process and domain contexts, (2) creating regression testing strategies that address system lifetimes, (3) creating economic models that enable the adequate assessment of techniques and strategies, and (4) evaluating and refining these techniques and strategies through rigorous empirical approaches.
September 12, 2012
Many data sources can contribute to an understanding of agriculture. We will look at how data mining techniques can help understand the relevance of agricultural choices and help predict outcomes. The presentation will also show where questions that were originally driven by companies can result in data mining problems that have not previously been discussed by either the data mining or statistics communities. Considering the large amounts of satellite, weather, soil, and other data, the problem of data-driven agriculture cannot be addressed without modern computing infrastructure. Parallel data processing, as it was introduced by Google, and later continued as part of the open source Hadoop effort, can help with processing of large image files. We will also see how such processing can be done with general purpose departmental computing infrastructure, i.e., using the infrastructure that is now commonly referred to as cloud.
April 11, 2012
Wireless networks and mobile devices make it possible to access information from anywhere at any time. However, most Web pages are tailored to personal computers. Without adaptation, it is frustrating to browse those pages on mobile devices since users have to frequently scroll the display window to find the content of interest. Keeping two versions of a presentation, one for desktops and the other for mobile devices, is time-consuming and error-prone. This project will investigate an innovative approach to automatically adapting Web pages from a desktop presentation to a mobile presentation, and address two challenging issues: fine-grained page segmentation (i.e., discovering closely related information) and human-centric context-aware adaptive layouts. The proposed work abstracts an HTML Web page as a spatial graph that highlights the most essential semantic relations among atomic information blocks, and applies a graph grammar to the spatial graph for page segmentation. The graph-grammar-based page segmentation can recognize the semantic role of an information block and specify the page segmentation visually in the form of graphical rules that can be reused in different Web sites. Based on the hierarchical information organization discovered from the page segmentation, a new adaptive layout will be generated to fit the small screen of mobile devices. All existing approaches only support one presentation style or another, and do not support personalized styling. We will summarize a set of existing adaptive styles. A user can choose a style based on his/her personal preferences.
February 29, 2012
The basic concept of a Smart Grid is to add intelligent capabilities to the systems that produce, deliver, and consume electrical power. Potential benefits are many, and include gains in efficiency, conservation of natural resources, improved utilization of alternative energy sources, cost containment, and advances in sustainability. There are many computer science and software engineering research issues in Smart Grid operation, including 1) data mining, networking, sensing, and data fusion in system monitoring, 2) pattern recognition for fault management, 3) decision support methodologies for fault management, 4) self-healing, 5) management of devices, including dynamic pricing, 6) management and control of distributed systems, and 7) intelligent autonomy. In this talk we describe our recent efforts to address these research issues.
February 8, 2012
Natural and man-made emergencies pose an ever-present threat to society. In response to the growing number of recent emergencies, and in particular the Red River crests that cause flooding here in Fargo, North Dakota, we propose a community-based scalable cloud computing infrastructure for large-scale emergency management. This infrastructure will make maximal use of the available information and human power from within and outside of a community to effectively deploy personnel and logistics to aid in search and rescue. The infrastructure will also aid in damage assessment, enumeration, and coordination to support sustainable livelihoods by protecting lives, property and the environment.
November 9, 2011
With implementation of the National Health Information Network (NHIN), the United States is poised to create a massive cyber-infrastructure resource, which presents both opportunity and peril: opportunity with respect to the desired outcomes of reducing health-care costs and increasing quality of care, and peril with respect to privacy threats and the potential for creating a new cyber-threat vector with unknown implications. The IDfusion project is a multi-state platform for evaluating an integrated hardware, software and network infrastructure solution to address the security implications inherent in this national strategy. In developing this solution, the project has addressed issues ranging from security protocol design to embedded system development and network design, all within the political context of the high-profile socio-economic debate over unified health care. The presentation will feature Dr. Greg Wettstein, R.Ph., Ph.D., who has authored formal responses to the President's Council of Advisors on Science and Technology (PCAST) in the field of health information systems. The presentation will include a brief review of the NHIN strategy and how IDfusion was developed and implemented to support this national initiative.
October 12, 2011
Vertical technologies represent and process data differently from the ubiquitous horizontal data technologies. In vertical technologies, the data is structured column-wise and the columns are processed horizontally (typically across a few to a few hundred columns), while in horizontal technologies, data is structured row-wise and those rows are processed vertically (often down millions, even billions of rows). The patented P-tree technology is a vertical data technology. P-trees are lossless, compressed and data-mining ready data structures. P-trees are data-mining ready because the fast, horizontal data mining processes involved can be done without the need to decompress the structures first. P-tree vertical data structures have been exploited in various domains and data mining algorithms, ranging from classification, association rule mining, to outlier analysis, as well as other data mining algorithms. P-tree technology is patented in the United States by NDSU.
Typically, a new data mining technology will either tout improved speed or improved accuracy. P-trees can facilitate both. In fact, the Closed Nearest Neighbor Classification P-tree technology, has been shown to do both simultaneously. Speed improvements are very important in data mining because many quite accurate algorithms require an unacceptable amount of processing time to complete, even with today's powerful computing systems and efficient software platforms. Undoubtedly the most important breakthrough offered by P-tree technology is the ability to process all instances (even billions) of an entity with one horizontal pass across a small number (a few to a few hundred) vertical, compressed P-tree data structures.
Treeminer Inc. has licensed the P-tree patents while Dr. William Perrizo's DataSURG group is further developing the technology, including better algorithms for P-tree processing and processing on multi-core CPUs, GP-GPUs and FPGAs.
Jonathan Tolstedt, Patent Agent and Licensing Associate for the NDSU Technology Transfer Office, gave a short presentation on intellectual property policies at NDSU, how those policies apply to Computer Science discoveries, and how best to protect software-based intellectual property (patent versus copyright). Jonathan has a Bachelors in Electrical Engineering and a Masters in Computer Science.
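The column-wise processing described in the P-tree abstract above can be illustrated with uncompressed vertical bit slices. This is a simplified sketch of the general vertical-data idea, not the patented P-tree structure: there is no compression or tree hierarchy here, and the function names and sample values are hypothetical. Each bit position of a column becomes one bitmask over all rows, so a predicate over every row is answered with one pass across a handful of slices.

```python
def to_bit_slices(values, n_bits):
    """Vertical layout: slice j is a bitmask whose bit r is bit j of values[r]."""
    slices = [0] * n_bits
    for row, value in enumerate(values):
        for j in range(n_bits):
            if (value >> j) & 1:
                slices[j] |= 1 << row
    return slices


def rows_equal(slices, n_rows, target):
    """Bitmask of rows whose value equals target, in one pass across the slices."""
    if target >> len(slices):
        return 0  # target does not fit in the stored bit width
    all_rows = (1 << n_rows) - 1
    mask = all_rows
    for j, bit_slice in enumerate(slices):
        # Keep rows whose bit j matches bit j of the target.
        mask &= bit_slice if (target >> j) & 1 else (all_rows & ~bit_slice)
    return mask


def count_rows(mask):
    """Number of rows selected by a bitmask."""
    return bin(mask).count("1")


# Illustrative 3-bit column with six rows.
values = [5, 3, 5, 7, 0, 5]
slices = to_bit_slices(values, 3)
matches = rows_equal(slices, len(values), 5)
```

The loop in `rows_equal` runs once per bit slice rather than once per row, which is the sense in which a vertical layout trades a long scan down the rows for a short pass across the columns.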
September 14, 2011
The ACM Distinguished Lecture by Dr. Vetter will describe the development of interactive short message service applications, which range from simple data access applications to a novel discovery game designed for the freshman experience. Several of these applications are now being sold commercially via a novel technology transfer agreement with the University of North Carolina Wilmington. In addition, several iPhone applications have also been developed. A discussion of the relative advantages, costs and lessons learned while developing mobile phone applications will be presented.
Dr. Vetter earned his bachelor's and master's degrees in computer science from NDSU and his doctorate in computer science from the University of Minnesota. He has published more than 100 journal, conference and technical papers. He has served as the principal investigator or co-principal investigator on grants and contracts exceeding $5 million.