TY - GEN
T1 - Mining the best observational window to model social phenomena
AU - Yan, Chao
AU - Yin, Zhijun
AU - Xiang, Stanley
AU - Chen, You
AU - Vorobeychik, Yevgeniy
AU - Fabbri, Daniel
AU - Kho, Abel
AU - Liebovitz, David
AU - Malin, Bradley
N1 - Funding Information:
This research was supported, in part, by grants R01LM010207 and R00LM011933 from the National Institutes of Health, grants 1526014, 1536871, and IIS-1649972 from the National Science Foundation, and grants W911NF1610069 and W911NF1810208 from the Army Research Office. The content in this work is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
PY - 2018/11/15
Y1 - 2018/11/15
AB - The structure and behavior of organizations can be learned by mining the event logs of the information systems they manage. This supports numerous applications, such as inferring the structure of social relations, uncovering implicit workflows, and detecting illicit behavior. However, to date, no clear guidelines regarding how to select an appropriate time period to perform organizational modeling have been articulated. This is a significant concern because an inaccurately defined period can lead to incorrect models and poor performance in data-driven applications. In this paper, we introduce a data-driven approach to infer the optimal time period for organizational modeling. Our approach 1) represents the system as a social network, 2) decomposes it into its respective principal components, and 3) optimizes the signal-to-noise ratio over varying temporal observation windows. In doing so, we minimize the variance in the organizational structure while maximizing its patterns. We assess the capability of this approach using an anomaly detection scenario, which is based on the patterns learned from the interactions documented in audit logs. The classification performance of two known algorithms is investigated over a range of time periods in two representative datasets. First, we use the electronic health record access logs from Northwestern Memorial Hospital to demonstrate that our framework detects a period that coincides with the optimal performance of the anomaly detection algorithms. Second, we assess the generalizability of the framework through an analysis with a less clearly defined organization, in the form of the social network inferred from the DBLP co-authorship dataset. The results with this data further illustrate that our framework can discover the optimal time period in the context of a more loosely organized group.
KW - Anomaly detection
KW - Data Mining
KW - Organizational modeling
KW - Temporal optimization
UR - http://www.scopus.com/inward/record.url?scp=85059775063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059775063&partnerID=8YFLogxK
U2 - 10.1109/CIC.2018.00-41
DO - 10.1109/CIC.2018.00-41
M3 - Conference contribution
AN - SCOPUS:85059775063
T3 - Proceedings - 4th IEEE International Conference on Collaboration and Internet Computing, CIC 2018
SP - 46
EP - 55
BT - Proceedings - 4th IEEE International Conference on Collaboration and Internet Computing, CIC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE International Conference on Collaboration and Internet Computing, CIC 2018
Y2 - 18 October 2018 through 20 October 2018
ER -