Uncovering Our ‘Hidden Visits’ With Cell Phone Data and Machine Learning
If you've ever wondered how researchers track our movements across a country without relying solely on phone calls, a fascinating study by researchers from China and the United States offers some insight. Their collaborative work delves into the use of machine learning to uncover the 'hidden visits' we make—those trips that don't show up in standard telecom data because we're not using our phones enough.
The study, titled **Identifying Hidden Visits From Sparse Call Detail Record Data**, is spearheaded by Zhan Zhao from the University of Hong Kong, alongside Haris N. Koutsopoulos from Northeastern University in Boston, and Jinhua Zhao from MIT. Their goal? To leverage the mobile connectivity records—such as mobile data, SMS, and voice calls—from highly active users to model and predict the movement patterns of those who use their phones less frequently.
*A rough schematic for extracting trip information from Call Detail Record (CDR) data.* Source: https://arxiv.org/pdf/2106.12885.pdf
While the team acknowledges the potential privacy concerns their work raises, they emphasize that their aim is to gain a more generalized understanding of movement patterns, rather than zooming in on individual journeys. They also point out that Call Detail Record (CDR) data, which is the backbone of such studies, has its limitations. It's often low in spatial resolution and susceptible to 'positioning noise' due to the user's changing position relative to cell phone towers. However, they argue that this inaccuracy actually serves as a privacy safeguard:
**‘The target application of our study is trip detection and OD estimation\[\*\], which are done at aggregate level, not individual level. The developed models can be directly deployed on the database servers of telecom carriers, without need for data transfer. Furthermore, compared to other forms of big data, such as social media or credit card transaction data, CDR data is relatively less intrusive in terms of personal privacy. In addition, its localization error helps to mask the exact user locations, providing another layer of privacy preservation.’**
Elapsed Time Intervals (ETIs)
The limits of CDR data as a location tool become clear once we are on the move with a mobile phone, which need not be a smartphone. Elapsed Time Intervals (ETIs), the stretches between recorded events in which we neither make nor receive calls, are crucial markers for tracking our movements. These intervals of 'silence' can make us temporarily vanish from the grid.
The researchers highlight how these gaps interfere with analytical systems trying to make sense of A-to-B journeys: where the data is sparse, a gap may be hiding an 'unobserved trip'. Their new method tackles this by analyzing the spatiotemporal context of each ETI and considering 'the individual characteristics of the user'.
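To make the idea of an ETI concrete, here is a minimal sketch of how such gaps might be pulled out of one user's time-ordered records. The sample records, the one-hour `min_gap` threshold, and the function name `elapsed_time_intervals` are all assumptions made for illustration; the study's own definition and thresholds may differ.

```python
from datetime import datetime, timedelta

# Hypothetical records for one user: (timestamp, tower_id). Not from the study.
records = [
    (datetime(2013, 11, 4, 8, 5), "T1"),
    (datetime(2013, 11, 4, 8, 20), "T1"),
    (datetime(2013, 11, 4, 12, 40), "T2"),
    (datetime(2013, 11, 4, 13, 0), "T2"),
    (datetime(2013, 11, 4, 18, 30), "T1"),
]

def elapsed_time_intervals(records, min_gap=timedelta(hours=1)):
    """Return gaps between consecutive records that last at least min_gap.

    The min_gap threshold is an assumed illustration value.
    """
    ordered = sorted(records, key=lambda r: r[0])
    gaps = []
    # Walk over consecutive pairs of records.
    for (t0, tower0), (t1, tower1) in zip(ordered, ordered[1:]):
        if t1 - t0 >= min_gap:
            gaps.append({
                "start": t0,
                "end": t1,
                "duration_h": (t1 - t0).total_seconds() / 3600,
                "from_tower": tower0,
                "to_tower": tower1,
                # A tower change across the gap hints at travel during the silence.
                "moved": tower0 != tower1,
            })
    return gaps

etis = elapsed_time_intervals(records)
for eti in etis:
    print(eti["start"], "->", eti["end"], eti["from_tower"], "->", eti["to_tower"])
```

Each returned gap carries the kind of spatiotemporal context (when it started, how long it lasted, where it began and ended) that a classifier could then use to judge whether a visit went unobserved.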
Dataset
To build their core training set, the researchers used data from a major cellular service operator in a Chinese city with a population of 6 million. This dataset included over two billion mobile phone transactions from three million users in November 2013, focusing solely on voice calls and data access records. Notably, they did not include SMS data, which added to the challenge of dealing with sparse data.
Each record included an encrypted unique user ID, a Location Area Code (LAC), a timestamp, a cell ID that, combined with the LAC, identifies the specific cell tower involved in the transaction, and an Event ID indicating whether the event was an outgoing call, an incoming call, or data usage.
*Process tree for the identification of hidden visits.*
This information was cross-referenced with a cell tower operation database, enabling the researchers to pinpoint the longitude and latitude coordinates of the tower associated with each communication event. They identified 9,000 cell towers within the dataset.
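The cross-referencing step can be pictured as a simple lookup join. The sketch below assumes a tower table keyed by the (LAC, cell ID) pair; the field names, identifiers, and coordinates are invented for illustration and do not come from the study's data.

```python
# Hypothetical tower table keyed by (LAC, cell ID); coordinates are made up.
towers = {
    ("1001", "A17"): (114.05, 22.54),
    ("1001", "B02"): (114.10, 22.56),
}

# Hypothetical CDR rows carrying the fields described in the article.
cdr_rows = [
    {"user": "u1", "lac": "1001", "cell": "A17",
     "time": "2013-11-04T08:05:00", "event": "call_out"},
    {"user": "u1", "lac": "1001", "cell": "B02",
     "time": "2013-11-04T12:40:00", "event": "data"},
    {"user": "u1", "lac": "1002", "cell": "Z99",
     "time": "2013-11-04T13:00:00", "event": "data"},
]

def attach_coordinates(rows, towers):
    """Split rows into those with a known tower location and those without."""
    located, unmatched = [], []
    for row in rows:
        coords = towers.get((row["lac"], row["cell"]))
        if coords is None:
            # Keep unmatched rows separate rather than guessing a location.
            unmatched.append(row)
            continue
        lon, lat = coords
        located.append({**row, "lon": lon, "lat": lat})
    return located, unmatched

located, unmatched = attach_coordinates(cdr_rows, towers)
print(len(located), "located,", len(unmatched), "unmatched")
```

Keeping unmatched rows in their own list, instead of dropping them silently, makes it easy to check how much of the data the tower database actually covers.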
The researchers noted the difficulty in accurately guessing trip destinations based solely on call records, as these records peak in the morning and afternoon, which aligns with typical travel patterns. Since phone calls can precede a journey and may even trigger it, this can skew destination estimation.
*Mobile usage patterns over the course of a day.*
Similar challenges arise with user-initiated data usage, like messaging apps. However, it's the 'automated' data usage—like the systematic polling of APIs for new messages or other data, including GPS and telemetry across apps—that helps in identifying these hidden movements.
Processing
The researchers employed a variety of machine learning classifiers to tackle this problem, including logistic regression, support vector machines (SVM), random forests, and a gradient boosting ensemble approach. These were implemented in Python using scikit-learn with default settings.
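As a rough illustration of that setup, the sketch below fits the four scikit-learn classifier families with their default constructors on synthetic data. The three features, the labelling rule, and the sample sizes are assumptions invented for this example, so the resulting accuracies say nothing about the study's results.

```python
import random

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

random.seed(0)  # Only the synthetic data is seeded; the models use defaults.

def synthetic_eti():
    """Build one made-up ETI sample: features plus a hidden-visit label."""
    duration_h = random.uniform(0.5, 12.0)  # length of the gap
    start_hour = random.uniform(0, 24)      # time of day the gap begins
    n_observed = random.randint(1, 8)       # places already seen that day
    # Assumed labelling rule: long gaps raise, many observed places lower, the odds.
    score = 0.4 * duration_h - 0.5 * n_observed + random.gauss(0, 1)
    return [duration_h, start_hour, n_observed], int(score > 0)

samples = [synthetic_eti() for _ in range(400)]
X = [features for features, _ in samples]
y = [label for _, label in samples]
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

models = {
    "logistic_regression": LogisticRegression(),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(),
    "gradient_boosting": GradientBoostingClassifier(),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)  # mean accuracy on held-out rows

for name, value in accuracy.items():
    print(f"{name}: {value:.2f}")
```

With unscaled features, the default `LogisticRegression` may print a convergence warning; that is harmless here but is one reason default settings are usually only a starting point.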
Among these, logistic regression provided the most interpretable model parameters. The team also found that longer ETIs increased the likelihood of a hidden visit occurring, with a higher incidence in the morning. Conversely, when a user's CDR data clearly showed a high number of destinations or waypoints, the likelihood of a hidden visit was lower. This finding supports the core principle of their research—that the most active users provide a detailed picture of their movements, from which the behavior of less active users can be inferred.
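The interpretability point can be shown with a hand-written logistic function. The coefficients below are hypothetical values chosen only to mirror the reported directions of effect (longer ETIs and morning gaps raise the probability, more observed places lower it); reading 'morning' as a property of the gap itself is also an assumption, and none of the numbers come from the paper.

```python
import math

# Hypothetical coefficients; only their signs reflect the article's description.
COEF = {
    "intercept": -1.0,
    "eti_hours": 0.3,         # longer gaps -> more likely to hide a visit
    "morning": 0.5,           # gaps in the morning -> more likely
    "observed_places": -0.4,  # richer observed itinerary -> less likely
}

def hidden_visit_probability(eti_hours, morning, observed_places):
    """Logistic model: probability that an ETI contains a hidden visit."""
    z = (
        COEF["intercept"]
        + COEF["eti_hours"] * eti_hours
        + COEF["morning"] * (1 if morning else 0)
        + COEF["observed_places"] * observed_places
    )
    return 1.0 / (1.0 + math.exp(-z))

short_gap = hidden_visit_probability(eti_hours=1, morning=False, observed_places=2)
long_gap = hidden_visit_probability(eti_hours=8, morning=False, observed_places=2)
morning_gap = hidden_visit_probability(eti_hours=8, morning=True, observed_places=2)
busy_user = hidden_visit_probability(eti_hours=8, morning=False, observed_places=6)

print(f"short gap:   {short_gap:.2f}")
print(f"long gap:    {long_gap:.2f}")
print(f"morning gap: {morning_gap:.2f}")
print(f"busy user:   {busy_user:.2f}")
```

Because each coefficient shifts the log-odds directly, the sign and size of every term can be read off without any further tooling, which is what makes this model family easy to interpret.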
In their conclusion, the researchers suggest that their approach could be applied to other types of transit data, such as smart card data and geo-located social media information.
The research was supported by funding from Energy Foundation China and the China Sustainable Transportation Center.
*\* Origin-Destination*
Comments (20)
The study shows in a really fascinating way how movement patterns can be extracted from mobile network data. At the same time, it raises data protection questions: who actually controls how this information is used? 🧐
Wait, so they're using ML to track our 'hidden visits' now? 😅 Always feels a bit creepy when tech peeks into those unregistered trips... but the data insights could be huge for urban planning or disease tracking, right? Still, makes me side-eye my phone a little more today 🧐
This study on tracking movements with phone data is wild! 😲 It’s like our phones are secretly spilling where we’ve been. Kinda creepy, but super cool how machine learning digs into those 'hidden visits.' Makes me wonder what else they can find out!
This article blew my mind! Using phone data and ML to track hidden visits is so cool, but kinda creepy too. 🤯 Wonder how they balance privacy with all this tech wizardry.