Data-Driven Decisioning


Pipl Trust

Pipl has a machine-learning-based fraud and identity verification solution called Pipl Trust, which uses the data in our Search API product, along with other connectivity and trust signals, to generate trust scores for an action or data point. If you are looking into machine learning solutions for identity trust scores or fraud prevention, consider whether Pipl Trust will meet your needs.

Pipl Search API in Data-Driven Decisioning

The Pipl Search API response includes a wide array of person-centric information that can feed data-driven decision methods such as machine learning (ML) models and rules engines. This document summarizes best practices to consider when using Pipl data.

Confirming Identity with Pipl

The Pipl response contains a vast amount of information about a person's identity that can be used in a rules engine or an ML model to help confirm that identity and/or learn more about the person. This data falls into four categories: data fields, metadata, sources, and tags. Each category can be useful and should be considered when building a rules engine or an ML model.

Best Practices When Comparing Data Fields

When confirming and comparing your existing data with a Pipl API response, consider the following for specific data fields:

  • Emails: Normalize emails before comparison. Personal email services such as Gmail accept several written forms of the same address (for example, Gmail ignores periods in the part before the @), so two addresses that look different may reach the same mailbox. The guidelines for md5 calculations in the Pipl API reference will be useful as you begin to compare data fields.
  • Names: Because of spelling variations, or because people in the same family may use the same account, consider breaking a name into first, middle, and last components and comparing them separately. Each component can then be confirmed with methods such as fuzzy matching, common-nickname lookup, phonetic (“sounds like”) matching, and transliteration for names written in other languages or scripts.
  • Addresses: Compare the individual address components (street, city, state, zip) separately, using methods such as fuzzy matching and known spelling variations. Fuzzy logic helps match street name variations such as “731B Union Street” vs. “731 B Union St”.
  • Phones: Phone numbers are written in different local and international formats across countries. Convert both numbers to international format (for example, E.164) before comparing them to get the best results.
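The comparison guidelines above can be sketched in Python. This is a minimal illustration under stated assumptions, not Pipl's own normalization rules: the Gmail handling (dropping periods and any “+tag” in the local part, treating googlemail.com as gmail.com) reflects how Gmail treats addresses, the phone helper assumes numbers without an international prefix are national numbers written without a trunk prefix, and the nickname and street-abbreviation tables are small hypothetical samples. All function names are illustrative.

```python
import difflib
import re

# Hypothetical sample tables; a real system would use much larger lists.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}
STREET_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def normalize_email(email: str) -> str:
    """Lowercase and trim an email; apply Gmail's period and +tag rules."""
    local, _, domain = email.strip().lower().partition("@")
    if domain in ("gmail.com", "googlemail.com"):
        local = local.split("+", 1)[0]  # drop a "+tag" suffix
        local = local.replace(".", "")  # Gmail ignores periods
        domain = "gmail.com"
    return f"{local}@{domain}"


def normalize_phone(phone: str, country_code: str = "1") -> str:
    """Simplified E.164-style form: '+' followed by digits only.

    Assumption: numbers without a '+' or '00' prefix are national numbers
    for `country_code`, written without a trunk prefix.
    """
    raw = phone.strip()
    digits = re.sub(r"\D", "", raw)
    if raw.startswith("+"):
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    return "+" + country_code + digits


def name_part_similarity(a: str, b: str) -> float:
    """Fuzzy similarity (0.0 to 1.0) of one name component, after nickname lookup."""
    a, b = a.strip().lower(), b.strip().lower()
    a, b = NICKNAMES.get(a, a), NICKNAMES.get(b, b)
    return difflib.SequenceMatcher(None, a, b).ratio()


def normalize_street(street: str) -> str:
    """Lowercase, drop punctuation, join unit letters, and abbreviate suffixes."""
    s = re.sub(r"[^\w\s]", " ", street.lower())
    s = re.sub(r"(\d+)\s*([a-z])\b", r"\1\2", s)  # "731 b" -> "731b"
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in s.split())


def street_similarity(a: str, b: str) -> float:
    """Fuzzy similarity (0.0 to 1.0) of two normalized street lines."""
    return difflib.SequenceMatcher(None, normalize_street(a), normalize_street(b)).ratio()
```

The similarity scores are meant to be compared against a threshold you tune on your own data. In production, a dedicated parser (for example, the `phonenumbers` package for phone numbers) and fuller nickname and abbreviation lists would replace these simplified helpers.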

Gaining Additional Knowledge About an Identity

The Pipl response returns a complete online and offline historical footprint of an identity, which contains additional layers of information that may be useful. The following data points are worth considering as you design a model to make a determination about an identity:

  • Counts: The quantity of data returned may correlate with a real identity.

    • Data field counts.
      The Pipl API response contains a section called “available data” that summarizes the number of data fields associated with the person (e.g., the number of emails, phones, and addresses). A real person typically accumulates multiple emails, phones, or addresses over the course of their lifetime.
    • Number of sources.
      The identity of a person is created from various public source records. A real person typically appears in many public sources.
  • Timestamps: A long history of public source records may correlate with a real identity.

    • Pipl returns timestamps indicating when the data was first seen and when we last came across it. A real person typically appears in public records over a period of time proportional to their age.
    • From these timestamps you can derive the age of the data, for example, how long Pipl has known about an email or phone number and, more generally, which piece of information related to an identity is the “oldest”. Real identities tend to have a rich data history.
  • Data Types: Knowing the types of data returned may help confirm a real identity.

    • Pipl returns metadata that describes the type of specific data fields. For example, an email is labeled as personal or work, and is also flagged if it belongs to a free service provider such as Gmail or Hotmail or to a one-time disposable email service. Similarly, phones are marked as mobile, home_phone, or work_phone.
  • Boolean indicators: Whether certain types of data exist at all may help confirm a real identity.

    • Pipl returns social media data from networks such as Facebook, Twitter, LinkedIn, and others. The existence of social media profiles over time may be an indicator of a real identity. For example, a person with several social media accounts that have existed for several years is more likely to be real than one who created a social media account last month. You might also find that the mere presence of a job in a profile is a valid signal.
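The signals above can be turned into model features. The following Python sketch assumes a simplified, hypothetical response dictionary: keys such as `first_seen`, `disposable`, `free_provider`, `sources`, `social_profiles`, and `jobs` are illustrative, not Pipl's exact field names, so map each one to the field documented in the Pipl API reference before use.

```python
from datetime import date
from typing import Dict


def identity_features(response: Dict, today: date) -> Dict:
    """Turn a simplified person response into numeric/boolean model features.

    The response shape used here is hypothetical; map each key to the
    corresponding field documented in the Pipl API reference.
    """
    counts = response.get("available_data", {})
    person = response.get("person", {})
    emails = person.get("emails", [])
    phones = person.get("phones", [])

    # Collect every "first seen" date across emails, phones, and addresses.
    first_seen = [
        date.fromisoformat(item["first_seen"])
        for field in ("emails", "phones", "addresses")
        for item in person.get(field, [])
        if "first_seen" in item
    ]

    return {
        # Counts: summary counts plus the number of contributing sources
        "email_count": counts.get("emails", 0),
        "phone_count": counts.get("phones", 0),
        "address_count": counts.get("addresses", 0),
        "source_count": len(person.get("sources", [])),
        # Timestamps: age in days of the oldest known data point
        "oldest_data_age_days": max(((today - d).days for d in first_seen), default=0),
        # Data types: metadata flags on individual fields
        "has_disposable_email": any(e.get("disposable", False) for e in emails),
        "has_free_provider_email": any(e.get("free_provider", False) for e in emails),
        "has_mobile_phone": any(p.get("type") == "mobile" for p in phones),
        # Boolean indicators: does the data exist at all?
        "has_social_profile": bool(person.get("social_profiles")),
        "has_job": bool(person.get("jobs")),
    }
```

Keeping counts and ages as raw numbers lets an ML model learn its own thresholds, while a rules engine might instead compare them against fixed cutoffs.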

Handling Multiple Results

In some cases, especially when querying Pipl for a phone number or an address, Pipl Search returns multiple persons related to that data point. There is no need to decide which person to compare. In identity verification, people who share an address, phone, or email are likely to use the same account for a transaction, so comparing a data point against any of the persons related to that piece of information will usually prove valuable.
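Comparing a data point against every returned person, rather than choosing one, can be sketched as follows. The list-of-dictionaries shape and the function name are hypothetical; a real implementation would read the persons from the Pipl response and could substitute the normalization practices described for each data field.

```python
from typing import Callable, Dict, List


def matching_persons(
    value: str,
    persons: List[Dict],
    field: str,
    normalize: Callable[[str], str] = lambda s: s.strip().lower(),
) -> List[int]:
    """Return the indexes of every returned person whose `field` contains `value`."""
    target = normalize(value)
    return [
        i
        for i, person in enumerate(persons)
        if any(normalize(v) == target for v in person.get(field, []))
    ]


# Hypothetical example: two people returned for one shared phone number.
persons = [
    {"names": ["Jane Doe"], "phones": ["+15550102030"]},
    {"names": ["John Doe"], "phones": ["+15550102030"]},
]
print(matching_persons("john doe", persons, "names"))  # prints [1]
```

A non-empty result means the submitted value is associated with at least one person tied to the queried data point, which is the signal described above.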

Example Use Cases

Pipl data combined with or compared to your own data can help improve your rules engine or ML models and achieve better outcomes. Together, the two data sets can help validate the identity of a new customer on your platform, or of the parties to a transaction in which the purchaser ships to a new recipient at a new address. For these use cases, consider the following questions when using Pipl data in your data-driven decisions:

Validate the identity of a new customer on your platform

Using Pipl you may be able to determine if a new customer is who they say they are.

  • Does the Pipl API return a Person response when queried with the personal information they provided?
  • How long has Pipl known about the data associated with the person?
  • Does the name in the Pipl response match the name on record?
  • Is the billing address, shipping address, or phone number present in the Pipl response?
  • Is there a partial match? If so, what is the distance between the address provided and the physical locations in the response?
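The new-customer questions above can be expressed as features for a rules engine or model. This Python sketch assumes hypothetical dictionaries for the customer record and a single matched person, including latitude/longitude values for addresses (for example, from your own geocoding step; this is an assumption, not a field the text says Pipl returns). Exact string comparisons are used for brevity; in practice the normalization and fuzzy-matching practices for each data field apply. Distance uses the standard haversine great-circle formula.

```python
import math
from datetime import date
from typing import Dict, Optional

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def new_customer_signals(customer: Dict, person: Optional[Dict], today: date) -> Dict:
    """Answer the new-customer questions as features (hypothetical data shapes)."""
    if person is None:  # no person was returned for the query
        return {"person_found": False}

    dates = [date.fromisoformat(d) for d in person.get("first_seen_dates", [])]
    addresses = person.get("addresses", [])
    # Distance from the customer's location to each geocoded address in the response.
    distances = [
        haversine_km(customer["lat"], customer["lon"], a["lat"], a["lon"])
        for a in addresses
        if "lat" in a and "lon" in a
    ]
    return {
        "person_found": True,
        "known_for_days": (today - min(dates)).days if dates else 0,
        "name_match": customer["name"].lower() in (n.lower() for n in person.get("names", [])),
        "phone_present": customer["phone"] in person.get("phones", []),
        "address_present": customer["address"].lower()
        in (a.get("text", "").lower() for a in addresses),
        "nearest_address_km": min(distances) if distances else None,
    }
```

Returning `None` for the distance when no coordinates are available keeps “unknown” distinct from “zero kilometres”, which matters when a rules engine applies distance thresholds.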

Validate if a new shipping address or a new person shipped to is associated with the account holder

Using Pipl you may be able to determine whether the new person and/or the new address is related in any way to the account holder. Query the Pipl API for both the account holder and the new person being shipped to. Based on the Pipl Search responses:

  • Is there a relationship between the persons?
  • Are there shared addresses or phone numbers?
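These two questions can be sketched by comparing the two person records. The record shape (`name`, `relationships` as a list of related people's names, `addresses`, `phones`) and the function names are hypothetical simplifications; a real check would use the relationship and field data in the actual Pipl responses, normalized as described for each data field.

```python
from typing import Dict, Set


def shared_values(a: Dict, b: Dict, field: str) -> Set[str]:
    """Case-insensitive intersection of one list-valued field across two records."""
    return {v.lower() for v in a.get(field, [])} & {v.lower() for v in b.get(field, [])}


def shipping_link_signals(holder: Dict, recipient: Dict) -> Dict:
    """Compare the account holder's and recipient's (hypothetical) person records."""
    holder_relatives = {n.lower() for n in holder.get("relationships", [])}
    recipient_relatives = {n.lower() for n in recipient.get("relationships", [])}
    holder_name = holder.get("name", "").lower()
    recipient_name = recipient.get("name", "").lower()
    return {
        # Is either person listed as related to the other?
        "related": recipient_name in holder_relatives or holder_name in recipient_relatives,
        # Do the two records share any addresses or phone numbers?
        "shared_addresses": sorted(shared_values(holder, recipient, "addresses")),
        "shared_phones": sorted(shared_values(holder, recipient, "phones")),
    }
```

Checking the relationship in both directions covers the case where only one of the two responses lists the other person.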