Pipl’s People Search API lets you switch on a sources layer that provides source attribution as well as additional data about the person you are searching for. In this tutorial, you will learn what the sources layer is and how to activate it, what data can be found in a source, the different types of sources returned, and how to filter responses by source type.

What’s in a source?

Each source holds all the information about the person that was gathered from a single data source. Every source follows a common structure and is classified into a category.

Each source indicates where its data came from. Inspect the sources layer to find out where a particular piece of data in the person response originated. Note that a piece of information can be attributed to more than one source.

The sources layer can also contain additional pieces of information, called tags, that were gathered from the data source but were not included in the main body of the response. These may include professional skills and courses, vehicle and home ownership records, death records and more.

Types of Sources

Sources may be from public, online or offline records. A URL attributed as an online source may not be active anymore, but at some time Pipl collected and stored data from the URL indicated.

There are two different types of sources that can be returned:

  1. A matching source is a source record that was used to build the person profile. It is easy to identify: its @match score is 1 and the source item contains a @person_id.
  2. A related source is a source record that was found in the search as related but did not have a strong enough connection to be used in creating the person profile. Its @match score is less than 1. A possible persons response will have only related sources and no matching sources.

Each source has a @source_id, which is a unique identifier of a record from that source.

example of a matching source and a related source

"sources": [
        {
            "@id": "876377428ce2782dc21dc77fbbf4ef4d",
            "@category": "contact_details",
            "@name": "WhitePages.plus",
            "@origin_url": "https://whitepages.plus/n/Homer_Simpson/San_francisco_CA/67679336311d4323b0c887be2fc5d7fe",
            "@domain": "whitepages.plus",
            "@person_id": "218faf7b-4c89-4b79-aea9-14df6db44c8e",
            "@match": 1.0,
            "names": [
                {
                    "@valid_since": "2013-10-02",
                    "first": "Homer",
                    "last": "Simpson",
                    "display": "Homer Simpson"
                }
            ],
            "phones": [
                {
                    "@valid_since": "2013-10-02",
                    "@type": "mobile",
                    "country_code": 1,
                    "number": 4152549431,
                    "display": "415-254-9431",
                    "display_international": "+1 415-254-9431"
                }
            ],
            "addresses": [
                {
                    "@valid_since": "2013-10-02",
                    "country": "US",
                    "state": "CA",
                    "city": "San Francisco",
                    "street": "B Union Street",
                    "house": "731",
                    "display": "731 B Union Street, San Francisco, California"
                }
            ]
        },
        {
            "@id": "f952158bec8a967bfb3b1bf9e0ebefa2",
            "@category": "contact_details",
            "@name": "USA Consumers",
            "@premium": true,
            "@match": 0.83,
            "names": [
                {
                    "@valid_since": "2008-01-01",
                    "first": "Homee",
                    "last": "Simpson",
                    "display": "Homee Simpson"
                }
            ],
            "gender": {
                "@valid_since": "2008-01-01",
                "content": "male"
            },
            "addresses": [
                {
                    "@valid_since": "2008-01-01",
                    "country": "US",
                    "state": "GA",
                    "city": "Smyrna",
                    "street": "Parkview Pass SE",
                    "house": "2880",
                    "zip_code": "30080",
                    "display": "2880 Parkview Pass Se, Smyrna, Georgia"
                }
            ],
            "tags": [
                {
                    "@valid_since": "2008-01-01",
                    "@classification": "AddressType",
                    "content": "Residential"
                }
            ]
        }
]
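
The two rules for telling matching and related sources apart can be sketched in a few lines of Python. This is only an illustration: the helper name classify_source is hypothetical, and the sample data is a trimmed-down copy of the two sources in the example above.

```python
def classify_source(source):
    """Return 'matching' or 'related' for one item of the sources array.

    Rule taken from the text: a matching source has a @match score of 1
    and carries a @person_id; anything else is treated as related.
    """
    is_matching = source.get("@match") == 1 and "@person_id" in source
    return "matching" if is_matching else "related"


# Trimmed-down versions of the two sources shown in the example.
sources = [
    {
        "@id": "876377428ce2782dc21dc77fbbf4ef4d",
        "@match": 1.0,
        "@person_id": "218faf7b-4c89-4b79-aea9-14df6db44c8e",
    },
    {
        "@id": "f952158bec8a967bfb3b1bf9e0ebefa2",
        "@match": 0.83,
    },
]

# Map each source ID to its kind.
kinds = {s["@id"]: classify_source(s) for s in sources}
```

Checking both conditions (score and @person_id) rather than the score alone keeps the helper conservative if a response ever contains one without the other.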

How do I activate the sources layer?

The source layer is activated by using the show_sources configuration parameter.

When the sources layer is switched on (it is off by default), each response also includes a sources array, and the number of items in this array is given by the visible_sources counter.

The show_sources parameter accepts three settings:

  • all - The sources layer will show all the sources that were found and which sources were used in clustering to produce the response. Here available_sources and visible_sources are equal, and both matching and related sources will be returned.
  • matching or true - Returns only the sources that were used to create the person profile; each of these sources carries a @person_id. When a person response is returned, visible_sources will be less than or equal to available_sources. Matching sources are not available in a possible persons response, so in that case the visible_sources count will be 0.
  • false - The sources layer is not shown, and no sources will be returned. This is the default setting.
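
As a rough sketch, the parameter could be added to a search request as shown below. The endpoint URL and the email and key parameter names are assumptions made for illustration and should be checked against the reference guide; build_search_url is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: endpoint and the `email` / `key` parameter names are
# illustrative only; confirm them in the API reference before use.
SEARCH_URL = "https://api.pipl.com/search/"

# The settings described above ("matching" and "true" are equivalent).
VALID_SHOW_SOURCES = {"all", "matching", "true", "false"}


def build_search_url(email, api_key, show_sources="false"):
    """Build a search URL with the show_sources parameter set."""
    value = str(show_sources).lower()  # accepts True/False as well as strings
    if value not in VALID_SHOW_SOURCES:
        raise ValueError(f"unsupported show_sources value: {show_sources!r}")
    query = urlencode({"email": email, "key": api_key, "show_sources": value})
    return f"{SEARCH_URL}?{query}"


# Request both matching and related sources.
url = build_search_url("homer@example.com", "YOUR_KEY", show_sources="all")
```

Validating the value on the client side is a design choice for the sketch; it fails early instead of relying on the API to reject an unknown setting.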

Matching Sources to Profiles in a Response

When you receive a possible persons response, which contains multiple profiles in a single response, you may want to know which sources were used for each profile. To do this, set the show_sourceids parameter to True. This adds an array called @source_ids to each person and possible person in the response, listing the IDs of the sources used to build that profile.

example @source_ids array for a possible person

The source IDs correspond to the @id field of each source in the sources layer. Note that for possible persons responses, you will need to use show_sources='all' in order to view the sources layer.

corresponding source IDs in the source layer
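
The lookup from @source_ids to entries in the sources layer can be sketched as follows. The top-level keys person, possible_persons and sources are assumptions about the response layout, the helper sources_by_profile is hypothetical, and the IDs in the sample are made up.

```python
def sources_by_profile(response):
    """Return, for each profile, the list of sources named in its @source_ids.

    Assumption: profiles live under "person" and/or "possible_persons"
    and the sources layer under "sources".
    """
    # Index the sources layer by the @id field of each source.
    index = {s["@id"]: s for s in response.get("sources", [])}

    profiles = []
    if "person" in response:
        profiles.append(response["person"])
    profiles.extend(response.get("possible_persons", []))

    # IDs without a visible source (e.g. filtered out) are skipped.
    return [
        [index[sid] for sid in profile.get("@source_ids", []) if sid in index]
        for profile in profiles
    ]


# Made-up response with two possible persons and three sources.
response = {
    "possible_persons": [
        {"@source_ids": ["src-a", "src-b"]},
        {"@source_ids": ["src-c"]},
    ],
    "sources": [
        {"@id": "src-a", "@name": "Source A"},
        {"@id": "src-b", "@name": "Source B"},
        {"@id": "src-c", "@name": "Source C"},
    ],
}

per_profile = sources_by_profile(response)
names = [[s["@name"] for s in group] for group in per_profile]
```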

Source IDs as Profile Identifiers

The source IDs are static from response to response, so they can be used as identifiers for a profile, even if the profile changes over time. If you see the same source ID in two responses, there is a very high probability that the two responses are for the same person. This can be used to identify duplicate contacts or to remove a specific identity from your records.
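
A minimal sketch of this duplicate check, assuming each stored profile keeps its @source_ids array; the helper name shared_source_ids and the IDs below are hypothetical.

```python
def shared_source_ids(profile_a, profile_b):
    """Return the set of source IDs that appear in both profiles."""
    ids_a = set(profile_a.get("@source_ids", []))
    ids_b = set(profile_b.get("@source_ids", []))
    return ids_a & ids_b


# Made-up profiles saved from two different responses.
stored_contact = {"@source_ids": ["src-a", "src-b"]}
new_result = {"@source_ids": ["src-b", "src-d"]}
unrelated = {"@source_ids": ["src-z"]}

# A non-empty overlap suggests the two profiles describe the same person.
overlap = shared_source_ids(stored_contact, new_result)
probably_same_person = bool(overlap)
```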

How do I know how many sources were returned?

Each API response returns two counters related to the sources:

  • available_sources - The number of public online and offline sources that were found and used to generate the search result, that is, to create the person profile.
  • visible_sources - The number of items returned in the sources array based on the value of the show_sources parameter.
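
The relationship between the two counters can be illustrated with a short sketch. It assumes the counters sit at the top level of the parsed response under exactly the names used above; in a real response the key names may differ, so treat summarize_source_counters and the sample values as hypothetical.

```python
def summarize_source_counters(response):
    """Summarize the source counters of a parsed response.

    Assumption: the counters are top-level keys named available_sources
    and visible_sources, and the sources layer is under "sources".
    """
    available = response["available_sources"]
    visible = response["visible_sources"]
    returned = len(response.get("sources", []))
    return {
        "available": available,
        "visible": visible,
        # Sources that exist but were not returned for this show_sources value.
        "not_shown": available - visible,
        # visible_sources should equal the length of the sources array.
        "consistent": returned == visible,
    }


# Made-up response: three sources available, two returned.
response = {
    "available_sources": 3,
    "visible_sources": 2,
    "sources": [{"@id": "src-a"}, {"@id": "src-b"}],
}

summary = summarize_source_counters(response)
```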

Can I specify the types of sources to be returned?

Three configuration parameters that influence the types of sources returned are:

  • source_category_requirements - This parameter takes one or more of the 9 source categories as its value. It means that the response must include data from sources classified in the specified category or categories. This is one of the match criteria parameters that can be used. See the match criteria section in the reference guide for more information.
  • minimum_match - In addition to filtering out possible persons with a match score below this threshold, this parameter will also filter out any sources with a match score below this value.
  • hide_sponsored - Data from sponsored sources is returned like data from any other source and is flagged in the response ("@sponsored": true). This flag indicates that additional information may be available at the source URL behind a paywall. These sources do not affect the pricing of your Pipl Search API call. The default setting of this parameter is ‘False’, and it is recommended to leave it off because additional source data can lead to better fill rates and match rates. Setting the parameter to ‘True’ means that sponsored sources are hidden in the sources layer and are also not used to generate the response. One reason to hide sponsored sources is that you may not want to show them because they may be your competitors.
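
To make the effect of minimum_match and hide_sponsored on the sources array concrete, the sketch below mimics the filtering on the client side. It is not how the API implements these parameters: on the server, hide_sponsored also keeps sponsored sources from being used to generate the response, which a client-side filter cannot reproduce. The helper filter_sources and the sample data are hypothetical.

```python
def filter_sources(sources, minimum_match=0.0, hide_sponsored=False):
    """Client-side imitation of how the two parameters trim the sources array."""
    kept = []
    for source in sources:
        # Drop sources whose match score is below the threshold.
        if source.get("@match", 0.0) < minimum_match:
            continue
        # Optionally drop sources flagged as sponsored.
        if hide_sponsored and source.get("@sponsored", False):
            continue
        kept.append(source)
    return kept


# Made-up sources array.
sources = [
    {"@id": "src-a", "@match": 1.0},
    {"@id": "src-b", "@match": 0.83, "@sponsored": True},
    {"@id": "src-c", "@match": 0.4},
]

above_threshold = [s["@id"] for s in filter_sources(sources, minimum_match=0.8)]
no_sponsored = [s["@id"] for s in filter_sources(sources, hide_sponsored=True)]
```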