Deriving New Columns Using List Comprehension and Functions

The goal of this project is to derive two new columns, Region and Area, from an existing country column using lookup lists and functions. Say the data frame has a Countries column and one of its values is "Australia". For that row, Region should be "Australasia" and Area should be "Asia". The process is as follows:

  1. We will read a file with a list of countries using pandas.
  2. We will build a list of countries for each region and a list of regions for each area.
  3. We will define 2 functions with if/else conditions that check which list a value belongs to.
  4. We will apply these functions to the data frame to create the Region and Area columns.

Reading the file which contains the countries list

In [1]:
import pandas as pd

df1 = pd.read_csv('Countries.csv')

df1.head(10)
Out[1]:
Countries
0 USA
1 France
2 USA
3 China
4 China
5 USA
6 Netherlands
7 China
8 Spain
9 Brazil
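
The lookups in the following cells use exact string matching, so stray whitespace in the file would make a country fall through every list. Below is a minimal sketch of cleaning the column first. It uses an in-memory stand-in for Countries.csv, and the sample rows and the names sample_csv and df_sample are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for Countries.csv; note the stray spaces around two names.
sample_csv = io.StringIO("Countries\nUSA\n France\nChina \n")

df_sample = pd.read_csv(sample_csv)

# Strip leading/trailing whitespace so each name can match a list entry exactly.
df_sample['Countries'] = df_sample['Countries'].str.strip()

print(df_sample['Countries'].tolist())
```

Whether this step is needed depends on how the real file was produced; it is harmless when the names are already clean.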

Creating Region Lists

In [2]:
Central_and_Southern_Africa = ['Angola', 'Botswana', 'Burkina Faso', 'Burundi', 
                               'Cameroon', 'Cape Verde', 'Central African', 'Congo', 
                               'Djibouti', 'Guinea', 'Eritrea', 'Ethiopia', 
                               'Gabon', 'Gambia', 'Ghana', 'Ivory', 
                               'Kenya', 'Lesotho', 'Madagascar', 'Malawi', 
                               'Mauritius', 'Mozambique', 'Namibia', 
                               'Nigeria', 'Rwanda', 'Somalia', 'South Africa', 
                               'Swaziland', 'Tanzania', 'Uganda', 'Zambia', 'Zimbabwe']

Northern_Africa = ['Algeria', 'Chad', 'Egypt', 'Liberia', 
                   'Libya', 'Mali', 'Mauritania', 'Morocco', 
                   'Tome', 'Sudan', 'Tunisia']

West_Africa = ['Benin', 'Niger', 'Senegal', 'Sierra', 'Togo']

Central_America = ['Bahamas', 'Belize', 'Costa Rica', 'Dominica', 
                   'Dominican', 'Salvador', 'Guatemala', 'Haiti', 
                   'Honduras', 'Mexico', 'Nicaragua', 'Panama', 
                   'Puerto Rico', 'Kitts', 'Turks', 
                   'US Virgin Islands']

North_America = ['USA', 'Antigua', 'Aruba', 'Barbados', 
                 'Bermuda', 'British Virgin Islands', 'Canada', 
                 'Cayman', 'Cuba', 'Grenada', 'Jamaica', 
                 'Mexico', 'Puerto Rico', 'Saint Lucia', 
                 'Trinidad']

South_America = ['Argentina', 'Bolivia', 'Brazil', 'Chile', 
                 'Colombia', 'Ecuador', 'Guyana', 'Paraguay', 
                 'Peru', 'Suriname', 'Uruguay', 'Venezuela']

Australasia = ['Australia', 'Fiji', 'Guam', 'Marshall', 
               'Micronesia, Federated', 'New Zealand', 'Papua New Guinea', 
               'Samoa', 'Solomon', 'Tonga']

Central_Asia = ['Kazakhstan', 'Kyrgyzstan', 'Tajikistan', 'Turkmenistan', 
                'Uzbekistan']

Japan = ['Japan']

North_Asia = ['China', 'Hong Kong', 'Macau', 'Mongolia', 
              'Korea', 'Taiwan']

South_Asia = ['Afghanistan', 'Bangladesh', 'Bhutan', 'India', 
              'Maldives', 'Nepal', 'Pakistan', 'Seychelles', 
              'Sri Lanka']

South_East_Asia = ['Brunei', 'Cambodia', 'Indonesia', 'Laos', 
                   'Malaysia', 'Philippines', 'Singapore', 
                   'Thailand', 'Timor', 'Vietnam']

Central_and_Eastern_Europe = ['Armenia', 'Azerbaijan', 'Belarus', 'Bosnia', 
                              'Georgia', 'Yugoslav', 'Moldova', 'Montenegro', 
                              'Poland', 'Russia', 'Serbia', 'Ukraine']

Northern_Europe = ['Faroe', 'Greenland', 'Iceland', 'Norway']

Southern_Europe = ['Albania', 'Gibraltar', 'Marino', 'Turkey']

Western_Europe = ['Andorra', 'Channel Islands', 'Isle of Man', 
                  'Liechtenstein', 'Monaco', 'Switzerland']

European_Union = ['Austria', 'Belgium', 'Bulgaria', 'Croatia', 
                  'Cyprus', 'Czech', 'Denmark', 'Estonia', 
                  'Finland', 'France', 'Germany', 'Greece', 
                  'Hungary', 'Ireland', 'Italy', 'Latvia', 
                  'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 
                  'Poland', 'Portugal', 'Romania', 'Slovakia', 
                  'Slovenia', 'Spain', 'Sweden', 'United Kingdom']

GCC = ['Bahrain', 'Kuwait', 'Oman', 'Qatar', 
       'Saudi Arabia', 'United Arab Emirates']

Middle_East = ['Afghanistan', 'Bahrain', 'Iran', 'Iraq', 
               'Israel', 'Jordan', 'Kuwait', 'Lebanon', 
               'Oman', 'Palestine', 'Qatar', 'Saudi Arabia', 
               'Syrian', 'United Arab Emirates', 'Yemen']
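
Some countries appear in more than one of the lists above (for example 'Mexico' is in both Central_America and North_America), and only one region can be assigned per row. Here is a small sketch of how such overlaps could be detected. The dictionary sample_regions is a shortened excerpt of the lists, and find_duplicates is a hypothetical helper, not part of the original notebook.

```python
from collections import defaultdict

# Shortened excerpt of the region lists, keyed by region name.
sample_regions = {
    'Central America': ['Costa Rica', 'Mexico', 'Puerto Rico'],
    'North America': ['USA', 'Canada', 'Mexico', 'Puerto Rico'],
    'South America': ['Brazil', 'Peru'],
}


def find_duplicates(regions):
    """Return {country: [region, ...]} for countries listed in several regions."""
    seen = defaultdict(list)
    for region_name, countries in regions.items():
        for country in countries:
            seen[country].append(region_name)
    return {country: names for country, names in seen.items() if len(names) > 1}


duplicates = find_duplicates(sample_regions)
print(duplicates)
```

Running a check like this on the full set of lists would show which countries need a decision about the region they belong to.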

Creating Area Lists

In [3]:
Africa = ['Central and Southern Africa', 'Northern Africa', 'West Africa']

Americas = ['Central America', 'North America', 'South America']

Asia = ['Australasia', 'Central Asia', 'Japan', 'North Asia',
       'South Asia', 'South East Asia']

Europe = ['Central and Eastern Europe', 'Northern Europe', 'Southern Europe',
         'Western Europe', 'European Union']

# Named Middle_East_Area so it does not overwrite the Middle_East country list
# defined earlier; 'GCC' is quoted because the Region column holds region names.
Middle_East_Area = ['GCC', 'Middle East']

Creating function for Region column

In [4]:
def region(row):
    # Lists are checked in order, so a country that appears in more than one
    # list (for example 'Mexico') gets the region of the first list containing it.
    if row['Countries'] in Central_and_Southern_Africa:
        return 'Central and Southern Africa'
    elif row['Countries'] in Northern_Africa:
        return 'Northern Africa'
    elif row['Countries'] in West_Africa:
        return 'West Africa'
    elif row['Countries'] in Central_America:
        return 'Central America'
    elif row['Countries'] in North_America:
        return 'North America'
    elif row['Countries'] in South_America:
        return 'South America'
    elif row['Countries'] in Australasia:
        return 'Australasia'
    elif row['Countries'] in Central_Asia:
        return 'Central Asia'
    elif row['Countries'] in Japan:
        return 'Japan'
    elif row['Countries'] in North_Asia:
        return 'North Asia'
    elif row['Countries'] in South_Asia:
        return 'South Asia'
    elif row['Countries'] in South_East_Asia:
        return 'South East Asia'
    elif row['Countries'] in Central_and_Eastern_Europe:
        return 'Central and Eastern Europe'
    elif row['Countries'] in Northern_Europe:
        return 'Northern Europe'
    elif row['Countries'] in Southern_Europe:
        return 'Southern Europe'
    elif row['Countries'] in Western_Europe:
        return 'Western Europe'
    elif row['Countries'] in European_Union:
        return 'European Union'
    elif row['Countries'] in GCC:
        return 'GCC'
    elif row['Countries'] in Middle_East:
        return 'Middle East'
    # No list matched: the row gets None, which pandas treats as missing.
    return None

Creating function for Area column

In [5]:
def area(row):
    if row['Region'] in Africa:
        return 'Africa'
    if row['Region'] in Americas:
        return 'Americas'
    if row['Region'] in Asia:
        return 'Asia'
    if row['Region'] in Europe:
        return 'Europe'
    if row['Region'] in Middle_East_Area:
        return 'Middle East'
    # Region is missing or unknown, so no area can be assigned.
    return None
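
The two functions above repeat the same membership test many times. One possible alternative, sketched here under the assumption that each country should map to exactly one region, is to invert the lists into dictionaries once and look values up directly. The names region_lists, area_lists, invert, region_of and area_of are hypothetical, and the lists are shortened excerpts.

```python
# Shortened excerpts of the lists above; a full version would include every region.
region_lists = {
    'North America': ['USA', 'Canada'],
    'European Union': ['France', 'Spain'],
    'North Asia': ['China'],
}
area_lists = {
    'Americas': ['Central America', 'North America', 'South America'],
    'Europe': ['European Union', 'Western Europe'],
    'Asia': ['North Asia', 'South Asia'],
}


def invert(groups):
    """Turn {group: [members]} into {member: group}; the first group listed wins."""
    lookup = {}
    for group, members in groups.items():
        for member in members:
            # setdefault keeps an existing entry, mirroring the if/elif order.
            lookup.setdefault(member, group)
    return lookup


country_to_region = invert(region_lists)
region_to_area = invert(area_lists)


def region_of(country):
    # Returns None when the country is in no list.
    return country_to_region.get(country)


def area_of(country):
    # Chains the two lookups; an unknown region also yields None.
    return region_to_area.get(region_of(country))


print(region_of('France'), area_of('France'))
```

A dictionary lookup avoids scanning every list for every row, and adding a region only means adding one entry instead of another elif branch.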

Applying region function to get Region column in our data frame

In [6]:
df1 = df1.assign(Region=df1.apply(region, axis=1))

df1.head(10)
Out[6]:
     Countries          Region
0          USA   North America
1       France  European Union
2          USA   North America
3        China      North Asia
4        China      North Asia
5          USA   North America
6  Netherlands  European Union
7        China      North Asia
8        Spain  European Union
9       Brazil   South America
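
The same column can also be built with a list comprehension instead of apply. The sketch below assumes a country-to-region dictionary is available; country_to_region and df_demo are hypothetical names, and 'Atlantis' is a made-up value used to show what happens when a country is not covered.

```python
import pandas as pd

# Hypothetical excerpt of a country-to-region lookup.
country_to_region = {
    'USA': 'North America',
    'France': 'European Union',
    'China': 'North Asia',
}

df_demo = pd.DataFrame({'Countries': ['USA', 'France', 'China', 'Atlantis']})

# List comprehension: one dictionary lookup per value, missing when unknown.
df_demo['Region'] = [country_to_region.get(c) for c in df_demo['Countries']]

# Series.map performs the same lookup and also leaves unknown countries missing.
df_demo['Region_map'] = df_demo['Countries'].map(country_to_region)

print(df_demo)
```

Both forms work on the single Countries column rather than on whole rows, which is usually simpler than a row-wise apply when only one input column is involved.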

Applying area function to get Area column in our data frame

In [7]:
df1 = df1.assign(Area=df1.apply(area, axis=1))

df1.head(10)
Out[7]:
     Countries          Region      Area
0          USA   North America  Americas
1       France  European Union    Europe
2          USA   North America  Americas
3        China      North Asia      Asia
4        China      North Asia      Asia
5          USA   North America  Americas
6  Netherlands  European Union    Europe
7        China      North Asia      Asia
8        Spain  European Union    Europe
9       Brazil   South America  Americas
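
The first ten rows all matched, but a country that is in no list ends up with a missing Region and Area. Because the lists are matched exactly, short entries such as 'Ivory' or 'Sierra' only match if the file spells the country exactly that way. Below is a minimal sketch of listing the countries that were not matched. df_check is a made-up frame, and 'Atlantis' stands in for any uncovered name.

```python
import pandas as pd

# Hypothetical result frame; 'Atlantis' stands in for a country no list covers.
df_check = pd.DataFrame({
    'Countries': ['USA', 'Atlantis', 'Spain'],
    'Region': ['North America', None, 'European Union'],
})

# Rows where the lookup found nothing, and the distinct country names involved.
unmapped = df_check[df_check['Region'].isna()]
missing_names = sorted(unmapped['Countries'].unique())

print(missing_names)
```

Running the same check on the real data frame after both columns are added shows which names still need to be added to a list or corrected in the file.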

Author: Amandeep Saluja