Pandas - Gapminder Data Analysis (TSV File) 1

1. Sample Data

gapminder.tsv


2. Import Module

import pandas as pd
from print_df import print_df
import matplotlib.pyplot as plt


3. Data Analysis

- TSV (tab-separated values): a file whose fields are separated by tab characters.


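- Load the TSV file (sep = field delimiter)

# A forward slash avoids the invalid '\g' escape sequence and also works on Windows.
df = pd.read_csv('data/gapminder.tsv', sep='\t')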
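
As a minimal sketch of what sep='\t' does, the snippet below parses a tiny in-memory TSV instead of the real file. The names tsv_text and sample_df are illustrative assumptions, and the two data rows are copied from the output shown in this post.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for gapminder.tsv (first two rows only).
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.4453145\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.8530296\n"
)

# sep='\t' tells read_csv to split fields on tabs instead of commas.
sample_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(sample_df.shape)
print(list(sample_df.columns))
```

Without sep='\t', read_csv would split on commas and each line would end up in a single column.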


- Check the number of rows and columns

print('shape:', df.shape)

shape: (1704, 6)
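
Since shape is a plain (rows, columns) tuple, it can be unpacked directly. The sketch below uses a hypothetical three-row sample_df rather than the real 1704-row file.

```python
import pandas as pd

# Hypothetical three-row sample; the real file has 1704 rows and 6 columns.
sample_df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
        "year": [1952, 1957, 1962],
    }
)

# shape is a tuple, so it unpacks into row and column counts.
n_rows, n_cols = sample_df.shape
print("rows:", n_rows, "columns:", n_cols)

# len(df) is the row count; len(df.columns) is the column count.
same_rows = len(sample_df) == n_rows
same_cols = len(sample_df.columns) == n_cols
```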


- Check the first rows of the data (head)

print_df(df.head())

+---+-------------+-----------+------+--------------------+----------+-------------------+
|   |   country   | continent | year |      lifeExp       |   pop    |     gdpPercap     |
+---+-------------+-----------+------+--------------------+----------+-------------------+
| 0 | Afghanistan |    Asia   | 1952 |       28.801       | 8425333  |    779.4453145    |
| 1 | Afghanistan |    Asia   | 1957 | 30.331999999999997 | 9240934  |    820.8530296    |
| 2 | Afghanistan |    Asia   | 1962 |       31.997       | 10267083 | 853.1007099999999 |
| 3 | Afghanistan |    Asia   | 1967 |       34.02        | 11537966 |    836.1971382    |
| 4 | Afghanistan |    Asia   | 1972 |       36.088       | 13079460 | 739.9811057999999 |
+---+-------------+-----------+------+--------------------+----------+-------------------+



- Check the last rows of the data (tail)

print_df(df.tail())

+------+----------+-----------+------+--------------------+----------+--------------------+
|      | country  | continent | year |      lifeExp       |   pop    |     gdpPercap      |
+------+----------+-----------+------+--------------------+----------+--------------------+
| 1699 | Zimbabwe |   Africa  | 1987 | 62.351000000000006 | 9216418  |    706.1573059     |
| 1700 | Zimbabwe |   Africa  | 1992 | 60.376999999999995 | 10704340 |    693.4207856     |
| 1701 | Zimbabwe |   Africa  | 1997 |       46.809       | 11404948 | 792.4499602999999  |
| 1702 | Zimbabwe |   Africa  | 2002 | 39.989000000000004 | 11926563 | 672.0386227000001  |
| 1703 | Zimbabwe |   Africa  | 2007 | 43.486999999999995 | 12311143 | 469.70929810000007 |
+------+----------+-----------+------+--------------------+----------+--------------------+
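
Both head() and tail() return 5 rows by default and accept a row count. A minimal sketch, assuming a hypothetical one-column sample_df:

```python
import pandas as pd

# Hypothetical sample with seven rows, using years from the Gapminder data.
sample_df = pd.DataFrame({"year": [1952, 1957, 1962, 1967, 1972, 1977, 1982]})

first_three = sample_df.head(3)  # rows at positions 0, 1, 2
last_two = sample_df.tail(2)     # rows at positions 5, 6
default_head = sample_df.head()  # n defaults to 5

print(first_three)
print(last_two)
```

Note that tail() keeps the original index labels, which is why the output above starts at 1699 rather than 0.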



- Print the DataFrame's column names

print(df.columns)

Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')


- Print the DataFrame's column names one per line

for col in df.columns:
    print(col)

country
continent
year
lifeExp
pop
gdpPercap
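
df.columns is an Index object, not a list. A short sketch of two common follow-ups, converting it to a plain list and testing whether a column exists, using a hypothetical one-row sample_df:

```python
import pandas as pd

# Hypothetical one-row sample with three of the Gapminder columns.
sample_df = pd.DataFrame(
    {"country": ["Afghanistan"], "continent": ["Asia"], "year": [1952]}
)

# Convert the Index of column names into a plain Python list.
column_names = sample_df.columns.tolist()
print(column_names)

# Membership tests work directly on the Index.
has_year = "year" in sample_df.columns
has_gdp = "gdpPercap" in sample_df.columns
print(has_year, has_gdp)
```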


- Check the data type of each DataFrame column (object: string / int: integer / float: floating-point number)

print(type(df.dtypes))
print(df.dtypes)

<class 'pandas.core.series.Series'>
country       object
continent     object
year           int64
lifeExp      float64
pop            int64
gdpPercap    float64
dtype: object
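
Because df.dtypes is itself a Series indexed by column name, a single column's type can be looked up by name. The sketch below uses a hypothetical sample_df and the helper functions in pandas.api.types. It checks only the numeric columns, since text columns may be reported as object or as a dedicated string dtype depending on the pandas version.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Hypothetical two-row sample using values from the Gapminder data.
sample_df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Afghanistan"],
        "year": [1952, 1957],
        "lifeExp": [28.801, 30.332],
    }
)

dtypes = sample_df.dtypes    # a Series: column name -> dtype
year_dtype = dtypes["year"]  # look up one column's dtype by name
print(type(dtypes))
print(year_dtype)

# Type checks that do not depend on the exact bit width.
year_is_int = is_integer_dtype(sample_df["year"])
life_is_float = is_float_dtype(sample_df["lifeExp"])
```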


- Print summary information about the DataFrame (rows, columns, data types, etc.)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country      1704 non-null object
continent    1704 non-null object
year         1704 non-null int64
lifeExp      1704 non-null float64
pop          1704 non-null int64
gdpPercap    1704 non-null float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB


- Extract a single column from the DataFrame

# Note: despite its name, this variable holds the 'continent' column.
countries = df['continent']
print(countries)

0         Asia
1         Asia
2         Asia
3         Asia
4         Asia
5         Asia
6         Asia
7         Asia
8         Asia
9         Asia
10        Asia
11        Asia
12      Europe
13      Europe
14      Europe
15      Europe
16      Europe
17      Europe
18      Europe
19      Europe
20      Europe
21      Europe
22      Europe
23      Europe
24      Africa
25      Africa
26      Africa
27      Africa
28      Africa
29      Africa
         ...
1674      Asia
1675      Asia
1676      Asia
1677      Asia
1678      Asia
1679      Asia
1680    Africa
1681    Africa
1682    Africa
1683    Africa
1684    Africa
1685    Africa
1686    Africa
1687    Africa
1688    Africa
1689    Africa
1690    Africa
1691    Africa
1692    Africa
1693    Africa
1694    Africa
1695    Africa
1696    Africa
1697    Africa
1698    Africa
1699    Africa
1700    Africa
1701    Africa
1702    Africa
1703    Africa
Name: continent, Length: 1704, dtype: object


- Check the type (Series: a one-dimensional array)

print(type(countries))

<class 'pandas.core.series.Series'>


- Check the first rows of the variable countries

print(countries.head())

0    Asia
1    Asia
2    Asia
3    Asia
4    Asia
Name: continent, dtype: object


- Check the last rows of the variable countries

print(countries.tail())

1699    Africa
1700    Africa
1701    Africa
1702    Africa
1703    Africa
Name: continent, dtype: object


- Select several columns and check the type (DataFrame: a two-dimensional array)

subset_df = df[['country', 'year', 'lifeExp']]
print(type(subset_df))

<class 'pandas.core.frame.DataFrame'>
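
The result type depends on what is passed inside the brackets: a single column name gives a Series, while a list of names gives a DataFrame, even when the list holds only one name. A minimal sketch with a hypothetical two-row sample_df:

```python
import pandas as pd

# Hypothetical two-row sample using values from the Gapminder data.
sample_df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Zimbabwe"],
        "continent": ["Asia", "Africa"],
        "year": [1952, 2007],
    }
)

one_col_series = sample_df["continent"]   # string key -> Series (1-D)
one_col_frame = sample_df[["continent"]]  # list key -> DataFrame (2-D)
two_cols = sample_df[["country", "year"]]

print(type(one_col_series))
print(type(one_col_frame))
print(two_cols.shape)
```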

- Check the first rows of the variable subset_df
print_df(subset_df.head())
+---+-------------+------+--------------------+
|   |   country   | year |      lifeExp       |
+---+-------------+------+--------------------+
| 0 | Afghanistan | 1952 |       28.801       |
| 1 | Afghanistan | 1957 | 30.331999999999997 |
| 2 | Afghanistan | 1962 |       31.997       |
| 3 | Afghanistan | 1967 |       34.02        |
| 4 | Afghanistan | 1972 |       36.088       |
+---+-------------+------+--------------------+



- Check the last rows of the variable subset_df
print_df(subset_df.tail())

+------+----------+------+--------------------+
|      | country  | year |      lifeExp       |
+------+----------+------+--------------------+
| 1699 | Zimbabwe | 1987 | 62.351000000000006 |
| 1700 | Zimbabwe | 1992 | 60.376999999999995 |
| 1701 | Zimbabwe | 1997 |       46.809       |
| 1702 | Zimbabwe | 2002 | 39.989000000000004 |
| 1703 | Zimbabwe | 2007 | 43.486999999999995 |
+------+----------+------+--------------------+




- Extract a row, method 1 (df.loc[index label]; with the default index, the label equals the row number)

print(df.loc[0])

country      Afghanistan
continent           Asia
year                1952
lifeExp           28.801
pop              8425333
gdpPercap        779.445
Name: 0, dtype: object


- Extract a row, method 2 (df.iloc[integer position])

print(df.iloc[0])

country      Afghanistan
continent           Asia
year                1952
lifeExp           28.801
pop              8425333
gdpPercap        779.445
Name: 0, dtype: object
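
With the default 0..n-1 index, loc and iloc return the same row, so the difference is easy to miss. The sketch below uses a hypothetical sample_df whose index labels (10, 20, 30) are deliberately different from the row positions, to show that loc looks up labels while iloc looks up positions.

```python
import pandas as pd

# Hypothetical sample whose index labels differ from the row positions.
sample_df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
        "year": [1952, 1957, 1962],
        "lifeExp": [28.801, 30.332, 31.997],
    },
    index=[10, 20, 30],
)

by_label = sample_df.loc[10]     # index label 10 -> first row
by_position = sample_df.iloc[0]  # position 0 -> first row
last_row = sample_df.iloc[-1]    # negative positions count from the end

# loc treats -1 as a label; no such label exists here, so it raises KeyError.
try:
    sample_df.loc[-1]
    loc_negative_ok = True
except KeyError:
    loc_negative_ok = False

print(by_label["year"], by_position["year"], last_row["year"], loc_negative_ok)
```

On the Gapminder DataFrame the last row can be read with df.iloc[-1], or with df.loc[1703] when the label is known.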

