Pandas - Gapminder Data Analysis (TSV File) 1

1. Sample Data

2. Import Module

import pandas as pd
from print_df import print_df
import matplotlib.pyplot as plt
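
The print_df module imported above is not shown in this post and is not part of pandas. Judging only from the bordered tables it prints later on, a hypothetical stand-in could look like the sketch below. The helper name format_df, the left alignment, and every other detail are assumptions, not the author's code.

```python
import pandas as pd


def format_df(df: pd.DataFrame) -> str:
    """Render a DataFrame as a bordered ASCII grid (hypothetical stand-in)."""
    # The first header cell is empty because the index column has no name.
    header = [""] + [str(col) for col in df.columns]
    rows = [
        [str(label)] + [str(value) for value in values]
        for label, values in zip(df.index, df.itertuples(index=False))
    ]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(line[i]) for line in [header] + rows) for i in range(len(header))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(cells):
        return "|" + "|".join(f" {cell:<{w}} " for cell, w in zip(cells, widths)) + "|"

    return "\n".join([border, fmt(header), border, *map(fmt, rows), border])


def print_df(df: pd.DataFrame) -> None:
    """Print the grid produced by format_df."""
    print(format_df(df))


print_df(pd.DataFrame({"country": ["Afghanistan"], "year": [1952]}))
```

Keeping the formatting in format_df and the printing in print_df makes the grid easy to check without capturing standard output.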

3. Data Analysis

- TSV (tab-separated values): a file whose fields are separated by tab characters.

- Load the TSV file (sep = the delimiter)

df = pd.read_csv('data/gapminder.tsv', sep='\t')
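
As a small self-contained illustration of the sep argument, the sketch below parses tab-separated text from an in-memory buffer instead of the gapminder.tsv file. The three sample rows are adapted (and rounded) from the Afghanistan rows shown in this post. The names tsv_text and sample are made up for this example.

```python
import io

import pandas as pd

# Three rows in the same layout as gapminder.tsv; fields are separated by tabs.
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.4453145\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.8530296\n"
    "Afghanistan\tAsia\t1962\t31.997\t10267083\t853.10071\n"
)

# read_csv accepts any file-like object; sep="\t" tells it to split on tabs.
sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(sample.shape)           # (number of rows, number of columns)
print(list(sample.columns))   # column names taken from the first line
```

Without sep="\t", read_csv would split on commas and read each line as a single column.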

- Check the number of rows and columns in the data

print('shape:', df.shape)

shape: (1704, 6)

- Check the first rows of the data

print_df(df.head())

+---+-------------+-----------+------+--------------------+----------+-------------------+
|   | country     | continent | year | lifeExp            | pop      | gdpPercap         |
+---+-------------+-----------+------+--------------------+----------+-------------------+
| 0 | Afghanistan | Asia      | 1952 | 28.801             | 8425333  | 779.4453145       |
| 1 | Afghanistan | Asia      | 1957 | 30.331999999999997 | 9240934  | 820.8530296       |
| 2 | Afghanistan | Asia      | 1962 | 31.997             | 10267083 | 853.1007099999999 |
| 3 | Afghanistan | Asia      | 1967 | 34.02              | 11537966 | 836.1971382       |
| 4 | Afghanistan | Asia      | 1972 | 36.088             | 13079460 | 739.9811057999999 |
+---+-------------+-----------+------+--------------------+----------+-------------------+

- Check the last rows of the data

print_df(df.tail())

+------+----------+-----------+------+--------------------+----------+--------------------+
|      | country  | continent | year | lifeExp            | pop      | gdpPercap          |
+------+----------+-----------+------+--------------------+----------+--------------------+
| 1699 | Zimbabwe | Africa    | 1987 | 62.351000000000006 | 9216418  | 706.1573059        |
| 1700 | Zimbabwe | Africa    | 1992 | 60.376999999999995 | 10704340 | 693.4207856        |
| 1701 | Zimbabwe | Africa    | 1997 | 46.809             | 11404948 | 792.4499602999999  |
| 1702 | Zimbabwe | Africa    | 2002 | 39.989000000000004 | 11926563 | 672.0386227000001  |
| 1703 | Zimbabwe | Africa    | 2007 | 43.486999999999995 | 12311143 | 469.70929810000007 |
+------+----------+-----------+------+--------------------+----------+--------------------+

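
Both head() and tail() take an optional row count n, which defaults to 5. A minimal sketch on a made-up ten-row frame (the name demo and its values are illustrative only):

```python
import pandas as pd

# A made-up frame with ten rows, indexed 0 through 9.
demo = pd.DataFrame({"value": range(10)})

first_two = demo.head(2)    # rows at positions 0 and 1
last_three = demo.tail(3)   # rows at positions 7, 8 and 9

print(first_two)
print(last_three)
```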

- Print the DataFrame's column names

print(df.columns)

Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')

- Print the DataFrame's column names one per line

for col in df.columns:
    print(col)

country
continent
year
lifeExp
pop
gdpPercap

- Check the data type of each DataFrame column (object: string / int: integer / float: floating-point number)

print(df.dtypes)

country       object
continent     object
year           int64
lifeExp      float64
pop            int64
gdpPercap    float64
dtype: object

- Print DataFrame information (rows, columns, data types, etc.)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country      1704 non-null object
continent    1704 non-null object
year         1704 non-null int64
lifeExp      1704 non-null float64
pop          1704 non-null int64
gdpPercap    1704 non-null float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB

- Extract a single column from the DataFrame

countries = df['continent']  # despite its name, this variable holds the 'continent' column
print(countries)

0         Asia
1         Asia
2         Asia
3         Asia
4         Asia
5         Asia
6         Asia
7         Asia
8         Asia
9         Asia
10        Asia
11        Asia
12      Europe
13      Europe
14      Europe
15      Europe
16      Europe
17      Europe
18      Europe
19      Europe
20      Europe
21      Europe
22      Europe
23      Europe
24      Africa
25      Africa
26      Africa
27      Africa
28      Africa
29      Africa
         ...
1674      Asia
1675      Asia
1676      Asia
1677      Asia
1678      Asia
1679      Asia
1680    Africa
1681    Africa
1682    Africa
1683    Africa
1684    Africa
1685    Africa
1686    Africa
1687    Africa
1688    Africa
1689    Africa
1690    Africa
1691    Africa
1692    Africa
1693    Africa
1694    Africa
1695    Africa
1696    Africa
1697    Africa
1698    Africa
1699    Africa
1700    Africa
1701    Africa
1702    Africa
1703    Africa
Name: continent, Length: 1704, dtype: object

- Check the type (Series: a one-dimensional array)

print(type(countries))

<class 'pandas.core.series.Series'>

- Check the first rows of the variable countries

print(countries.head())

0    Asia
1    Asia
2    Asia
3    Asia
4    Asia
Name: continent, dtype: object

- Check the last rows of the variable countries

print(countries.tail())

1699    Africa
1700    Africa
1701    Africa
1702    Africa
1703    Africa
Name: continent, dtype: object

- Check the type when selecting multiple columns (DataFrame: a two-dimensional array)

subset_df = df[['country', 'year', 'lifeExp']]
print(type(subset_df))

<class 'pandas.core.frame.DataFrame'>
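
The result type depends on what goes inside the brackets: a single column label gives a Series, while a list of labels gives a DataFrame, even when the list holds only one label. A minimal sketch on a made-up two-row frame (the name demo and its values are illustrative only):

```python
import pandas as pd

demo = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe"],
    "year": [1952, 2007],
    "lifeExp": [28.801, 43.487],
})

one_col = demo["country"]              # single label       -> Series (1-D)
one_col_df = demo[["country"]]         # list of one label  -> DataFrame (2-D)
two_cols = demo[["country", "year"]]   # list of two labels -> DataFrame (2-D)

print(type(one_col).__name__, one_col.shape)
print(type(one_col_df).__name__, one_col_df.shape)
print(type(two_cols).__name__, two_cols.shape)
```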

- Check the first rows of the variable subset_df

print_df(subset_df.head())

+---+-------------+------+--------------------+
|   | country     | year | lifeExp            |
+---+-------------+------+--------------------+
| 0 | Afghanistan | 1952 | 28.801             |
| 1 | Afghanistan | 1957 | 30.331999999999997 |
| 2 | Afghanistan | 1962 | 31.997             |
| 3 | Afghanistan | 1967 | 34.02              |
| 4 | Afghanistan | 1972 | 36.088             |
+---+-------------+------+--------------------+

- Check the last rows of the variable subset_df

print_df(subset_df.tail())

+------+----------+------+--------------------+
|      | country  | year | lifeExp            |
+------+----------+------+--------------------+
| 1699 | Zimbabwe | 1987 | 62.351000000000006 |
| 1700 | Zimbabwe | 1992 | 60.376999999999995 |
| 1701 | Zimbabwe | 1997 | 46.809             |
| 1702 | Zimbabwe | 2002 | 39.989000000000004 |
| 1703 | Zimbabwe | 2007 | 43.486999999999995 |
+------+----------+------+--------------------+

- Extract a row, method 1 (df.loc[index label])

print(df.loc[0])

country      Afghanistan
continent           Asia
year                1952
lifeExp           28.801
pop              8425333
gdpPercap        779.445
Name: 0, dtype: object

- Extract a row, method 2 (df.iloc[integer position])

print(df.iloc[0])

country      Afghanistan
continent           Asia
year                1952
lifeExp           28.801
pop              8425333
gdpPercap        779.445
Name: 0, dtype: object

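
With the default RangeIndex used in this post, index labels and integer positions coincide, so loc and iloc return the same row for 0. The difference only shows up once the labels stop matching the positions. A minimal sketch on a made-up frame (the name demo, its values, and the labels 10, 20, 30 are illustrative only):

```python
import pandas as pd

demo = pd.DataFrame(
    {"country": ["Afghanistan", "Albania", "Algeria"]},
    index=[10, 20, 30],   # labels deliberately differ from positions 0, 1, 2
)

by_label = demo.loc[10]      # the row whose index label is 10
by_position = demo.iloc[0]   # the first row, whatever its label
last_row = demo.iloc[-1]     # negative positions count from the end

print(by_label["country"], by_position["country"], last_row["country"])

# demo.loc[0] would raise KeyError here, because no row carries the label 0.
```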
