1. Import Module
import pandas as pd
from print_df import print_df
import matplotlib.pyplot as plt
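The snippets below use a DataFrame named df that is never created in this post. A minimal sketch of loading it, assuming the Gapminder data is a tab-separated file: the file name gapminder.tsv in the comment is an assumption, and the three sample rows are copied (rounded) from the output shown further down so that the sketch runs without the file.

```python
import io

import pandas as pd

# Three rows taken from the output in this post. A real run would pass a
# file path instead, e.g. pd.read_csv("gapminder.tsv", sep="\t");
# that file name is an assumption, not something the post states.
sample_tsv = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.4453145\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.8530296\n"
    "Afghanistan\tAsia\t1962\t31.997\t10267083\t853.10071\n"
)

# sep="\t" tells read_csv that the columns are separated by tabs.
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(df.shape)
print(df.dtypes)
```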
2. Data Analysis
- Extracting multiple rows from a DataFrame, method 1 (df.loc[list of index labels])
print_df(df.loc[[0, 1, 2]])
+---+-------------+-----------+------+--------------------+----------+-------------------+
| | country | continent | year | lifeExp | pop | gdpPercap |
+---+-------------+-----------+------+--------------------+----------+-------------------+
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453145 |
| 1 | Afghanistan | Asia | 1957 | 30.331999999999997 | 9240934 | 820.8530296 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007099999999 |
+---+-------------+-----------+------+--------------------+----------+-------------------+
- Extracting multiple rows from a DataFrame, method 2 (df.loc[start:end], a label slice that includes the end label)
print_df(df.loc[840:851])
+-----+-------------+-----------+------+--------------------+----------+--------------------+
| | country | continent | year | lifeExp | pop | gdpPercap |
+-----+-------------+-----------+------+--------------------+----------+--------------------+
| 840 | Korea, Rep. | Asia | 1952 | 47.453 | 20947571 | 1030.592226 |
| 841 | Korea, Rep. | Asia | 1957 | 52.681000000000004 | 22611552 | 1487.593537 |
| 842 | Korea, Rep. | Asia | 1962 | 55.292 | 26420307 | 1536.3443869999999 |
| 843 | Korea, Rep. | Asia | 1967 | 57.716 | 30131000 | 2029.2281420000002 |
| 844 | Korea, Rep. | Asia | 1972 | 62.611999999999995 | 33505000 | 3030.87665 |
| 845 | Korea, Rep. | Asia | 1977 | 64.766 | 36436000 | 4657.22102 |
| 846 | Korea, Rep. | Asia | 1982 | 67.123 | 39326000 | 5622.942464 |
| 847 | Korea, Rep. | Asia | 1987 | 69.81 | 41622000 | 8533.088805 |
| 848 | Korea, Rep. | Asia | 1992 | 72.244 | 43805450 | 12104.27872 |
| 849 | Korea, Rep. | Asia | 1997 | 74.64699999999999 | 46173816 | 15993.52796 |
| 850 | Korea, Rep. | Asia | 2002 | 77.045 | 47969150 | 19233.98818 |
| 851 | Korea, Rep. | Asia | 2007 | 78.623 | 49044790 | 23348.139730000003 |
+-----+-------------+-----------+------+--------------------+----------+--------------------+
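Note that df.loc[840:851] returned row 851 as well. A small sketch of the difference between label slices, position slices, and label lists, using four of the Korea, Rep. rows above with their original index labels:

```python
import pandas as pd

# Four of the Korea, Rep. rows shown above, keeping their original labels.
df = pd.DataFrame(
    {
        "year": [1952, 1957, 1962, 1967],
        "pop": [20947571, 22611552, 26420307, 30131000],
    },
    index=[840, 841, 842, 843],
)

by_label = df.loc[840:842]     # label slice: the end label 842 is included
by_position = df.iloc[0:2]     # position slice: the end position 2 is excluded
picked = df.loc[[840, 843]]    # list of labels: exactly these two rows

print(len(by_label), len(by_position), picked.index.tolist())
```

With .loc the slice is by label and closed on both ends, while .iloc follows ordinary Python slicing and leaves the end position out.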
- Extracting a single value (df.loc[row label, column name])
print(df.loc[851, 'pop'])
49044790
- Extracting only the pop column for rows 840 through 851
korea_pop = df.loc[840:851, 'pop']
print(korea_pop)
840 20947571
841 22611552
842 26420307
843 30131000
844 33505000
845 36436000
846 39326000
847 41622000
848 43805450
849 46173816
850 47969150
851 49044790
Name: pop, dtype: int64
- Extracting the 'country', 'year', and 'pop' columns for rows 840 through 851
korea_year_pop = df.loc[840:851, ['country', 'year', 'pop']]
print_df(korea_year_pop)
+-----+-------------+------+----------+
| | country | year | pop |
+-----+-------------+------+----------+
| 840 | Korea, Rep. | 1952 | 20947571 |
| 841 | Korea, Rep. | 1957 | 22611552 |
| 842 | Korea, Rep. | 1962 | 26420307 |
| 843 | Korea, Rep. | 1967 | 30131000 |
| 844 | Korea, Rep. | 1972 | 33505000 |
| 845 | Korea, Rep. | 1977 | 36436000 |
| 846 | Korea, Rep. | 1982 | 39326000 |
| 847 | Korea, Rep. | 1987 | 41622000 |
| 848 | Korea, Rep. | 1992 | 43805450 |
| 849 | Korea, Rep. | 1997 | 46173816 |
| 850 | Korea, Rep. | 2002 | 47969150 |
| 851 | Korea, Rep. | 2007 | 49044790 |
+-----+-------------+------+----------+
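The three .loc calls above return three different kinds of object depending on how the column is specified. A short sketch on the last three Korea, Rep. rows:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Korea, Rep."] * 3,
        "year": [1997, 2002, 2007],
        "pop": [46173816, 47969150, 49044790],
    },
    index=[849, 850, 851],
)

value = df.loc[851, "pop"]            # one label, one column name: a scalar
as_series = df.loc[849:851, "pop"]    # column name as a string: a Series
as_frame = df.loc[849:851, ["pop"]]   # column names in a list: a DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```

This is why korea_pop was printed with print while korea_year_pop, a DataFrame, could be passed to print_df.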
- Grouping by year and checking the type of the grouped object
grouped_year = df.groupby('year')
print(type(grouped_year))
print(grouped_year)
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000185FFC0B5F8>
- Selecting the column to average from the year groups and checking its type
grouped_year = df.groupby('year')
grouped_year_lifeExp = grouped_year['lifeExp']
print(type(grouped_year_lifeExp))
print(grouped_year_lifeExp)
<class 'pandas.core.groupby.generic.SeriesGroupBy'>
<pandas.core.groupby.generic.SeriesGroupBy object at 0x00000210C9BCF6D8>
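Printing a GroupBy object only shows its memory address, because nothing has been computed yet. One way to look inside it is to iterate over the groups. The sketch below uses four rows taken from the outputs above (lifeExp values rounded):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Afghanistan", "Korea, Rep.", "Korea, Rep."],
        "year": [1952, 1957, 1952, 1957],
        "lifeExp": [28.801, 30.332, 47.453, 52.681],
    }
)

grouped_year = df.groupby("year")

# Iterating yields one (key, sub-DataFrame) pair per distinct year.
for year, group in grouped_year:
    print(year, group["country"].tolist())

group_sizes = grouped_year.size()          # number of rows in each group
rows_1952 = grouped_year.get_group(1952)   # the sub-DataFrame for one key
```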
- Computing the mean of the selected column
grouped_year = df.groupby('year')
grouped_year_lifeExp = grouped_year['lifeExp']
mean = grouped_year_lifeExp.mean()
print(mean)
year
1952 49.057620
1957 51.507401
1962 53.609249
1967 55.678290
1972 57.647386
1977 59.570157
1982 61.533197
1987 63.212613
1992 64.160338
1997 65.014676
2002 65.694923
2007 67.007423
Name: lifeExp, dtype: float64
- Mean life expectancy by year (the three steps above in one line)
print(df.groupby('year')['lifeExp'].mean())
year
1952 49.057620
1957 51.507401
1962 53.609249
1967 55.678290
1972 57.647386
1977 59.570157
1982 61.533197
1987 63.212613
1992 64.160338
1997 65.014676
2002 65.694923
2007 67.007423
Name: lifeExp, dtype: float64
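To make clear what the one-liner does, here is a sketch that performs the same split, apply, and combine steps by hand and compares the result with groupby. The four rows are taken from the outputs above (lifeExp values rounded); the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Afghanistan", "Korea, Rep.", "Korea, Rep."],
        "year": [1952, 1957, 1952, 1957],
        "lifeExp": [28.801, 30.332, 47.453, 52.681],
    }
)

# Split: select the rows of each distinct year with a boolean mask.
# Apply: take the mean of lifeExp for those rows.
# Combine: collect the per-year results into one Series indexed by year.
manual = pd.Series(
    {
        year: df.loc[df["year"] == year, "lifeExp"].mean()
        for year in sorted(df["year"].unique())
    },
    name="lifeExp",
)
manual.index.name = "year"

one_liner = df.groupby("year")["lifeExp"].mean()

print(manual)
print(one_liner)
```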
- Mean life expectancy over all rows (every country and every year, unweighted)
print(df['lifeExp'].mean())
59.474439366197174
- Mean population by year
print(df.groupby('year')['pop'].mean())
year
1952 1.695040e+07
1957 1.876341e+07
1962 2.042101e+07
1967 2.265830e+07
1972 2.518998e+07
1977 2.767638e+07
1982 3.020730e+07
1987 3.303857e+07
1992 3.599092e+07
1997 3.883947e+07
2002 4.145759e+07
2007 4.402122e+07
Name: pop, dtype: float64
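The means above are printed in scientific notation because mean() returns float64 even when the input column is int64. If whole numbers are easier to read, one option is to round and convert back to integers, as in this sketch on four pop values from the outputs above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "year": [1952, 1952, 1957, 1957],
        "pop": [8425333, 20947571, 9240934, 22611552],
    }
)

mean_pop = df.groupby("year")["pop"].mean()   # float64, even for integer input

# Round to whole people and convert back to integers for display.
mean_pop_int = mean_pop.round().astype("int64")

print(mean_pop_int)
```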
- Mean life expectancy grouped by year, then by continent
print(df.groupby(['year', 'continent'])['lifeExp'].mean())
year continent
1952 Africa 39.135500
Americas 53.279840
Asia 46.314394
Europe 64.408500
Oceania 69.255000
1957 Africa 41.266346
Americas 55.960280
Asia 49.318544
Europe 66.703067
Oceania 70.295000
1962 Africa 43.319442
Americas 58.398760
Asia 51.563223
Europe 68.539233
Oceania 71.085000
1967 Africa 45.334538
Americas 60.410920
Asia 54.663640
Europe 69.737600
Oceania 71.310000
1972 Africa 47.450942
Americas 62.394920
Asia 57.319269
Europe 70.775033
Oceania 71.910000
1977 Africa 49.580423
Americas 64.391560
Asia 59.610556
Europe 71.937767
Oceania 72.855000
1982 Africa 51.592865
Americas 66.228840
Asia 62.617939
Europe 72.806400
Oceania 74.290000
1987 Africa 53.344788
Americas 68.090720
Asia 64.851182
Europe 73.642167
Oceania 75.320000
1992 Africa 53.629577
Americas 69.568360
Asia 66.537212
Europe 74.440100
Oceania 76.945000
1997 Africa 53.598269
Americas 71.150480
Asia 68.020515
Europe 75.505167
Oceania 78.190000
2002 Africa 53.325231
Americas 72.422040
Asia 69.233879
Europe 76.700600
Oceania 79.740000
2007 Africa 54.806038
Americas 73.608120
Asia 70.728485
Europe 77.648600
Oceania 80.719500
Name: lifeExp, dtype: float64
- Mean life expectancy grouped by continent, then by year
print(df.groupby(['continent', 'year'])['lifeExp'].mean())
continent year
Africa 1952 39.135500
1957 41.266346
1962 43.319442
1967 45.334538
1972 47.450942
1977 49.580423
1982 51.592865
1987 53.344788
1992 53.629577
1997 53.598269
2002 53.325231
2007 54.806038
Americas 1952 53.279840
1957 55.960280
1962 58.398760
1967 60.410920
1972 62.394920
1977 64.391560
1982 66.228840
1987 68.090720
1992 69.568360
1997 71.150480
2002 72.422040
2007 73.608120
Asia 1952 46.314394
1957 49.318544
1962 51.563223
1967 54.663640
1972 57.319269
1977 59.610556
1982 62.617939
1987 64.851182
1992 66.537212
1997 68.020515
2002 69.233879
2007 70.728485
Europe 1952 64.408500
1957 66.703067
1962 68.539233
1967 69.737600
1972 70.775033
1977 71.937767
1982 72.806400
1987 73.642167
1992 74.440100
1997 75.505167
2002 76.700600
2007 77.648600
Oceania 1952 69.255000
1957 70.295000
1962 71.085000
1967 71.310000
1972 71.910000
1977 72.855000
1982 74.290000
1987 75.320000
1992 76.945000
1997 78.190000
2002 79.740000
2007 80.719500
Name: lifeExp, dtype: float64
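The two groupings above contain the same values; only the order of the index levels differs. The long two-level result can also be pivoted into a wide table with unstack. The sketch below does not recompute anything: it builds a small Series directly from four of the means printed above and reshapes it.

```python
import pandas as pd

# Four of the group means printed above, indexed by (continent, year).
means = pd.Series(
    [46.314394, 49.318544, 64.408500, 66.703067],
    index=pd.MultiIndex.from_product(
        [["Asia", "Europe"], [1952, 1957]], names=["continent", "year"]
    ),
    name="lifeExp",
)

# Move the year level into the columns: one row per continent.
wide = means.unstack("year")

# Swap the levels to get the (year, continent) ordering instead.
by_year_first = means.swaplevel().sort_index()

print(wide)
print(by_year_first)
```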
- Extracting only the rows of the gapminder DataFrame whose country is 'Korea, Rep.'
print_df(df[df['country'] == 'Korea, Rep.'])
+-----+-------------+-----------+------+--------------------+----------+--------------------+
| | country | continent | year | lifeExp | pop | gdpPercap |
+-----+-------------+-----------+------+--------------------+----------+--------------------+
| 840 | Korea, Rep. | Asia | 1952 | 47.453 | 20947571 | 1030.592226 |
| 841 | Korea, Rep. | Asia | 1957 | 52.681000000000004 | 22611552 | 1487.593537 |
| 842 | Korea, Rep. | Asia | 1962 | 55.292 | 26420307 | 1536.3443869999999 |
| 843 | Korea, Rep. | Asia | 1967 | 57.716 | 30131000 | 2029.2281420000002 |
| 844 | Korea, Rep. | Asia | 1972 | 62.611999999999995 | 33505000 | 3030.87665 |
| 845 | Korea, Rep. | Asia | 1977 | 64.766 | 36436000 | 4657.22102 |
| 846 | Korea, Rep. | Asia | 1982 | 67.123 | 39326000 | 5622.942464 |
| 847 | Korea, Rep. | Asia | 1987 | 69.81 | 41622000 | 8533.088805 |
| 848 | Korea, Rep. | Asia | 1992 | 72.244 | 43805450 | 12104.27872 |
| 849 | Korea, Rep. | Asia | 1997 | 74.64699999999999 | 46173816 | 15993.52796 |
| 850 | Korea, Rep. | Asia | 2002 | 77.045 | 47969150 | 19233.98818 |
| 851 | Korea, Rep. | Asia | 2007 | 78.623 | 49044790 | 23348.139730000003 |
+-----+-------------+-----------+------+--------------------+----------+--------------------+
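The expression inside the brackets is a boolean Series with one True or False per row, and df[...] keeps the rows where it is True. A short sketch on three rows from the outputs above, which also shows that the comparison is exact, so the trailing period in 'Korea, Rep.' matters:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Korea, Rep.", "Korea, Rep."],
        "year": [1952, 2002, 2007],
        "pop": [8425333, 47969150, 49044790],
    },
    index=[0, 850, 851],
)

is_korea = df["country"] == "Korea, Rep."   # boolean Series, one entry per row
korea = df[is_korea]                         # rows where the mask is True

# Conditions are combined with & (and) or | (or); each needs parentheses.
korea_2007 = df[(df["country"] == "Korea, Rep.") & (df["year"] == 2007)]

# Without the trailing period nothing matches, so the result is empty.
no_match = df[df["country"] == "Korea, Rep"]

print(is_korea.tolist(), len(no_match))
```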